BBB maintained a single-level catalog of bikes manufactured over a few decades. As the market changed and new bicycle subtypes appeared, product categories grew too broad, each containing thousands of models, and finding a particular bike became difficult. The company therefore decided to modernize its existing taxonomy: BBB developed a two-level catalog hierarchy and defined the attributes that matter for categorization. We took over the data science side of the project: we extracted attributes from the product descriptions using natural language processing (NLP) techniques, then cleaned and normalized the data. We also automated the migration from the old catalog to the new one.
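To make the extraction and normalization step more concrete, here is a minimal sketch in Python. It is a simplified, rule-based stand-in and not the NLP approach used in the project: the attribute names (`wheel_size`, `frame_material`), the synonym table, and the regular expression are all assumptions made for illustration.

```python
import re

# Hypothetical normalization table: raw spellings -> canonical attribute value.
FRAME_MATERIALS = {
    "carbon fiber": "carbon",
    "carbon": "carbon",
    "aluminium": "aluminum",
    "aluminum": "aluminum",
    "alloy": "aluminum",
    "steel": "steel",
    "titanium": "titanium",
}

# Wheel sizes written as 26", 27.5 in, 29er, or 700c (illustrative pattern only).
WHEEL_SIZE_RE = re.compile(
    r'\b(\d{2}(?:\.\d)?)\s*(?:"|in\b|inch\b|er\b)|\b(700c)\b'
)


def extract_attributes(description: str) -> dict:
    """Pull a few categorization attributes out of a free-text description."""
    text = description.lower()
    attrs = {}

    match = WHEEL_SIZE_RE.search(text)
    if match:
        attrs["wheel_size"] = match.group(1) or match.group(2)

    # Check longer spellings first so "carbon fiber" is tried before "carbon".
    for raw in sorted(FRAME_MATERIALS, key=len, reverse=True):
        if re.search(r"\b" + re.escape(raw) + r"\b", text):
            attrs["frame_material"] = FRAME_MATERIALS[raw]
            break

    return attrs


print(extract_attributes("Hardtail with aluminium frame and 29er wheels"))
```

Mapping every spelling to one canonical value is what lets the new two-level hierarchy group models consistently; a production pipeline would likely need a far larger vocabulary and a way to handle conflicting mentions.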
The company supplements its inventory with items from various third-party sources, including brand catalogs. To save time and effort, BBB wanted new products to be added automatically. Since names, categories, and properties differ between systems, we wrote, tested, and adjusted mapping rules between the external catalogs and the internal one. Finally, we built an NLP-based pipeline that finds meaningful words (brand, model name, etc.) in the bicycle title and other text elements, maps them to unify the content, imports new items into the BBB catalog, and places them under the corresponding categories and subcategories.
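The import flow described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's actual rules: the source names (`brand_a`, `brand_b`), the brand list, the category labels, and the functions `parse_title` and `import_item` are all hypothetical.

```python
import re

# Hypothetical mapping rules:
# (source, external category) -> (internal category, internal subcategory).
CATEGORY_RULES = {
    ("brand_a", "mtb hardtail"): ("Mountain", "Hardtail"),
    ("brand_a", "gravel"): ("Road", "Gravel"),
    ("brand_b", "all-mountain"): ("Mountain", "Full Suspension"),
}

# Hypothetical brand vocabulary used to spot the brand token in a title.
KNOWN_BRANDS = {"acme", "northwind"}


def parse_title(title: str) -> dict:
    """Find the brand, model year, and model name in a product title."""
    tokens = title.split()
    brand = next((t for t in tokens if t.lower() in KNOWN_BRANDS), None)
    year = next((t for t in tokens if re.fullmatch(r"(19|20)\d{2}", t)), None)
    # Whatever is neither brand nor year is treated as the model name.
    model = " ".join(t for t in tokens if t not in (brand, year))
    return {"brand": brand, "year": int(year) if year else None, "model": model}


def import_item(source: str, external_category: str, title: str):
    """Map an external item to the internal catalog, or return None if no rule applies."""
    key = (source.lower(), external_category.strip().lower())
    rule = CATEGORY_RULES.get(key)
    if rule is None:
        return None  # no mapping rule yet: leave the item for manual review
    item = parse_title(title)
    item["category"], item["subcategory"] = rule
    return item


print(import_item("brand_a", "MTB Hardtail", "2019 Acme Ridgeline 29"))
```

Returning `None` for unmapped categories keeps unknown items out of the catalog until a rule is written for them, which matches the iterative "write, test, adjust" workflow; whether the real pipeline handled misses this way is not stated in the source.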
A core feature of BBB is its online value guide, which recommends a price range for second-hand bikes. To make these recommendations more precise, BBB planned to apply machine learning. We chose LightGBM, a gradient boosting algorithm known for its high accuracy and its ability to capture nonlinear relationships. To train it, our team pre-processed data from two sources: BBB and eBay sales. The model takes as input various features, including the manufacturer's suggested retail price (MSRP), brand, year of production, type (category), and condition. It predicts price depreciation (how much value a bike is likely to lose over a certain period of time) and calculates a price range from the forecast.
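As an illustration of the last step, the sketch below turns a predicted depreciation into a price range. The source does not say how the range is derived, so the symmetric margin around the point estimate is an assumption, as are the function name `price_range` and the convention that depreciation is expressed as a fraction of MSRP. The depreciation value passed in stands in for the model's output.

```python
def price_range(msrp: float, predicted_depreciation: float, margin: float = 0.10):
    """Convert a predicted depreciation into a (low, high) price range.

    predicted_depreciation: fraction of MSRP the bike is expected to have lost,
        between 0 and 1 (assumed convention, stand-in for the model output).
    margin: half-width of the range relative to the point estimate
        (assumed; the real guide may derive the range differently).
    """
    if not 0.0 <= predicted_depreciation <= 1.0:
        raise ValueError("predicted_depreciation must be between 0 and 1")
    if msrp < 0 or margin < 0:
        raise ValueError("msrp and margin must be non-negative")

    estimate = msrp * (1.0 - predicted_depreciation)
    low = estimate * (1.0 - margin)
    high = estimate * (1.0 + margin)
    return round(low, 2), round(high, 2)


# A bike with a 2000 MSRP that is predicted to have lost 40% of its value.
print(price_range(2000.0, 0.40))
```

Predicting depreciation rather than the absolute price keeps the target on a comparable scale across cheap and expensive bikes, with MSRP anchoring the final figure.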