Selecting a representative subset of data from over one million entries
The initial dataset provided by the client included over one million user entries, collected via the main platform over several years. However, it contained many redundant and irrelevant details that had to be removed. In addition, after extensive experimentation, we found that training on several years of data produced less accurate results than training on the most recent information, because insurance prices are volatile and change over time. So, the final dataset included only fresh observations collected over the previous year. In the end, only about 20 percent of the initial data was selected to train the model.
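The selection step described above can be sketched with Pandas. This is a minimal illustration, not the project's actual code: the column names (`submitted_at`, `free_text_comment`, `quote`), the sample rows, and the 365-day window are all assumptions made for the example.

```python
import pandas as pd


def select_recent(entries: pd.DataFrame, as_of: str, days: int = 365) -> pd.DataFrame:
    """Keep only rows submitted within `days` days before `as_of`."""
    end = pd.Timestamp(as_of)
    cutoff = end - pd.Timedelta(days=days)
    submitted = pd.to_datetime(entries["submitted_at"])
    recent = entries[(submitted > cutoff) & (submitted <= end)]
    # Drop a column the model would not use (hypothetical name).
    recent = recent.drop(columns=["free_text_comment"], errors="ignore")
    return recent.reset_index(drop=True)


# Tiny made-up sample standing in for the multi-year dataset.
entries = pd.DataFrame({
    "submitted_at": ["2015-03-01", "2017-06-15", "2018-02-10", "2018-05-20"],
    "car_age": [3, 7, 1, 12],
    "free_text_comment": ["", "n/a", "call me", ""],
    "quote": [410.0, 520.0, 390.0, 610.0],
})

recent = select_recent(entries, as_of="2018-06-01")
print(f"kept {len(recent)} of {len(entries)} rows")
```

Filtering on a timestamp column keeps the rule easy to re-apply whenever the training window moves forward.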
Choosing the fastest algorithm to predict quotes
Random Forest and LightGBM are the two ensemble machine learning algorithms our data scientists considered for average quote prediction. Eventually, LightGBM won owing to its ability to process large amounts of data in a relatively short time. The model was built and trained in Python, using the Pandas and scikit-learn libraries.
Creating a web service to integrate the algorithm
One of the final steps was to integrate the trained model with the client’s widget. For this purpose, our team developed a web service based on the Flask microframework and deployed it using Docker. It allows the widget to send new data to the prediction model and retrieve forecasts to display to users.
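A prediction endpoint of this kind might look like the sketch below. The route name `/predict`, the feature names, and the `AverageQuoteModel` stand-in are hypothetical. A real service would load the trained model from disk instead of using a fixed formula.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical input fields the widget would send.
FEATURES = ("car_age", "driver_age", "annual_mileage")


class AverageQuoteModel:
    """Stand-in for the trained model; applies a fixed formula."""

    def predict(self, rows):
        return [300.0 + 10.0 * r[0] - 1.0 * r[1] + 0.01 * r[2] for r in rows]


model = AverageQuoteModel()


@app.route("/predict", methods=["POST"])
def predict():
    payload = request.get_json(silent=True) or {}
    missing = [name for name in FEATURES if name not in payload]
    if missing:
        return jsonify(error=f"missing fields: {', '.join(missing)}"), 400
    try:
        row = [float(payload[name]) for name in FEATURES]
    except (TypeError, ValueError):
        return jsonify(error="all fields must be numeric"), 400
    quote = model.predict([row])[0]
    return jsonify(predicted_quote=round(quote, 2))
```

The widget would POST a JSON object with the three fields and read `predicted_quote` from the response. Validating the payload before calling the model keeps malformed widget input from surfacing as a server error.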
Running automated model updates once a month
The web service also includes a Python script that retrains the model on fresh data and updates it automatically once a month. This continuous adaptation keeps quote predictions relevant over a long period of time. Realistic results increase the chances that users will visit the main platform, complete the full form, and eventually strike a deal with the insurance company of their choice. This, in turn, helps the main platform attract new insurance partners and expand its pool of options for car owners.
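The source does not describe how the monthly update is scheduled or stored, so the following is only one possible shape for such a script. It assumes the model is kept as a pickle file alongside the date it was trained, that a scheduler (for example, a daily cron job) calls `monthly_update`, and that `train_fn` is a hypothetical callable that fits a model on the most recent data.

```python
import os
import pickle
import tempfile
from datetime import date
from pathlib import Path
from typing import Any, Callable


def is_retrain_due(last_trained: date, today: date) -> bool:
    """True once the calendar month has changed since the last training run."""
    return (today.year, today.month) > (last_trained.year, last_trained.month)


def retrain_and_swap(train_fn: Callable[[date], Any], model_path: Path, today: date) -> None:
    """Train a new model and replace the stored one in a single step."""
    model = train_fn(today)
    # Write to a temporary file first so readers never see a partial model.
    fd, tmp_name = tempfile.mkstemp(dir=model_path.parent, suffix=".tmp")
    with os.fdopen(fd, "wb") as tmp:
        pickle.dump({"trained_on": today, "model": model}, tmp)
    os.replace(tmp_name, model_path)


def monthly_update(train_fn: Callable[[date], Any], model_path: Path, today: date) -> bool:
    """Retrain if no model exists yet or the stored one is from an earlier month."""
    if model_path.exists():
        with model_path.open("rb") as fh:
            last_trained = pickle.load(fh)["trained_on"]
        if not is_retrain_due(last_trained, today):
            return False
    retrain_and_swap(train_fn, model_path, today)
    return True
```

Calling `monthly_update` daily is harmless under these assumptions: it returns `False` and leaves the stored model untouched until the month changes.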
Approach and Technical Info
The project’s scope was two person-months. It was completed by one Data Scientist.
The technology stack of the project included Python, the Pandas and scikit-learn libraries, the Flask microframework, and Docker (for API building and integration). The machine learning approach was based on the LightGBM algorithm.
Cooperation between the client and AltexSoft is ongoing.