Prepare a relevant subset of quote estimates,
Select the best algorithm to forecast average quotes,
Integrate the model with the widget, and
Keep the model updated on a regular basis.
The initial dataset provided by the client contained over a million user entries collected via the main platform over several years. Much of it, however, consisted of redundant and irrelevant attributes that had to be removed. Experimentation also showed that training on several years of data produced less accurate results than using only the most recent records, because insurance prices are volatile and shift over time. The final dataset therefore consisted of fresh observations collected over the previous year. In the end, only about 20 percent of the initial data was selected to train the model.
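The column names and cutoff date below are hypothetical, since the client's schema is not public; the sketch only illustrates the kind of pandas filtering involved in keeping relevant columns and the most recent observations:

```python
import pandas as pd

# Hypothetical schema: the real dataset's columns are not public.
RELEVANT_COLUMNS = ["observed_at", "driver_age", "vehicle_year", "region", "quote"]

def prepare_training_subset(df: pd.DataFrame, cutoff: str) -> pd.DataFrame:
    """Keep only the relevant columns and observations after `cutoff`,
    dropping rows with missing values."""
    recent = df.loc[df["observed_at"] >= pd.Timestamp(cutoff), RELEVANT_COLUMNS]
    return recent.dropna()
```

A one-year cutoff applied this way is what shrinks the million-entry dataset down to the roughly 20 percent actually used for training.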
Random Forest and LightGBM were the two ensemble machine learning algorithms our data scientists considered for average quote prediction. LightGBM won out thanks to its ability to process large amounts of data in a relatively short time. The model was built and trained in Python using the pandas and scikit-learn libraries.
One of the final steps was to integrate the trained model with the client’s widget. For this purpose, our team developed a web service based on the Flask microframework and deployed it using Docker. The service lets the widget feed new data to the prediction model and retrieve forecasts to display to users.
The web service also includes a Python script that retrains the model on fresh data and automatically updates it once a month. This continuous adaptation keeps quote predictions relevant over the long term. Realistic estimates increase the likelihood that users will visit the main platform, complete the full form, and eventually close a deal with the insurance company of their choice. This, in turn, helps the main platform attract new insurance partners and expand its pool of options for car owners.
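The monthly trigger itself would typically come from cron or a container-side scheduler; the sketch below, with hypothetical column names, shows the core of such an update step, trimming the training window to the trailing year before refitting, in line with the earlier finding that only fresh data should be used:

```python
from datetime import datetime, timedelta

import pandas as pd

def trailing_window(df, months=12, now=None):
    """Keep only observations from the trailing `months`-month window."""
    now = now or datetime.now()
    cutoff = now - timedelta(days=30 * months)
    return df[df["observed_at"] >= cutoff]

def monthly_update(df, model, now=None):
    """Refit the model on the freshest data only (hypothetical retrain step;
    the production script would then reserialize the model for the service)."""
    recent = trailing_window(df, now=now)
    model.fit(recent[["driver_age"]], recent["quote"])
```

Refitting on a rolling window rather than accumulating all history is what lets the model track the volatility of insurance prices over time.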