Traffic Prediction with Machine Learning: How to Forecast Co

In 2021, NYC drivers lost an average of 102 hours to congestion, and before the pandemic that figure was even worse. How often do you get stuck in a jam, wishing you’d known about it in advance and taken a different route? And how often do you have to apologize to your customers because your drivers were late due to traffic?

Navigation tools like Google Maps or Waze show you the time needed for your trip, calculate your ETA, and create the optimal route based on road conditions and predicted traffic. Many logistics businesses rely heavily on the accuracy of these calculations. But have you ever wondered how Google Maps knows what to expect along the way?

In this post, we explore what’s going on behind the scenes of traffic prediction: which data is used, which technologies and algorithms are implemented, and how to get that desired forecast onto your screen. But first, let’s start by explaining why it’s important at all.

What is traffic prediction, who needs it, and why is it important?

Traffic prediction means forecasting the volume and density of traffic flow, usually for the purpose of managing vehicle movement, reducing congestion, and generating the optimal (least time- or energy-consuming) route.

Traffic prediction is mainly important for two groups of organizations (we’re not talking about folks planning a weekend getaway, you know).

1. National/local authorities. Over the last ten to twenty years, many cities have adopted intelligent transportation systems (ITS) that support urban transportation network planning and traffic management. These systems use current traffic information as well as generated predictions to improve transport efficiency and safety by informing users of current road conditions and adjusting road infrastructure (e.g., traffic lights).


~~ITS alerts. Source: WTI~~

2. Logistics companies. Another area of implementation is the logistics industry. Transportation, delivery, field service, and other businesses have to schedule their operations accurately and create the most efficient routes. This planning often covers not only current trips but also future activities. Precise forecasts of road and traffic conditions that help avoid congestion are crucial for such companies’ planning and performance.

So, how is traffic predicted?

Today, various machine learning (and specifically deep learning) techniques that can process huge amounts of both historical and real-time data are used to forecast traffic flow, density, and speed. We’ll describe some effective algorithms further on. But first, we’ll look at what data is needed for traffic prediction and where you can get it.

Data types and sources

Traffic is influenced by many factors, and you should consider as many of them as possible to make accurate predictions. That means obtaining several main groups of data.

~~Data needed for traffic prediction~~

Mapping data. First of all, you need a detailed map with road networks and related attributes. Connecting to global mapping data providers such as Google Maps, TomTom, HERE, or OpenStreetMap (OSM) is a great way to obtain complete and up-to-date information.

Traffic information. Then, you’ll have to collect both historical and current traffic-related information, such as the number of vehicles passing a certain point, their speed, and their type (trucks, light vehicles, etc.). Devices used to collect this data include

loop detectors,
cameras,
weigh-in-motion sensors, and
radars or other sensor technologies.

Fortunately, you don’t have to install these devices all over the place on your own. It’s easier to get this information from the aforementioned providers, which gather data from sensor networks and diverse third-party sources or use GPS probe data. (If you’re unfamiliar with how infrastructures for collecting, processing, and storing data are designed and how they work, see our related post on data engineering.)

Other platforms, such as Otonomo, use innovative Vehicle-to-Everything (V2X) technology to collect so-called connected car data from embedded modems.

You can also get other important information on incidents (road closures or roadworks), places of interest, etc., from data providers.

Weather information. Weather data (historical, current, and forecasted) is also necessary as meteorological conditions impact the road situation and driving speed. There are lots of weather data providers you can connect to — such as OpenWeather or Tomorrow.io.

Additional data on road conditions. External data sources can also provide important information that affects traffic. Think of social media posts about sports events in the area, local news about civil protests, or even police scanner chatter about crime scenes, accidents, or road blockages.

~~Extraction of social media data for traffic prediction. Source: Computing Urban Traffic Congestions by Incorporating Sparse GPS Probe Data and Social Media Data~~

We won’t focus on the storage and preparation aspects of data management (even though they’re also important and demand an expert touch). Instead, we’ll get straight to the fun part.

Algorithms for generating traffic predictions

Traffic prediction involves forecasting drivable speed on particular road segments, as well as jam occurrence and evolution. Let’s take a look at different approaches to this task.


~~How traffic prediction works~~

Statistical approach

Statistical methods let you identify traffic patterns at different scales: within a day, across days of the week, seasonally, and so on. They are usually easier, faster, and cheaper to implement than machine learning methods. However, they tend to be less accurate because they can’t process as much multivariate data.
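One of the simplest statistical baselines captures these daily and weekly patterns by averaging historical readings for each weekday-and-hour slot. The sketch below assumes readings arrive as (timestamp, vehicles-per-hour) pairs from a single detector. The function names and sample values are hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def build_profile(readings):
    """Average historical flow for each (weekday, hour) slot.

    `readings` is an iterable of (datetime, vehicles_per_hour) pairs.
    """
    buckets = defaultdict(list)
    for ts, flow in readings:
        buckets[(ts.weekday(), ts.hour)].append(flow)
    return {slot: mean(values) for slot, values in buckets.items()}


def predict(profile, ts, default=None):
    """Forecast flow at `ts` as the historical mean of its (weekday, hour) slot."""
    return profile.get((ts.weekday(), ts.hour), default)


# Hypothetical readings: two Mondays at 8 a.m. and one Monday at 2 p.m.
history = [
    (datetime(2024, 3, 4, 8), 1200),
    (datetime(2024, 3, 11, 8), 1300),
    (datetime(2024, 3, 4, 14), 700),
]
profile = build_profile(history)
print(predict(profile, datetime(2024, 3, 18, 8)))  # → 1250
```

Such a profile cannot react to an accident or a storm. That limitation is what the multivariate methods below address.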

Specifically, autoregressive integrated moving average (ARIMA) models have been used to predict traffic since the 1970s, as they are easy to implement and tend to be more accurate than other statistical methods. ARIMA is a classical statistical approach to analyzing past events and predicting future ones. It models data collected at regular time intervals and assumes that past patterns will repeat in the future.

However, traffic flow is a complex phenomenon shaped by many variables, and univariate ARIMA models can’t process them effectively.
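To make the ARIMA idea concrete, here is a minimal ARIMA(1,1,0) sketch (with a constant) in plain Python. It differences the series once (the "integrated" part), fits one autoregressive coefficient to the differences by least squares, and then adds the forecast differences back onto the last observed level. The hand-rolled estimator and the sample series are assumptions for illustration. Real projects would normally use a statistics library such as statsmodels and choose the (p, d, q) orders from the data.

```python
def fit_arima_110(series):
    """Fit ARIMA(1,1,0) with a constant by conditional least squares.

    Differencing gives d_t = y_t - y_{t-1}; we then fit d_t = c + phi * d_{t-1}.
    Needs at least four observations.
    """
    diffs = [b - a for a, b in zip(series, series[1:])]
    x, y = diffs[:-1], diffs[1:]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var = sum((xi - mean_x) ** 2 for xi in x)
    phi = cov / var if var else 0.0
    c = mean_y - phi * mean_x
    return c, phi


def forecast_arima_110(series, c, phi, steps):
    """Iterate the AR(1) model on differences, then undo the differencing."""
    level = series[-1]
    diff = series[-1] - series[-2]
    out = []
    for _ in range(steps):
        diff = c + phi * diff  # predicted next difference
        level += diff          # integrate back to the original scale
        out.append(level)
    return out


# Hypothetical smoothed flow readings from one detector.
flows = [100, 100, 102, 105, 108.5, 112.25, 116.125]
c, phi = fit_arima_110(flows)
print(forecast_arima_110(flows, c, phi, steps=2))
```

Note that the model sees only the flow history itself. Weather, incidents, and events cannot enter it, which is the limitation described above.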

Machine learning approach

Machine learning (ML) lets you create predictive models that consider large volumes of heterogeneous data from different sources. Numerous studies have applied ML algorithms to road traffic forecasting. Here are some successful examples.

The random forest algorithm builds multiple decision trees, each trained on a random sample of the data and features, and combines their outputs (by majority vote or averaging) to obtain accurate predictions. It’s quite fast and can produce effective results given sufficient training data. When applied to the road congestion problem, this method showed an accuracy of 87.5 percent. In that study, weather conditions, time period, special road conditions, road quality, and holidays were used as model input variables.
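The bootstrap-and-vote idea behind random forests fits in a few dozen lines of plain Python. This is a simplified sketch and not the model from the study mentioned above. The features (hour of day, rainfall, holiday flag), the tiny dataset, and all parameter defaults are hypothetical. For real work, you would use a tested implementation such as scikit-learn’s RandomForestClassifier.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def build_tree(rows, labels, depth, max_depth, n_features, rng):
    """Grow a small CART-style tree; each split considers a random feature subset."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth >= max_depth or len(set(labels)) == 1:
        return majority  # leaf: predict the most common class
    best = None
    for f in rng.sample(range(len(rows[0])), n_features):
        for threshold in sorted({row[f] for row in rows}):
            left = [i for i, row in enumerate(rows) if row[f] <= threshold]
            right = [i for i, row in enumerate(rows) if row[f] > threshold]
            if not left or not right:
                continue
            score = (len(left) * gini([labels[i] for i in left])
                     + len(right) * gini([labels[i] for i in right])) / len(rows)
            if best is None or score < best[0]:
                best = (score, f, threshold, left, right)
    if best is None:
        return majority  # no usable split among the sampled features

    _, f, threshold, left, right = best

    def grow(indices):
        return build_tree([rows[i] for i in indices], [labels[i] for i in indices],
                          depth + 1, max_depth, n_features, rng)

    return (f, threshold, grow(left), grow(right))


def predict_tree(node, row):
    """Walk (feature, threshold, left, right) nodes down to a leaf label."""
    while isinstance(node, tuple):
        f, threshold, left, right = node
        node = left if row[f] <= threshold else right
    return node


def fit_forest(rows, labels, n_trees=25, max_depth=3, n_features=2, seed=0):
    """Train each tree on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(build_tree([rows[i] for i in idx], [labels[i] for i in idx],
                                 0, max_depth, n_features, rng))
    return forest


def predict_forest(forest, row):
    """Majority vote across all trees."""
    return Counter(predict_tree(tree, row) for tree in forest).most_common(1)[0][0]


# Hypothetical rows: [hour_of_day, rain_mm, is_holiday]; label 1 = congested.
rows = [[8, 0, 0], [8, 5, 0], [9, 2, 0], [13, 0, 0],
        [14, 0, 1], [18, 3, 0], [3, 0, 0], [8, 0, 1]]
labels = [1, 1, 1, 0, 0, 1, 0, 0]
forest = fit_forest(rows, labels)
print(predict_forest(forest, [17, 4, 0]))
```

Training each tree on a different bootstrap sample and feature subset decorrelates the trees. That is why their combined vote tends to be more stable than any single tree.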

The k-nearest neighbors (KNN) algorithm relies on feature similarity: it predicts future values from the historical observations that most closely resemble the current situation. Experiments with KNN models demonstrated over 90 percent accuracy in short-term traffic flow prediction.
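A minimal sketch of KNN for short-term flow forecasting works in three steps. First, it describes each moment by a window of the most recent readings. Next, it finds the k historical windows closest to the current one by Euclidean distance. Finally, it averages the values that came right after them. The window size, k, helper names, and sample counts are hypothetical choices.

```python
import math


def make_windows(series, size):
    """Slice a series into (window, next_value) training pairs."""
    return [(series[i:i + size], series[i + size]) for i in range(len(series) - size)]


def knn_forecast(history, recent, k=3):
    """Average the values that followed the k past windows closest to `recent`."""
    neighbors = sorted(history, key=lambda pair: math.dist(pair[0], recent))[:k]
    return sum(next_value for _, next_value in neighbors) / len(neighbors)


# Hypothetical vehicle counts per 5-minute interval at one detector.
counts = [120, 135, 150, 160, 158, 140, 125, 130, 148, 162]
pairs = make_windows(counts, size=3)
print(knn_forecast(pairs, recent=[128, 146, 160], k=2))  # → 159.0
```

In practice, the window would usually include more features, such as time of day or weather, scaled so that no single feature dominates the distance.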

Deep learning approach

Deep learning (DL) methods have proved highly effective at predicting road traffic compared to ML or statistical techniques, with reported forecasting accuracy of about 90 percent or higher. Deep learning algorithms are based on neural networks.

Neural networks (NN), or artificial neural networks (ANN), consist of interconnected nodes (neurons) arranged in two or more layers, in a design loosely inspired by the human brain. Many types of neural networks have been developed for different purposes. Here are some that have been used in traffic analysis and prediction.

Convolutional neural networks (CNNs) are the go-to choice for image recognition and analysis. One of their natural applications to transportation problems is congestion detection using images from roadside surveillance cameras, where average classification accuracy reaches 89.5 percent. For traffic prediction, CNNs are not the first choice. However, there have been quite successful attempts to build CNN-based models that forecast transportation network speed. To make this possible, researchers converted the time and space data describing traffic flow into a two-dimensional image matrix.
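The "traffic as an image" trick can be sketched as follows. Each row of the matrix is a road segment, each column is a time step, and each cell holds an average speed. A convolution kernel sliding over this grid then responds to spatio-temporal patterns, such as a slowdown spreading from one segment to its neighbor. The helper names, the zero fill for missing cells, and the hand-picked kernel are illustrative assumptions. In a real CNN, the kernel weights are learned, and the rows must follow the physical order of the segments.

```python
def speed_matrix(readings, segments, time_steps):
    """Build a 2-D 'image': one row per road segment, one column per time step.

    `readings` maps (segment_id, time_index) -> average speed in km/h.
    Missing cells are filled with 0.0 here; a real pipeline would impute them.
    """
    return [[readings.get((seg, t), 0.0) for t in range(time_steps)] for seg in segments]


def conv2d(image, kernel):
    """'Valid' 2-D convolution, computed as cross-correlation like most DL frameworks."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [sum(kernel[i][j] * image[r + i][c + j] for i in range(kh) for j in range(kw))
         for c in range(out_w)]
        for r in range(out_h)
    ]


# Hypothetical speeds on three consecutive segments over four 5-minute steps.
readings = {
    ("A", 0): 60, ("A", 1): 58, ("A", 2): 35, ("A", 3): 30,
    ("B", 0): 62, ("B", 1): 61, ("B", 2): 59, ("B", 3): 33,
    ("C", 0): 65, ("C", 1): 64, ("C", 2): 63, ("C", 3): 62,
}
image = speed_matrix(readings, ["A", "B", "C"], time_steps=4)
# Hand-picked kernel: speed change between consecutive steps on two neighboring
# segments. Strongly negative outputs flag a slowdown.
slowdown_kernel = [[-1, 1], [-1, 1]]
for row in conv2d(image, slowdown_kernel):
    print(row)
```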

Recurrent neural networks (RNNs), unlike CNNs, are designed to process time-series data, that is, observations collected over successive time intervals. Traffic patterns are a good example of such observations. Research has shown high accuracy in predicting congestion evolution with RNN models. However, RNNs suffer from the vanishing gradient problem. During training, the error signal shrinks as it is propagated back through many time steps, so the network struggles to learn from events far in the past (that’s why RNNs are said to “have a short-term memory”). This “forgetfulness” makes model training more difficult and time-consuming.
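The vanishing gradient problem is easy to see with a one-unit RNN. By the chain rule, the gradient of the last hidden state with respect to the initial one is a product with one factor per time step, w_h * (1 - h_t^2). When those factors are below 1 in magnitude, the product shrinks exponentially as the sequence grows. The weights and the normalized speed inputs below are arbitrary values chosen for illustration.

```python
import math


def rnn_forward(inputs, w_x, w_h, b, h0=0.0):
    """Scalar Elman RNN: h_t = tanh(w_x * x_t + w_h * h_{t-1} + b)."""
    states, h = [], h0
    for x in inputs:
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states


def grad_last_wrt_initial(states, w_h):
    """d h_T / d h_0 by the chain rule: product over steps of w_h * (1 - h_t**2)."""
    grad = 1.0
    for h in states:
        grad *= w_h * (1.0 - h * h)
    return grad


# Hypothetical normalized speeds (0 = standstill, 1 = free flow) over 50 intervals.
speeds = [0.5] * 50
states = rnn_forward(speeds, w_x=1.0, w_h=0.5, b=0.0)
for steps in (5, 20, 50):
    print(steps, grad_last_wrt_initial(states[:steps], w_h=0.5))
```

After 50 steps, the gradient is many orders of magnitude smaller than after 5. As a result, an input from an hour ago barely influences the weight updates.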

Long short-term memory (LSTM) and gated recurrent unit (GRU) networks are variations of the RNN that address the vanishing gradient problem. In a study comparing these models, the GRU model was more accurate in traffic flow prediction and easier to train.
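A single GRU step for one hidden unit shows how the gating works. This follows one common formulation, in which the update gate z blends the previous state with a new candidate. Some libraries, PyTorch among them, swap the roles of z and 1 - z. All weights here are arbitrary placeholders rather than trained values.

```python
import math
from dataclasses import dataclass


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


@dataclass
class GRUParams:
    """Scalar weights for one hidden unit (placeholder values, normally learned)."""
    w_z: float = 0.5  # update gate, input weight
    u_z: float = 0.5  # update gate, recurrent weight
    b_z: float = 0.0
    w_r: float = 0.5  # reset gate
    u_r: float = 0.5
    b_r: float = 0.0
    w_h: float = 1.0  # candidate state
    u_h: float = 1.0
    b_h: float = 0.0


def gru_step(x, h_prev, p):
    """One GRU update for a single hidden unit."""
    z = sigmoid(p.w_z * x + p.u_z * h_prev + p.b_z)  # update gate: how much new info to take in
    r = sigmoid(p.w_r * x + p.u_r * h_prev + p.b_r)  # reset gate: how much past state feeds the candidate
    h_cand = math.tanh(p.w_h * x + p.u_h * (r * h_prev) + p.b_h)
    return (1.0 - z) * h_prev + z * h_cand           # blend old state and candidate


# Hypothetical normalized flow readings.
h = 0.0
for x in [0.2, 0.4, 0.9, 0.8]:
    h = gru_step(x, h, GRUParams())
print(h)
```

When z is close to 0, the previous state is copied forward almost unchanged. This additive path lets gradients flow back through many steps without shrinking as quickly as in a plain RNN.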

Many studies propose other types of NN models for traffic prediction, e.g., graph neural networks, fuzzy neural networks, and Bayesian neural networks, as well as hybrid methods that combine two or more algorithms. To date, no single technique has been found that works best in all cases and consistently produces the most accurate forecasts.

How to implement traffic prediction

If you run a logistics business, most likely you don’t need traffic prediction by itself, but rather its impact on your operations. As we’ve already mentioned, accurate prediction is important for routing and scheduling purposes. If this is the case, there are three main ways to get those forecasts and build optimal routes (check our related article for more ideas and information).

Off-the-shelf solutions

There are lots of ready-made software solutions on the market developed for any type of business. If your company is small or medium-sized and your operations (be it field service, last-mile delivery, taxi, moving, or long-haul transportation) are more or less standardized, you can find a tool that meets your needs and has routing capabilities to support your business activities.

OptimoRoute, Fixlastmile, Badger Maps, Route4Me, Routific: a myriad of platforms offer route planning and optimization functionality (especially for short-term planning). The choice depends on your industry and specific business demands.

Custom development and API integrations

If you operate a large enterprise and have unique business requirements, consider building a custom model tailored to your needs and integrating it into your platform. Be prepared for significant investment, skilled data specialists, and a great deal of time spent connecting to diverse data providers and training those fancy ML/DL algorithms. On the bright side, you’ll get your own predictions and stay independent of software vendors.

Another option is exploiting the traffic predicting functionality of external platforms. In this case, you keep using your own system that fits your needs and that your staff is used to, and at the same time avoid the complex process of ML model building, training, evaluation, and so on. To make it happen, you’ll have to build an API integration with traffic data providers. Here are some options.


~~Traffic prediction platforms compared~~

Google Maps Platform. If you want to partner with the biggest mapping data provider, keep in mind that the traffic layer you can add to your map shows only current traffic, not forecasts. Moreover, their documentation warns: “Traffic information is refreshed frequently, but not instantly. Rapid consecutive requests for the same area are unlikely to yield different results.” Google also offers a diverse suite of APIs for adding its maps or routing functionality to your location-based product.
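The traffic layer reflects current conditions only. However, Google’s (legacy) Distance Matrix API also accepts a future `departure_time` together with a `traffic_model` parameter (`best_guess`, `pessimistic`, or `optimistic`). With a valid key, the response should then include a `duration_in_traffic` estimate derived from historical traffic data. Google now steers new projects toward its Routes API, so check the current documentation before building on this. The sketch below only assembles the request URL. The helper name, addresses, and key are placeholders.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

DISTANCE_MATRIX_URL = "https://maps.googleapis.com/maps/api/distancematrix/json"


def distance_matrix_url(origin, destination, departure, api_key, traffic_model="best_guess"):
    """Build a driving Distance Matrix request for a future departure.

    `departure` should be timezone-aware; the API expects seconds since the Unix epoch.
    """
    params = {
        "origins": origin,
        "destinations": destination,
        "mode": "driving",
        "departure_time": int(departure.timestamp()),
        "traffic_model": traffic_model,  # "best_guess", "pessimistic", or "optimistic"
        "key": api_key,
    }
    return f"{DISTANCE_MATRIX_URL}?{urlencode(params)}"


url = distance_matrix_url(
    "Brooklyn Bridge, New York", "JFK Airport, New York",
    datetime(2030, 1, 7, 13, 0, tzinfo=timezone.utc),  # 8 a.m. in New York (EST)
    api_key="YOUR_API_KEY",  # placeholder
)
print(url)
```

Fetching the URL (for example, with `urllib.request.urlopen`) and reading `duration_in_traffic` from each result element would give a predicted travel time for that departure.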

Waze. This other Google product, the second most popular navigation app, gets information on traffic, accidents, jams, and other road conditions only from its users driving around. The information is updated every 2 minutes and can be accessed via localized XML and JSON GeoRSS data feeds. Note that this information is reliable only if there are enough drivers in the area. You can also embed the Waze Live Map with search and routing functionality on your website. As for predictions, Waze doesn’t provide any.

TomTom. TomTom’s Traffic RESTful APIs give access to historical and real-time data related to traffic incidents and flow. TomTom makes use of over 600 million GPS and floating car data probes to collect up-to-date information (updated every 30 seconds), analyze it, and make predictions up to 24 hours ahead. You can leverage this data to create routes with your own app or get their comprehensive routing product.
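As an illustration of working with these REST APIs, the sketch below builds a request for TomTom’s Flow Segment Data endpoint and turns a response into a simple congestion indicator: current speed as a share of free-flow speed. This is real-time flow for one road segment, not the 24-hour prediction product. The zoom level, helper names, coordinates, and the trimmed sample response are assumptions, so verify field names against TomTom’s documentation.

```python
import json
from urllib.parse import urlencode

FLOW_URL = "https://api.tomtom.com/traffic/services/4/flowSegmentData/absolute/10/json"


def flow_segment_url(lat, lon, api_key):
    """Request URL for the road segment closest to (lat, lon)."""
    params = {"point": f"{lat},{lon}", "key": api_key}
    return f"{FLOW_URL}?{urlencode(params)}"


def speed_ratio(response_text):
    """Current speed as a share of free-flow speed (1.0 = free flow, 0.0 = closed)."""
    segment = json.loads(response_text)["flowSegmentData"]
    if segment.get("roadClosure"):
        return 0.0
    return segment["currentSpeed"] / segment["freeFlowSpeed"]


print(flow_segment_url(52.41072, 4.84239, "YOUR_API_KEY"))
# Trimmed, made-up response body for illustration.
sample = '{"flowSegmentData": {"currentSpeed": 35, "freeFlowSpeed": 70, "roadClosure": false}}'
print(speed_ratio(sample))  # → 0.5
```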

HERE. HERE Real-Time Traffic platform collects data from more than 100 incident reporting services and utilizes billions of GPS data points daily. Its data is updated every minute. HERE provides real-time information and creates accurate predictions for the next 12 hours based on historical and current traffic data. Check their Traffic RESTful API documentation for details.
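A similar sketch for HERE, assuming its Traffic API v7 `flow` endpoint. The request selects all segments within a circle. Each result is assumed to carry a `currentFlow.jamFactor` on a 0 (free flow) to 10 (closure) scale, plus a `location.description`. The helper names, threshold, coordinates, and sample response are hypothetical, and the field names should be checked against HERE’s API reference.

```python
import json
from urllib.parse import urlencode

HERE_FLOW_URL = "https://data.traffic.hereapi.com/v7/flow"


def here_flow_url(lat, lon, radius_m, api_key):
    """Flow request for all segments within `radius_m` meters of a point."""
    params = {
        "in": f"circle:{lat},{lon};r={radius_m}",
        "locationReferencing": "shape",
        "apiKey": api_key,
    }
    return f"{HERE_FLOW_URL}?{urlencode(params)}"


def jammed_segments(response_text, threshold=7.0):
    """(description, jamFactor) pairs at or above `threshold`, worst first."""
    flagged = []
    for item in json.loads(response_text)["results"]:
        jam = item["currentFlow"]["jamFactor"]
        if jam >= threshold:
            flagged.append((item["location"].get("description", "unnamed"), jam))
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)


print(here_flow_url(52.52, 13.405, 1000, "YOUR_API_KEY"))
# Trimmed, made-up response body for illustration.
sample = json.dumps({"results": [
    {"location": {"description": "A100"}, "currentFlow": {"jamFactor": 8.5}},
    {"location": {}, "currentFlow": {"jamFactor": 9.1}},
    {"location": {"description": "B96"}, "currentFlow": {"jamFactor": 2.0}},
]})
print(jammed_segments(sample))  # → [('unnamed', 9.1), ('A100', 8.5)]
```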

ArcGIS. ArcGIS Traffic service REST APIs allow you to get traffic conditions visualized in your app. Traffic data is updated every 5 minutes and predictions for the next 4 hours are available. In addition, the web map provides information about the incidents.

PTV. PTV Traffic Data and PTV Data Analytics Platform support PTV’s own mapping and routing products (e.g., PTV Optima). You can also access them directly to get historical and real-time traffic information as well as short-term forecasts (up to 60 minutes ahead). Visit their Traffic Information API page for integration details, or the Digital Map API page if you need additional map content (for example, truck-specific restrictions).

What else to consider

There are a couple more things to mention regarding the use of ML techniques for traffic prediction.

Remember that ML/DL algorithms work best when there is enough data to train the models and fine-tune them for maximum accuracy. So, the larger the datasets you manage to obtain, the better your results tend to be.

Another important point relates to the COVID-19 pandemic. Since early 2020, traffic patterns around the world have changed significantly. For that reason, it makes sense to prioritize the most recent historical data and traffic patterns when building a predictive model.
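One simple way to prioritize recent data is to weight training samples by age with exponential decay, so that older, possibly pre-pandemic patterns count less. The 90-day half-life and the sample values below are arbitrary assumptions that you would tune on validation data. Many scikit-learn estimators accept such weights through the `sample_weight` argument of `fit`.

```python
from datetime import date


def recency_weights(sample_dates, today, half_life_days=90):
    """Exponential-decay weights: a sample `half_life_days` old counts half as much."""
    return [0.5 ** ((today - d).days / half_life_days) for d in sample_dates]


# Hypothetical daily peak-hour flows; the 2019 value reflects pre-pandemic patterns.
days = [date(2024, 1, 1), date(2023, 10, 3), date(2019, 11, 1)]
flows = [900, 1000, 1400]
weights = recency_weights(days, today=date(2024, 1, 1))
weighted_mean = sum(w * f for w, f in zip(weights, flows)) / sum(weights)
print(weights, round(weighted_mean))
```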

As of today, all the analytics solutions we described above offer predictions only for the near future. That’s understandable: short-term forecasts are more accurate than long-term ones, as there’s always a chance of unforeseeable circumstances on the road. So, while ways to produce longer-range forecasts are still being researched, you’ll have to find the golden mean that suits your needs.

Maria is a curious researcher, passionate about discovering how technologies change the world. She started her career in logistics but has dedicated the last five years to exploring travel tech, large travel businesses, and product management best practices.


Traffic Prediction: How Machine Learning Helps Forecast Congestions and Plan Optimal Routes