Price Forecasting: Applying Machine Learning Approaches to Electricity, Flights, Hotels, Real Estate, and Stock Pricing
When you give customers advice that can help them save some money, they will pay you back with loyalty, which is priceless. Interesting fact: Fareboom users started spending twice as much time per session within a month of the release of an airfare price forecasting feature. This tool continues to grow conversion for our partner.
Besides travel, price predictions find their application in various scenarios. Commodity traders, investors, construction developers, or energy generators use estimates on future price movements for business purposes.
This time we talked with experts from AleaSoft, ENFOR, REALas, and our own data science specialist to answer the question: How to implement price forecasts on markets with high volatility? The article describes the steps to build a price prediction solution and implementation examples in four industries.
What is price forecasting and how is it done
Price forecasting is predicting a commodity/product/service price by evaluating various factors like its characteristics, demand, seasonal trends, other commodities’ prices (e.g., fuel), offers from numerous suppliers, etc.
Price forecasting may be a feature of consumer-facing travel apps, such as Trainline or Hopper, used to increase customer loyalty and engagement. At the same time, other businesses may also use information about future prices. Entrepreneurs may need to define an optimal time to buy a commodity to adjust prices of products or services that require a commodity (lumber, coffee, gold), or evaluate the investment appeal of fixed assets.
Price prediction can be formulated as a regression task. Regression analysis is a statistical technique used to estimate the relationship between a dependent (target) variable (electricity price, flight fare, property price, etc.) and one or more independent variables, also known as predictors, that impact the target variable. Regression analysis also lets researchers determine how much these predictors influence a target variable. In regression, a target variable is always numeric.
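To make the regression formulation concrete, here is a minimal sketch of ordinary least squares with a single predictor. The data (days before departure versus fare) and the function name `fit_simple_regression` are illustrative assumptions and don't come from any project described in this article. The toy fares are exactly linear, so the fit simply recovers the line that generated them.

```python
from statistics import mean


def fit_simple_regression(xs, ys):
    """Ordinary least squares for one predictor: y ~ intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical observations: days before departure (predictor) and fare in USD (target).
days_before = [60, 45, 30, 21, 14, 7]
fares = [210.0, 225.0, 240.0, 249.0, 256.0, 263.0]

intercept, slope = fit_simple_regression(days_before, fares)

# Estimated fare 10 days before departure.
predicted_fare = intercept + slope * 10
```

The slope tells how much the target changes per unit of the predictor, which is how regression quantifies a predictor's influence on the price.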
In general, price forecasting is done by the means of descriptive and predictive analytics.
Descriptive analytics. Descriptive analytics relies on statistical methods that include data collection, analysis, interpretation, and presentation of findings. It transforms raw observations into knowledge one can understand and share. In short, this analytics type helps answer the question “What happened?”
Predictive analytics. Predictive analytics is about analyzing current and historical data to forecast the probability of future events, outcomes, or values in the context of price predictions. Predictive analytics requires numerous statistical techniques, such as data mining (identification of patterns in data) and machine learning.
The goal of machine learning is to build systems capable of finding patterns in data, learning from it without human intervention and explicit reprogramming. To solve the price prediction problem, data scientists first must understand what data to use to train machine learning models, and that’s exactly why descriptive analytics is needed.
To learn more about a machine learning project structure, check out our dedicated article.
Then the specialists collect, select, prepare, preprocess, and transform this data. Once this stage is completed, they start building predictive models. The model that forecasts prices most accurately is chosen to power a system or an application. So, the framework of the price prediction task may look like this:
- Problem statement.
- Understanding of market peculiarities. Answering the question: What factors impact prices of a commodity/product/service?
- Data collection, preparation, and preprocessing.
- Modeling and testing.
- Deployment of a model into a software system or application.
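The modeling and testing step of this roadmap can be sketched as follows. This is a minimal illustration under assumed data: the price list is hypothetical, and the naive "last known price" forecast only stands in for a real model as a baseline that any candidate model should beat.

```python
def chronological_split(series, test_size):
    """Keep the most recent observations for testing; time series are not shuffled."""
    return series[:-test_size], series[-test_size:]


def mean_absolute_error(actual, predicted):
    """Average absolute difference between observed and forecast prices."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical daily prices.
prices = [100, 102, 101, 105, 107, 106, 110, 111]
train, test = chronological_split(prices, test_size=2)

# Naive baseline: repeat the last training price for every test day.
naive_forecast = [train[-1]] * len(test)
baseline_mae = mean_absolute_error(test, naive_forecast)
```

Any candidate model would be fit on `train` only and compared with `baseline_mae` on `test`.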
Now that we know a typical price prediction project roadmap, let’s explore real-world examples from the energy sector, travel and hospitality industry, and real estate.
Electricity price forecasting: the combination of statistical and machine learning techniques
Until the early 1990s, the energy sectors in many countries were fully regulated and monopolized. Government agencies and local bodies monitored the work of utility companies, setting their terms of service, pricing, and construction plans, and ensuring these companies adhered to safety and environmental standards.
Then a shift towards deregulation began, the main goal of which was to reduce electricity costs and ensure a reliable supply of energy via competition. The power industry started turning into a free market where prices for products and services depend on supply and demand. In other words, the market players trade electricity on exchanges like other commodities. The participants set their bids and offers while trying to maximize their profits. Deregulation is an ongoing process across markets.
Factors affecting electricity demand and price: weather changes, transmission, regulators, fossil fuel prices, and others
Electricity is a special commodity type, so trading it is a tricky task. It’s non-storable (it must be consumed as soon as it’s generated), so a balance between production (generation) and consumption (load) is crucial for energy system stability.
The demand for electricity and, consequently, its price depend on the weather (temperature, precipitation, wind power, etc.) and changes in daily and business activities (weekends and weekdays, on-peak and off-peak hours). Non-storability of electrical energy and continuous shifts in demand lead to electricity price volatility.
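Calendar effects like these usually reach a model as engineered features. The sketch below shows one possible encoding; the function name `calendar_features` and the 8:00 to 20:00 on-peak window are assumptions made for illustration, since peak definitions differ from market to market.

```python
from datetime import datetime


def calendar_features(ts, peak_hours=range(8, 20)):
    """Turn a timestamp into simple demand-related features.

    The default on-peak window (08:00-19:59 on weekdays) is an assumption.
    """
    is_weekend = ts.weekday() >= 5  # Monday is 0, so 5 and 6 are Saturday and Sunday
    return {
        "hour": ts.hour,
        "is_weekend": is_weekend,
        "is_peak": (not is_weekend) and ts.hour in peak_hours,
    }


saturday_morning = calendar_features(datetime(2024, 1, 6, 10))
wednesday_morning = calendar_features(datetime(2024, 1, 3, 10))
wednesday_night = calendar_features(datetime(2024, 1, 3, 2))
```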
Fossil fuel costs influence the electricity price as well: Fuels are burned to create steam to rotate turbines. Since the electrical power is transmitted from a generator to consumers via transmission and distribution networks, their changing maintenance costs are another influencing factor. Since not all the markets are fully deregulated and some remain under government agency control, public utility or service commissions may introduce rules that can result in changing prices.
Challenges of electricity price forecasting: bidding techniques, data sources, interconnectors, regulations, continuous changes in demand
Electricity prices fluctuate due to a multitude of factors, including purchasing and selling strategies the power industry players use.
A variety of bidding techniques that market players employ and the dependency of electricity price on many factors complicate its prediction, says Oriol Saltó i Bauzà, data analyst, energy forecasting specialist, and software developer at AleaSoft Energy Forecasting. “The main challenges in energy price forecasting are, on the one hand, the very large number of factors that can affect and alter the price, and on the other, human beings who place the bid and ask offers in the market. So, having very similar external conditions, market offers and final price can be very different.”
To forecast electricity prices accurately, specialists must understand and take into account all the factors that may influence price fluctuations and gather relevant data, says Mikkel Westenholz, managing director of ENFOR, a provider of forecasting and optimization solutions for the energy industry. “The challenges are mainly to find the right and updated data sources describing the market and its participants, to follow regulation and interconnector development, and then to understand how these factors dynamically change your modeling.”
Electricity interconnectors are the physical cables that transfer energy between networks located in different countries, facilitating power trade and balancing demand and supply. Interconnectors allow power generators to sell a surplus of energy to consumers that need to meet peak demand during specific time periods (years, seasons, months, days, or particular hours).
The EirGrid East-West Interconnector that connects the high-voltage power grids of Ireland and Great Britain
Since interconnectors transmit electricity in both directions, they can seriously complicate price forecasting, says Mikkel: “...they can behave both as a consumer and a producer depending on the market prices in the interconnected country (market area) which can reverse the flow. Hence the full effect of the flow direction can only be understood by predicting the prices in both the connected market areas.”
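The reasoning in this quote can be illustrated with a deliberately simplified rule: power tends to flow toward the higher-priced area. The function below is a hypothetical sketch that ignores cable capacity, losses, and market coupling rules, so it shows only why both prices are needed, not how a real market clears.

```python
def interconnector_role(local_price, neighbor_price):
    """Describe the cable from the local market area's point of view.

    Simplifying assumption: power flows toward the higher-priced area.
    """
    if neighbor_price > local_price:
        return "consumer"  # power is exported, which adds demand locally
    if neighbor_price < local_price:
        return "producer"  # power is imported, which adds supply locally
    return "idle"  # no price incentive to move power in either direction


# The same cable switches roles when the neighboring price moves.
role_when_neighbor_is_expensive = interconnector_role(40.0, 55.0)
role_when_neighbor_is_cheap = interconnector_role(40.0, 32.0)
```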
Regulators may introduce rules that can affect prices to a smaller or larger extent, adds the expert. “One example could be changing the rules for block bids, which could trigger larger plant with high startup costs [costs needed to turn a power plant on measured in price of a megawatt] to either be activated more or less (depending on the change).” Block bids are orders in which consumers specify amount and price for a specific number of consecutive hours within the same day. Block bids are used in European exchanges.
Price forecasting dramatically differs across the industries as it has to deal with very specific domain problems
Hybrid approach to price forecasting
Statistical methods and techniques can be combined with artificial intelligence. While statistics allow for dealing with large amounts of data, AI is efficient in capturing interconnections between data points. That’s the exact approach AleaSoft follows. “Statistics help us to manage large quantities of data, and artificial intelligence helps us to find and understand all possible relations between the variables and the prices,” says the expert.
AI for price prediction entails using traditional machine learning (ML) algorithms and deep learning models, for instance, neural networks. ML algorithms receive and analyze input data to predict output values. They improve their performance as they are fed new data. In other words, ML algorithms learn from new data without human intervention. Neural networks (NN) are human-brain-inspired computing systems that are efficient in recognizing patterns.
The company specialists use their own energy price and demand forecasting model, AleaModel. “It’s a hybrid model that combines classical statistical techniques, such as time series analysis, SARIMA, and regression, and (with) artificial intelligence through a neural network and ML algorithms. The result is an artificial neural network capable of analysing time series data and being able to train itself with new data without the need of external intervention, something that is crucial in the field of energy markets where the input of new data is continuous,” explains Oriol.
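AleaModel itself is proprietary, so the following is not its implementation. It is only a toy sketch of the general hybrid idea: a statistical component (here, a seasonal average profile) captures the regular pattern, and a learned component (here, a one-variable least-squares fit standing in for a neural network) explains what is left over. All data values and names are hypothetical.

```python
from statistics import mean


def seasonal_profile(values, period):
    """Statistical part: average value at each position within the season."""
    return [mean(values[i::period]) for i in range(period)]


def fit_line(xs, ys):
    """Learned part (stand-in for a neural network): least-squares line."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    return y_bar - slope * x_bar, slope


def mse(actual, predicted):
    return mean((a - p) ** 2 for a, p in zip(actual, predicted))


# Hypothetical prices for two "days" of four periods each, plus wind output.
prices = [30.0, 50.0, 60.0, 40.0, 34.0, 46.0, 58.0, 42.0]
wind = [5.0, 3.0, 4.0, 6.0, 3.0, 5.0, 5.0, 5.0]
period = 4

profile = seasonal_profile(prices, period)
seasonal_fit = [profile[t % period] for t in range(len(prices))]

# Fit the second component on what the seasonal profile could not explain.
residuals = [p - s for p, s in zip(prices, seasonal_fit)]
intercept, slope = fit_line(wind, residuals)
hybrid_fit = [s + intercept + slope * w for s, w in zip(seasonal_fit, wind)]

seasonal_mse = mse(prices, seasonal_fit)
hybrid_mse = mse(prices, hybrid_fit)
```

On this toy data the fitted slope is negative (more wind, lower price) and the combined fit has a lower in-sample error than the seasonal profile alone. A real system would validate on unseen data.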
Using self-learning models for electricity price forecasting
Similar to AleaSoft, ENFOR uses self-learning methods for day-ahead electricity price prediction. These methods are based on the understanding of the physical systems/structures and how they shape the market. According to Westenholz, the benefits of self-learning methods are that predictive models auto-calibrate when receiving new inputs (data) staying tuned and accurate at all times.
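ENFOR doesn't disclose its algorithms here, so the sketch below is not its method. It only illustrates what "auto-calibration" can mean in the simplest case: a linear model whose weights are nudged after every new observation (the least-mean-squares update). The class name, learning rate, and data stream are assumptions.

```python
class OnlineLinearModel:
    """Linear model updated one observation at a time (least-mean-squares rule)."""

    def __init__(self, n_features, learning_rate=0.1):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict(self, features):
        return self.bias + sum(w * x for w, x in zip(self.weights, features))

    def update(self, features, actual_price):
        """Move the parameters a small step toward the newly observed price."""
        error = actual_price - self.predict(features)
        self.weights = [
            w + self.learning_rate * error * x
            for w, x in zip(self.weights, features)
        ]
        self.bias += self.learning_rate * error
        return error


# Hypothetical stream in which price = 2 * feature + 1.
model = OnlineLinearModel(n_features=1)
stream = [([x], 2.0 * x + 1.0) for x in (0.0, 1.0, 2.0)] * 700
for features, price in stream:
    model.update(features, price)

calibrated_prediction = model.predict([3.0])
```

Because each update uses only the latest observation, the model keeps adapting as market conditions drift, with no separate retraining step.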
The company’s machine learning-powered system for electricity traders takes into account such variables as historical prices, expected production from various sources (wind, nuclear, coal, gas, solar, etc.), demand, and interconnectors to make predictions.
“The technique choice depends on the accuracy of forecasts, the amount of manual work with data required, and finally available data,” adds Mikkel Westenholz.
Businesses without in-house data science expertise may struggle to develop such software on their own. A viable option is to find a data science and AI consulting company that can manage the overall project, from collecting and preparing the needed data to model development and deployment.
Travel and hospitality: flight and hotel price predictions for end customers
Travel and hospitality brands collect and analyze high volumes of data about people’s preferences and online behavior to personalize customer experience. Using price prediction to complement search functionality is another popular way of gaining traveler trust and increasing transaction volume. Kayak and Skyscanner, two large digital players on the travel scene, are leveraging the technique, and smaller players are also embarking on the initiative to add value.
Challenges of flight and hotel price forecasting: undisclosed approaches to revenue management and pricing strategies, no up-to-date information about inventory
Prices for airline tickets or hotel rooms are as unpredictable as British weather: A price for the same room or seat may change several times in 24 hours. Every accommodation or transport provider is trying to sell as much inventory as possible and at the maximum price.
Traveler demand for hotels and flights also depends on seasonality, days and parts of a week, holidays or events. Consequently, with fewer reservations, prices go down as transportation and hospitality companies, online travel agencies, and aggregators strive to motivate customers to press a “book” button. During high-demand periods (think the Christmas season or August as popular vacation times for European travelers), prices skyrocket.
Competing for customer attention, the market players monitor each other’s prices, adjusting their price strategies to be ahead of rivals. Not to mention unique approaches to revenue management and pricing strategies.
So, it’s challenging for data scientists to forecast flight or accommodation prices because they can’t learn about each company’s pricing strategy or up-to-date information about their inventory or real demand for specific dates.
Approaches to price predictions: time series forecasting with ARIMA, XGBoost, or RNNs
Despite difficulties, specialists find solutions. The AltexSoft team has developed a Price Predictor tool for Fareboom, a US-based online travel agency, so it can advise price sensitive customers about the optimal time to get the best flight deals. The algorithm forecasts future price changes based on historical data and machine learning models.
The Price Predictor is a search module and a popup window shown to a subset of users. Once travelers provide search data, they see charts depicting whether selected travel dates are cheap or not. Based on the results, users get a recommendation (buy now or take time) along with a forecast of future price changes or alternative trip days.
Fareboom purchasing advice and a prediction on a price change
The final algorithm has an average confidence rate of 75 percent and uses a time series forecasting technique to make both short-term (7-day) and long-term (7-week) forecasts.
Time series forecasting predicts future observations (e.g., fare prices) in time series datasets. These datasets consist of sequences of observations collected at equally spaced points in time. So, a time series forecasting model analyzes historical data to make predictions about the future.
There are two types of time series forecasting: univariate, which uses the sequence of measurements of a single variable, and multivariate, which uses data with numerous time- and co-dependent variables.
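The difference between the two types shows up in how training rows are built. The sketch below turns a series into supervised learning rows using lagged values, optionally extended with extra variables. The helper name `make_supervised` and the sample data are assumptions for illustration.

```python
def make_supervised(series, n_lags, extra=None):
    """Build (features, target) rows from a time series.

    Univariate: features are the previous n_lags values of the series.
    Multivariate: the extra variables known at time t are appended to them.
    """
    rows, targets = [], []
    for t in range(n_lags, len(series)):
        features = list(series[t - n_lags:t])
        if extra is not None:
            features.extend(extra[t])
        rows.append(features)
        targets.append(series[t])
    return rows, targets


# Hypothetical daily fares and one extra variable: days left until departure.
fares = [100, 102, 101, 105, 107]
days_left = [(30,), (29,), (28,), (27,), (26,)]

univariate_rows, targets = make_supervised(fares, n_lags=2)
multivariate_rows, _ = make_supervised(fares, n_lags=2, extra=days_left)
```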
“Time series forecasting is quite an interesting task which doesn’t have one solution to work best all the time. Different domains and data require different approaches. Sometimes you can use some classical methods like ARIMA [a class of models widely applied for time series data analysis and forecasting]. But much more often a recurrent neural network (RNN) or XGBoost give better accuracy,” says data science competence leader at AltexSoft Alexander Konduforov, who designed the price predictor for Fareboom.
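As a rough illustration of the classical approach mentioned in the quote, here is a stripped-down ARIMA(1,1,0)-style model: the series is differenced once, and a single autoregressive coefficient is fit to the differences by least squares, without an intercept. Production libraries estimate ARIMA models differently (typically by maximum likelihood) and handle far more cases; the data here is hypothetical.

```python
def difference(series):
    """First differences: the 'I' (integrated) step with d = 1."""
    return [b - a for a, b in zip(series, series[1:])]


def fit_ar1(values):
    """Least-squares estimate of phi in values[t] ~ phi * values[t - 1]."""
    numerator = sum(cur * prev for prev, cur in zip(values, values[1:]))
    denominator = sum(prev ** 2 for prev in values[:-1])
    return numerator / denominator


def forecast_next(series, phi):
    """One-step forecast: last value plus the predicted next difference."""
    last_difference = series[-1] - series[-2]
    return series[-1] + phi * last_difference


# Hypothetical fares whose day-to-day changes halve each step.
fares = [100.0, 104.0, 106.0, 107.0, 107.5]
phi = fit_ar1(difference(fares))
next_fare = forecast_next(fares, phi)
```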
Using XGBoost ensembles. XGBoost is an implementation of gradient boosted tree algorithms that’s commonly used for classification and regression problems. Gradient boosting is a supervised learning algorithm that builds an ensemble (set) of weaker models (trees) and sums up their estimates to predict a target variable with more accuracy.
Using recurrent neural networks. An RNN is a neural network type used for the analysis of sequential data like time series, text, video, speech, or financial data. It performs the same task for every element of a sequence. A recurrent neural network is special because it “remembers” information (computations) about the input it has already received and uses it when forecasting future values.
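The boosting idea can be shown without any library. The sketch below is plain gradient boosting for squared error with one-split trees (stumps) on a single feature: each round fits a stump to the current residuals and adds a damped version of it to the ensemble. It is not XGBoost, which adds regularization, second-order information, and many engineering optimizations. Data and parameter values are hypothetical.

```python
def fit_stump(xs, residuals):
    """Best single-threshold split of one feature, judged by squared error."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]


def fit_boosted_stumps(xs, ys, n_rounds=50, learning_rate=0.3):
    """Start from the mean, then repeatedly fit stumps to the residuals."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, predictions)]
        threshold, left_mean, right_mean = fit_stump(xs, residuals)
        stumps.append((threshold, left_mean, right_mean))
        predictions = [
            p + learning_rate * (left_mean if x <= threshold else right_mean)
            for x, p in zip(xs, predictions)
        ]
    return base, learning_rate, stumps


def predict(model, x):
    """Sum the damped estimates of all weak models on top of the base value."""
    base, learning_rate, stumps = model
    return base + learning_rate * sum(
        left if x <= threshold else right for threshold, left, right in stumps
    )


# Hypothetical data: one feature and a price that rises in steps.
xs = [1, 2, 3, 4, 5, 6]
ys = [200.0, 210.0, 250.0, 260.0, 300.0, 310.0]
model = fit_boosted_stumps(xs, ys)

base_mse = sum((y - model[0]) ** 2 for y in ys) / len(ys)
boosted_mse = sum((y - predict(model, x)) ** 2 for x, y in zip(xs, ys)) / len(ys)
```

Each stump alone is a weak predictor, but the training error of the summed ensemble (`boosted_mse`) ends up below the error of predicting the mean (`base_mse`).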
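The "memory" of an RNN comes from a hidden state that is carried from one step to the next. The sketch below is a forward pass of a one-unit recurrent cell with hand-picked, untrained weights (all values are assumptions). It shows only that the output for the same last input changes with the history when the recurrent weight is nonzero.

```python
import math


def rnn_forward(inputs, w_in, w_rec, w_out, bias=0.0):
    """Forward pass of a one-unit recurrent cell.

    The same weights are applied at every step; `hidden` carries
    information about earlier inputs into later steps.
    """
    hidden = 0.0
    outputs = []
    for x in inputs:
        hidden = math.tanh(w_in * x + w_rec * hidden + bias)
        outputs.append(w_out * hidden)
    return outputs


# Two sequences that end with the same value but have a different history.
rising = [1.0, 0.5]
falling = [-1.0, 0.5]

with_memory_rising = rnn_forward(rising, w_in=1.0, w_rec=0.8, w_out=1.0)
with_memory_falling = rnn_forward(falling, w_in=1.0, w_rec=0.8, w_out=1.0)

# With the recurrent weight set to zero the cell forgets the history.
no_memory_rising = rnn_forward(rising, w_in=1.0, w_rec=0.0, w_out=1.0)
no_memory_falling = rnn_forward(falling, w_in=1.0, w_rec=0.0, w_out=1.0)
```

In a trained network the weights are learned from data, usually with gated variants such as LSTM or GRU cells rather than this plain cell.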
Alexander notes that time series forecasting is also diverse from the data perspective. “It’s quite a rare case when you have just univariate time series for the forecasting, and it’s enough for making good predictions. Based on our time series forecasting experience, in most cases your target value isn’t solely dependent on the historical values or time features. That means we must find and utilize additional data or engineer new features based on our existing dataset. For instance, in one of our projects, we had good predictions for most of our test set, but some time periods had a much higher error. We did more research and found out which additional factors might have influenced this behavior. After having added them into our model, we fixed those errors and increased the overall accuracy of our predictions,” the data scientist explains.
Hopper is another popular app with price forecasting capability. Hopper assists users in trip planning by recommending the best time to book a flight or accommodation at the lowest cost. Hopper analyzes historical data to forecast future airfare and hotel room prices up to six months in advance, as well as concludes whether a traveler should “buy now” or wait for a better deal. Accommodations and flights can be booked directly via the app.
Airfare price prediction in the Hopper app
Hopper continuously monitors prices and sends alerts when good deals are available or prices are expected to increase. The company claims that the accuracy of predictions is 95 percent. For now, travelers can search across NY properties. However, 10 additional markets will be available soon.
Google follows the same logic and provides recommendations on the best time for booking airline tickets and forecasts on price movements for selected trip destinations and dates within its Flights travel service. “Using machine learning and statistical analysis of historical flight data, Flights displays tips under your search results, and you can scroll through them to figure out when it’s best to book flights,” the company says in a blog post. That way users can find out whether prices for specific trip dates are higher or lower “than normal,” or whether stable fares will decrease or not.
Tips on Google Flights. Source: Google blog
Those who search for hotels using the search engine may see similar tips about room rates. Users can also find out whether a particular area tends to be busier than usual due to upcoming festivals, conferences, or holidays.
Given the examples above, one might conclude that price prediction solutions in the travel and hospitality industry benefit only end customers. But companies that provide this service also benefit because price forecasts increase user engagement.
Real estate: predicting property prices for agents, investors, and buyers
The global real estate investment market keeps growing. According to the latest Real Estate Market Size Report by Morgan Stanley Capital International (MSCI), the market grew by 15 percent, from $7.4 trillion in 2016 to $8.5 trillion in 2017. A multitude of global factors and their interrelationships influence the market, which leads to price fluctuations.
Factors influencing demand and prices for real estate: economic and political situation, interest rates, climate change, commodity prices
Economic health. Real estate price correlates with the overall health of an economy. Such economic indicators as the gross domestic product (GDP), manufacturing activity, the consumer price index (CPI), employment and unemployment rates are used to evaluate the state of the economy. For instance, in areas or countries with rising unemployment rates, purchasing power falls, as do property values.
Interest rates. Since many entrepreneurs and consumers can’t pay upfront for a property, mortgage/interest rates are a major influence on prices for these assets. When interest rates drop, purchasing power increases. A growing demand for real estate then puts upward pressure on prices.
It’s worth mentioning the US housing bubble of 2007 in this context. Because of very low interest rates, affordable credit, and speculation, demand and prices skyrocketed. Eventually, demand started decreasing while supply continued to grow, and prices plummeted. From 2007 to mid-2010, housing prices dropped more than 30 percent.
Political turmoil. Political instability is another factor that makes foreign investors hesitate to purchase these fixed assets. As a result, sellers must drop prices. For instance, house prices in London decreased 0.7 percent from the beginning of 2018 to June 2018 due to uncertainties connected with Brexit. At the same time, the situation may be different in other parts of the UK, where real estate values continued to grow: Prices for homes in Scotland increased by 4.8 percent.
House price changes in 2018 across UK. Source: Financial Times
Climate change. Climate and severe weather changes are some of the environmental aspects to which entrepreneurs pay attention when evaluating whether it’s economically feasible to add a particular property into their portfolio. Such risks may negatively affect the investment attractiveness and therefore the value of real estate assets. Authors of the ESG Trends to Watch in 2019 report from MSCI estimate that prices for real estate located in coastal areas with risk of floods may lag or drop compared with property values in less flood-prone inland zones.
In addition, prices for construction supplies and commodities may add weight to housing costs.
Other influencing factors. Besides these major trends, a number of characteristics (features, attributes) and local factors define the cost of a property, with its specific location and general area being the main ones. As the old saying goes, “There are three things that matter in real estate: location, location, and location.” The number of bedrooms, construction quality, kitchen appliances, and distance to public transportation, shops, restaurants, wellness centers, parks, hospitals, etc., may all affect prices as well.
To sum up, realty value may depend on global and local factors influencing the real estate market and its more specific attributes.
Challenges of real estate price forecasting: human factor, bad data quality
Unfortunately, some factors remain unpredictable, no matter which techniques specialists use. And it’s human behavior that complicates forecasting.
Mark O’Neill, a product manager of REALas (acquired by the ANZ Banking Group), the Australian startup providing price forecasting services for homebuyers, notes that the human element of the market is one of the challenges the project team deals with. “We can’t predict the variation and emotional side of home buying. Personal situations of the seller, the buyer, and the other parties at the auction can play a huge part in the final selling price. To solve this, we try to incorporate as many proxies [indicators] as we can for demand and supply factors. One example could be that if a house has been on sale for over 9 months, there’s a high chance that it won’t sell above market value.”
Another significant pain point is poor data quality, adds Mark: “There is no single source of truth for property data and many inputs are based on manually, often incorrectly, entered data at the source. We don’t know if a house has been renovated, the land size or sale price was entered correctly. This aspect requires us to spend a lot of time cleaning and engineering our data and models to make sure our predictions don’t learn from bad inputs.”
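The kind of cleaning described in this quote could start with simple plausibility rules. The sketch below is not REALas's pipeline. The field names (`sale_price`, `land_size`), the sample records, and the "within 10 times the median" threshold are all assumptions chosen for illustration.

```python
from statistics import median


def is_complete(record):
    """A record is usable only if price and land size are present and positive."""
    price, land = record.get("sale_price"), record.get("land_size")
    return bool(price and price > 0 and land and land > 0)


def clean_listings(listings, max_ratio=10.0):
    """Drop incomplete records, then drop prices far from the typical price."""
    complete = [r for r in listings if is_complete(r)]
    if not complete:
        return []
    typical = median(r["sale_price"] for r in complete)
    return [
        r for r in complete
        if typical / max_ratio <= r["sale_price"] <= typical * max_ratio
    ]


# Hypothetical listings with typical manual-entry problems.
listings = [
    {"id": 1, "sale_price": 650_000, "land_size": 420},
    {"id": 2, "sale_price": 700_000, "land_size": 510},
    {"id": 3, "sale_price": 720_000, "land_size": None},   # missing land size
    {"id": 4, "sale_price": 0, "land_size": 380},          # price never entered
    {"id": 5, "sale_price": 72_000_000, "land_size": 450}, # likely extra zeros
    {"id": 6, "sale_price": 740_000, "land_size": 600},
]
cleaned_ids = [r["id"] for r in clean_listings(listings)]
```

Rules like these only catch gross errors. Whether a suspicious record is dropped, corrected, or flagged for review is a project-specific decision.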
Sellers may also forget to update property prices in online marketplaces or set them below market value to find buyers faster. So, there are different scenarios in which sellers provide data that doesn’t reflect the actual state of the market.
Approaches to price predictions in real estate: regression tree ensembles show the best results
Price predictions for residential properties with ML. REALas predicts prices for “approximately 90 percent” of residential properties that are currently on sale across Australia. So, we’re not talking about long-term predictions. Users need to enter a zip code, a suburb, an address, or numerous details at once to see properties with estimated prices on a map.
The service doesn’t cost a dime for buyers, sellers or agents, notes Mark. “Our data comes from a supplier that has access to a range of real estate portals and data which we can use to provide predictions for free. Our mission is to provide unbiased information to home buyers to help them purchase a home and we can’t do this if we are working for the vendor side of the market.”
Predictive models powering the solution analyze a wide range of pricing data and fluctuations, such as trends of areas, property types, and other market factors. “As Australia is so large and diverse, you could argue that each state is a market in itself, and each of these markets behaves differently. Our models have to delineate between changes and trends in a state and a region,” adds O’Neill.
The service predicts prices for houses on sale and provides basic information about properties. Source: Avocette
The expert stresses the importance of feature engineering for building models that aren’t too complex but yet able to provide accurate results. “When it comes to housing markets, there are so many factors and trends to consider that affect the price of a property, and we have to be careful with how many we incorporate in our models. In many cases, one feature could give you the same value as 20 others combined, with much less noise.”
To ensure that predictions reflect market changes, data scientists retrain, test, and redeploy models to remain up to date with current conditions in each area.
Here is another example of how machine learning techniques can be applied to estimate or predict prices of individual properties with the goal of evaluating their investment attractiveness. Researchers from Spain have built predictive models using four different techniques (ensembles of regression trees, k-nearest neighbors, support vector machines for regression, and multi-layer perceptrons) to find out which model architecture shows the best accuracy.
The authors used listing data about properties located in one of Madrid’s districts gathered between July 1 and December 31, 2017. Attributes of real estate assets were known.
Through model training and evaluation, the scientists found that models comprised of regression tree ensembles predict prices most accurately. “In quantitative terms, we have found that the smallest mean absolute error is €338,715, and the best median absolute error is €94,850,” say the researchers. While these errors can be considered high in terms of financial investment, they are relatively small given the fact that the listing data includes only properties that cost over €1 million. “When the mean and median absolute errors are compared with the mean and median of the distribution of prices, relative errors of 16.80 and 5.71 percent are obtained, respectively.” The authors also note that these errors indicate that more complex ML algorithms perform better than a linear regression model because their errors are substantially smaller.
Median absolute error with different model implementations. Source: Applied Sciences
The authors suppose that such a great difference between mean and median absolute error can be caused by outliers in data, that is, values that deviate significantly from the rest of the distribution. Data scientists therefore should put much time and effort into preparing training datasets to get higher-quality models.
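A small numeric example shows why outliers pull the mean absolute error away from the median absolute error, and how the relative errors quoted above are formed (mean error over mean price, median error over median price). The prices and predictions below are made up and are not the study's data.

```python
from statistics import mean, median


def absolute_errors(actual, predicted):
    return [abs(a - p) for a, p in zip(actual, predicted)]


# Hypothetical listing prices in euros; the last property is an outlier
# that the model badly underestimates.
actual = [1_200_000, 1_500_000, 1_800_000, 2_000_000, 9_000_000]
predicted = [1_250_000, 1_450_000, 1_900_000, 1_950_000, 6_000_000]

errors = absolute_errors(actual, predicted)
mean_ae = mean(errors)      # dominated by the single large miss
median_ae = median(errors)  # reflects the typical miss

# Relative errors, formed the same way as in the quoted study.
relative_mean_error = mean_ae / mean(actual)
relative_median_error = median_ae / median(actual)
```

Here a single badly predicted property makes the mean absolute error (€650,000) thirteen times larger than the median absolute error (€50,000).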
As one of the suggestions for future work, the researchers think it could be interesting to use time series data for modeling since it can greatly enhance a model’s prediction performance.
All market participants can use price forecasts to make informed decisions. Developers and investors can evaluate the expected return on investment in assets, while potential landlords can choose an appropriate purchase time and find a property with characteristics (area, size, etc.) that meet their needs. Property appraisers can use predictions on future prices to decide whether to inform mortgage lenders about price trends (falling, being stable, or rising) for houses in particular neighborhoods. Real estate agents representing sellers or buyers, and property sellers themselves may also benefit from price forecasts.
Stock price forecasting: controversies and attempts
There is no exact answer to the question of whether machine learning is an effective technique for stock price prediction. Some traders note that ML is useful for automated trading. For instance, machine learning may help users identify trending stocks or define how much budget to allocate for stocks. However, these algorithms may fail in predicting stock prices. Still, data scientists keep looking for techniques that can provide solid forecasting results.
Factors influencing stock exchange prices: a company’s performance and prospects, inflation, trends, economic and political situation, and others
A multitude of aspects influences stock prices. These factors belong to three groups: technical factors, fundamental factors, and market sentiment.
Fundamental factors. Fundamentals describe a company’s performance and expectations about its future development. Such measures as earnings per share [the amount of profit allocated to each share of common stock], dividends per share, and cash flow per share are used for evaluation of current company profitability. Expected growth in the earnings base and the discount rate [used to define a current value of the future stream of earnings] that indicates risk and inflation are used to estimate a company’s future prospects.
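Two of these measures reduce to short formulas. The sketch below uses the common textbook definitions: earnings per share as profit available to common shareholders divided by shares outstanding, and the present value of future earnings discounted at a constant rate. The function names and figures are illustrative assumptions, and real analyses often use weighted average share counts and more elaborate discounting.

```python
def earnings_per_share(net_income, preferred_dividends, shares_outstanding):
    """Profit allocated to each share of common stock."""
    return (net_income - preferred_dividends) / shares_outstanding


def present_value(future_earnings, discount_rate):
    """Current value of a stream of yearly earnings, first payment in one year."""
    return sum(
        earning / (1 + discount_rate) ** year
        for year, earning in enumerate(future_earnings, start=1)
    )


# Hypothetical company: 1M profit, no preferred dividends, 500k common shares.
eps = earnings_per_share(1_000_000, 0, 500_000)

# Earnings of 110 next year and 121 the year after, discounted at 10 percent.
value_today = present_value([110.0, 121.0], 0.10)
```

A higher discount rate, reflecting more risk or inflation, lowers `value_today` for the same expected earnings.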
Technical factors. These are external conditions on which supply and demand for a company’s stock depend. The strength of the market and its players, inflation and deflation (may cause a decrease in stock prices), economic and political situations, demographics, trends, and liquidity must be considered when predicting stock price movements. Prices and demand for substitutes (other groups of securities like government and corporate bonds, foreign equities, real estate, or commodities) and incidental transactions are also influential factors.
Market sentiment. Market sentiment represents the psychology of the market players (on both collective and individual levels). Market sentiment is the subject of behavioral finance, an area of behavioral economics. In particular, behavioral finance experts study psychological biases (mental shortcuts) causing irrational investment decisions that, in turn, can cause rises and drops in stock prices. The housing bubble we mentioned earlier is a consequence of such illogical investment decisions.
Approaches to stock price prediction
Classical financial economics proposes the Efficient Market Hypothesis (EMH), according to which the price of a stock reflects all available information, so the stock always trades at a fair price. Behavioral finance disputes this hypothesis. When making price predictions on the stock market, you accept part of its premise: you analyze open data sources and rely on the assumption that these sources impact stock prices.
Researchers from the University of California, Berkeley, studied the relationship between the changes in weekly stock prices and news/events from online sources. They combined time series analysis with information from the Google Trends and Yahoo Finance websites to forecast stock prices. They used five years of both fundamental and technical data on the stock prices of Apple Inc. (from the first week of September 2007 to the last week of August 2012). “Our fundamental data is in the form of news articles and analyst opinions, whereas our technical data is in the form of historical stock prices,” say the data scientists.
After having applied the ARMA model for time series analysis, the researchers proposed the algorithm to analyze online news related to AAPL stock that “can potentially outperform the conventional time series analysis in stock price forecasting.”
Another research group shared their findings at the 15th Conference on Dependable, Autonomic and Secure Computing. They suggest using StockTwits, a social media platform for investors, to draw predictions based on sentiment analysis and such factors as an author’s likes, follower count, and previous conclusions about stock changes. The models they’ve built choose the most relevant stock price prediction posts and draw forecasts from them. This interesting technique managed to achieve about 65 percent accuracy on average. Other attempts considered using financial data only for short-term (15-30 day) forecasts for stable stocks that could potentially yield about 4.35 percent gain.
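The researchers' actual models aren't reproduced here. As a loose illustration of combining post sentiment with author-related factors, the sketch below computes a weighted vote in which each post's bullish or bearish sentiment counts in proportion to a credibility weight. The weighting scheme, the function name, and the numbers are all hypothetical.

```python
def weighted_signal(posts):
    """Combine post sentiments into one score between -1 and +1.

    posts: list of (sentiment, weight) pairs, where sentiment is +1 for
    bullish and -1 for bearish, and weight is a hypothetical credibility
    score (for example, derived from likes or follower count).
    """
    total_weight = sum(weight for _, weight in posts)
    if total_weight == 0:
        return 0.0
    return sum(sentiment * weight for sentiment, weight in posts) / total_weight


# Three hypothetical posts about the same stock.
posts = [(+1, 120), (-1, 30), (+1, 50)]
signal = weighted_signal(posts)
predicted_direction = "up" if signal > 0 else "down"
```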
However, stock price forecasting is still a controversial topic, and there are very few publicly available sources that prove the real business-scale efficiency of machine-learning-based predictions of prices.
Price prediction may be useful for both businesses and customers. It can facilitate decision-making in everyday operations and/or long-term planning. Also, price forecasting tools motivate users to engage with a brand or evaluate offers to spend their money wisely.
Price forecasting requires a data analyst or scientist to acquire domain knowledge: They must understand what factors drive demand for products, commodities, or services. These factors may include seasonality, holidays, the intensity of daily and weekly activities, the political and economic situation in a country or region of interest, weather and climate changes, infrastructure maintenance costs, and many others. It’s also important to know what marketing, revenue management, or bidding techniques market players use.
Another pillar of success is up-to-date and high-quality data. Specialists must collect enough data to build, train, and test predictive models, as well as develop and maintain an overall data management strategy. The choice of methods and techniques, in turn, depends on the type of data available.