Travel Search Caching: How OTAs Scale to Millions of Users

Online Travel Agencies (OTAs) operate under relentless pressure to deliver near-instant search results—a key driver of user engagement and customer conversion. Behind every millisecond response for a weekend in Rome, Seoul, or Istanbul lies a complex technical challenge: how to balance the high cost of API calls with the volatility of prices that, in some cases, can change in seconds. Caching sits at the center of this trade-off.

In this article, we share AltexSoft’s experience and perspective on how caching works on booking platforms. While implementation details may vary from company to company, the underlying principles remain largely the same throughout the travel industry.

Why OTAs need caching

In travel distribution, relying entirely on real-time API calls becomes both technically complex and expensive at scale. That’s why caching, a data storage layer that temporarily keeps frequently accessed information, became a core architectural component of OTA platforms, helping them address the operational pressures of travel search and booking.

Latency. Behind a seemingly simple query, such as a hotel search in London, lies a cascade of requests to global distribution systems (GDSs), bed banks, and direct suppliers, each with its own latency profile. Delays from any of these sources can slow response times, forcing the platform to either wait longer or return only partial results. At the same time, users expect near-instant responses, and anything beyond two seconds can significantly increase bounce rates. Caching allows the OTA to return an immediate result set from local storage while refreshing live data progressively.

High cost of search requests and look-to-book (L2B) constraints. As Ihor Protsenko, Product Manager at AltexSoft, explains, “Every time a user clicks search, an OTA pays money just to generate a listing page.” While not every supplier charges a flat fee per API request, serving a search often requires querying dozens of external systems simultaneously. Each request generates direct infrastructure costs and forces the OTA to parse, normalize, deduplicate, and rank large volumes of fragmented supplier data in real time.

In addition, access to supplier systems, particularly in the airline industry, is often governed by look-to-book (L2B) ratios, which compare the number of search requests (“looks”) with the number of resulting reservations (“books”). Excessive search traffic combined with low conversion rates can result in higher fees, throttling, or other commercial restrictions imposed by suppliers. As a result, OTAs must carefully control how often they request live data while still providing users with accurate and up-to-date search results.

Price volatility. Flight seats and hotel rooms are perishable inventory—their value disappears once an aircraft departs or a hotel night goes unsold. As a result, suppliers—especially airlines—continuously adjust fares to maximize yield. As Protsenko notes, “Flights are a segment where prices and availability change far more dynamically than in other travel products, whether cruises, hotels, condos, or car rentals.”

This creates a fundamental trade-off for OTAs: Caching reduces latency and infrastructure overhead, but the longer data remains stored, the greater the risk of displaying outdated prices or availability.

The art of caching in travel is, therefore, not simply about keeping information readily available. It is about finding the right balance between freshness, speed, and cost.

How caching works

When a user enters a destination and dates, an OTA platform first consults the cache layer. From there, one of two things happens.

The basic caching logic in OTAs

Cache hit: If the requested data is already in the cache, the system retrieves it directly rather than querying the original data source. As a result, prices, availability, or other travel information can be displayed in milliseconds.

Cache miss: If the data is missing, outdated, or expired, the OTA sends a live request to the supplier or other content source. Once the updated data is returned, it is displayed to the user and simultaneously stored in the cache for reuse in future searches.
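The hit-or-miss logic above can be sketched in a few lines. This is a minimal illustration, not a production design: the `SearchCache` class, the `fake_supplier` function, and the cache key format are hypothetical names, and the clock is injected only so that the example runs deterministically.

```python
import time
from typing import Any, Callable, Dict, Tuple


class SearchCache:
    """Cache-aside lookup: serve from the cache on a hit, call the supplier on a miss."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}  # key -> (stored_at, value)

    def get(self, key: str, fetch_live: Callable[[str], Any]) -> Tuple[Any, str]:
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self.ttl_seconds:
            return entry[1], "hit"          # fresh entry: no supplier call
        value = fetch_live(key)             # missing or expired: go to the source
        self._entries[key] = (now, value)   # store for future searches
        return value, "miss"


# Demo with a fake supplier and a manually advanced clock (both assumptions).
supplier_calls = []

def fake_supplier(key: str) -> dict:
    supplier_calls.append(key)
    return {"query": key, "price": 120}

now = [0.0]
cache = SearchCache(ttl_seconds=300, clock=lambda: now[0])

first = cache.get("hotels|LON|jul20-jul30", fake_supplier)   # cold cache
second = cache.get("hotels|LON|jul20-jul30", fake_supplier)  # served from cache
now[0] = 301.0                                               # TTL has passed
third = cache.get("hotels|LON|jul20-jul30", fake_supplier)   # expired entry
```

In this run, only the first and third lookups reach the supplier; the second is answered from memory.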

The transition between a cache hit and a cache miss is not accidental: It is governed by invalidation strategies.

Cache invalidation strategies

Large OTA caching systems almost always combine several invalidation approaches rather than relying on a single one.

Time-to-Live (TTL). TTL is a basic expiration mechanism that defines how long cached information remains valid. However, not all travel data is treated equally.

Static content, such as hotel descriptions, images, and amenities, may remain cached for hours or even days.
Dynamic content—prices and availability—can change in minutes or seconds.

TTL also depends on factors such as demand levels and supplier type. Hotel inventory is generally less volatile than airline seats. In addition, TTL may be shortened during peak seasons, for limited inventory, popular destinations, or same-day bookings.
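To make the idea of differentiated TTLs more concrete, here is a rough sketch of a TTL selector. All numbers, the `CacheContext` fields, and the shortening factors are illustrative assumptions; real values are tuned per supplier and market.

```python
from dataclasses import dataclass

# Illustrative base TTLs in seconds (assumed values, not industry standards).
BASE_TTL = {
    "static": 24 * 3600,    # descriptions, images, amenities
    "hotel_rate": 15 * 60,  # hotel prices and availability
    "flight_fare": 5 * 60,  # airline fares, the most volatile segment
}
MIN_TTL = 30  # never go below this floor, to avoid hammering suppliers


@dataclass
class CacheContext:
    content_type: str
    days_to_travel: int = 30
    peak_season: bool = False
    low_inventory: bool = False


def pick_ttl(ctx: CacheContext) -> int:
    """Start from a base TTL per content type and shorten it for risky cases."""
    ttl = BASE_TTL[ctx.content_type]
    if ctx.content_type == "static":
        return ttl                # static content is not affected by demand
    if ctx.days_to_travel == 0:
        ttl //= 5                 # same-day bookings change fastest
    if ctx.peak_season:
        ttl //= 2
    if ctx.low_inventory:
        ttl //= 2
    return max(ttl, MIN_TTL)


static_ttl = pick_ttl(CacheContext("static"))
hotel_ttl = pick_ttl(CacheContext("hotel_rate"))
scarce_hotel_ttl = pick_ttl(CacheContext("hotel_rate", low_inventory=True))
same_day_peak_flight_ttl = pick_ttl(
    CacheContext("flight_fare", days_to_travel=0, peak_season=True)
)
```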

Because fixed timers alone are often too rigid for highly dynamic travel data, OTAs commonly supplement TTL with additional invalidation triggers.

Event-driven or proactive invalidation. Instead of waiting for a TTL period to expire naturally, the system reacts to specific triggers that can come from two distinct directions.

Internal signals (the OTA's own ecosystem). The most common internal event is a completed booking. The moment a traveler successfully reserves a room or flight directly on the OTA platform, the system instantly invalidates that specific inventory in its own cache, preventing the next user from seeing a "ghost" room/seat that was just sold.
External signals (the supplier's tech stack). Some modern direct connects, channel managers, and hotel distribution platforms can notify OTAs of inventory or pricing changes via webhooks and other event-driven integrations, invalidating stale cache entries immediately. However, many supplier connections still rely on periodic updates, such as hotel ARI (availability, rates, inventory) feeds or airline fare data distributed through ATPCO.

Regardless of the trigger, OTAs often rely on tag-based invalidation to identify which cache entries should be refreshed or removed.

Tag-based invalidation. A single hotel may appear in thousands of cached searches (“Hotels in London,” “Family London hotel,” “Hotels near Soho,” “Hotel in London for 20-30 July,” etc.). The same applies to flights. So rather than tracking and clearing each cache entry individually, the OTA attaches tags such as

a hotel ID,
a destination ID,
a route, or
an IATA code.

Then, if something changes (for example, a hotel sells out or an airline updates fares), the system invalidates all cache entries associated with that tag at once, leaving unrelated entries untouched.

Only entries associated with an updated tag are invalidated
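One simple way to implement tag-based invalidation is to keep a reverse index from each tag to the cache keys that carry it. The sketch below assumes an in-process dictionary; the `TaggedCache` class, the key names, and the tag format (`hotel:42`, `dest:LON`) are hypothetical. The same pattern can be triggered by an internal booking event or by an external supplier notification.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, Optional, Set


class TaggedCache:
    """Cache with a reverse index (tag -> keys) for bulk invalidation."""

    def __init__(self) -> None:
        self._entries: Dict[str, Any] = {}
        self._keys_by_tag: Dict[str, Set[str]] = defaultdict(set)
        self._tags_by_key: Dict[str, Set[str]] = {}

    def put(self, key: str, value: Any, tags: Iterable[str]) -> None:
        self.invalidate_key(key)  # drop old tag links if the key is overwritten
        self._entries[key] = value
        self._tags_by_key[key] = set(tags)
        for tag in self._tags_by_key[key]:
            self._keys_by_tag[tag].add(key)

    def get(self, key: str) -> Optional[Any]:
        return self._entries.get(key)

    def invalidate_key(self, key: str) -> None:
        self._entries.pop(key, None)
        for tag in self._tags_by_key.pop(key, set()):
            self._keys_by_tag[tag].discard(key)

    def invalidate_tag(self, tag: str) -> int:
        """Remove every entry carrying the tag; return how many were removed."""
        keys = list(self._keys_by_tag.get(tag, ()))
        for key in keys:
            self.invalidate_key(key)
        return len(keys)


cache = TaggedCache()
cache.put("offer:42:standard", {"price": 180}, tags={"hotel:42", "dest:LON"})
cache.put("offer:42:deluxe", {"price": 260}, tags={"hotel:42", "dest:LON"})
cache.put("offer:99:standard", {"price": 150}, tags={"hotel:99", "dest:LON"})

# Hotel 42 sells out (a booking event or a supplier notification arrives).
removed = cache.invalidate_tag("hotel:42")
```

After the call, both offers for hotel 42 are gone, while the offer for hotel 99 in the same destination stays cached.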

With all three approaches implemented, a real OTA flow may look like this.

A hotel search for “London hotels July 20–30” is cached with a TTL of 5 minutes. The cached result may contain offers from hundreds of hotels.
Individual hotel offers are tagged with metadata such as hotel ID, destination, stay dates, room type, and supplier source (hotel CRS, channel manager, bed bank, other OTA, etc.).
A hotel updates availability through its CRS or channel manager before the TTL expires.
The OTA receives a real-time inventory update event and invalidates only the cache entries associated with the affected hotel inventory, rather than discarding the entire cached search result.

An invalidated cache entry does not necessarily result in a cache miss. An OTA may apply additional latency-reduction strategies to reduce the delays associated with live supplier requests.

Latency-reduction strategies

One of the most common approaches is the Stale-While-Revalidate (SWR) strategy. With SWR, an expired cache entry does not immediately become a cache miss. Instead, the OTA may temporarily serve a slightly outdated response while asynchronously refreshing prices and availability in the background. This allows the system to avoid blocking the search experience while still updating the cache for future requests.
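The SWR behavior described above might look roughly like this. To keep the example deterministic, the background refresh is modeled as a queue drained by an explicit call rather than a real worker thread; the `SwrCache` class, its parameters, and the price sequence are assumptions made for illustration.

```python
from typing import Any, Callable, Dict, List, Tuple


class SwrCache:
    """Serve fresh entries directly, stale ones immediately with a queued refresh."""

    def __init__(self, fresh_for: float, stale_for: float,
                 fetch: Callable[[str], Any], clock: Callable[[], float]):
        self.fresh_for = fresh_for    # seconds an entry counts as fresh
        self.stale_for = stale_for    # extra seconds it may be served while stale
        self.fetch = fetch
        self.clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}
        self.refresh_queue: List[str] = []

    def get(self, key: str) -> Tuple[Any, str]:
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None:
            age = now - entry[0]
            if age < self.fresh_for:
                return entry[1], "fresh"
            if age < self.fresh_for + self.stale_for:
                if key not in self.refresh_queue:
                    self.refresh_queue.append(key)  # refresh later, do not block
                return entry[1], "stale"
        value = self.fetch(key)                     # too old or missing: block
        self._entries[key] = (now, value)
        return value, "miss"

    def run_background_refresh(self) -> None:
        """Stand-in for an asynchronous worker that refreshes queued keys."""
        while self.refresh_queue:
            key = self.refresh_queue.pop(0)
            self._entries[key] = (self.clock(), self.fetch(key))


prices = iter([400, 450, 470])   # what the fake supplier returns on each call
now = [0.0]
cache = SwrCache(fresh_for=300, stale_for=600,
                 fetch=lambda key: next(prices), clock=lambda: now[0])

r1 = cache.get("JFK-LHR")        # cold cache, live call
now[0] = 400.0
r2 = cache.get("JFK-LHR")        # stale but acceptable, refresh is queued
cache.run_background_refresh()
r3 = cache.get("JFK-LHR")        # refreshed value
now[0] = 2000.0
r4 = cache.get("JFK-LHR")        # beyond the stale window, live call again
```

Note that the second request still sees the old price of 400 even though the supplier already quotes 450, which is exactly the risk SWR carries for volatile inventory.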

SWR works best for relatively stable travel information, such as destinations, amenities, reviews, or hotel bookings made well in advance. It becomes much riskier for highly dynamic inventory like airline fares—especially last-seat offers—or same-day hotel reservations, where prices and availability can change in minutes or even seconds.

In those cases, a traveler may click on a $400 flight only to discover that the fare has already increased to $450, or that the last available room has just been booked.

For highly volatile data, the OTA may proactively refresh the cache immediately after invalidation, ensuring that prices and availability remain up to date before the next request arrives.

Pre-warming is another latency-reduction strategy. Instead of waiting for travelers to initiate a search, the OTA populates the cache with high-demand inventory based on historical patterns, forecasts, or current trends. This increases the likelihood that popular searches can be served directly from the cache.

Both proactive refresh and pre-warming typically focus on high-demand inventory, such as

high-traffic flight routes,
major destinations,
trending hotels,
holiday periods, and
predictable temporal spikes, such as Thursday/Friday departures and Sunday/Monday returns.
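A very basic form of pre-warming can be driven by search history alone. In the sketch below, the function names, the plain dictionary used as a cache, and the sample search log are all hypothetical; a real system would also weigh forecasts, seasonality, and supplier quotas.

```python
from collections import Counter
from typing import Any, Callable, Dict, Iterable, List


def pick_routes_to_warm(search_log: Iterable[str], top_n: int) -> List[str]:
    """Return the most frequently searched routes from a historical log."""
    counts = Counter(search_log)
    return [route for route, _ in counts.most_common(top_n)]


def prewarm(cache: Dict[str, Any], routes: Iterable[str],
            fetch_live: Callable[[str], Any]) -> int:
    """Fetch and store results for routes that are not cached yet."""
    warmed = 0
    for route in routes:
        if route not in cache:
            cache[route] = fetch_live(route)
            warmed += 1
    return warmed


# Hypothetical log of past searches.
search_log = ["JFK-LHR", "SEA-LAS", "JFK-LHR", "KBP-WAW", "JFK-LHR", "SEA-LAS"]
cache: Dict[str, Any] = {"SEA-LAS": {"price": 89}}  # already cached earlier

hot_routes = pick_routes_to_warm(search_log, top_n=2)
warmed_count = prewarm(cache, hot_routes, fetch_live=lambda r: {"price": 410})
```

Here only JFK-LHR triggers a supplier call, because SEA-LAS is already in the cache and KBP-WAW does not make the top two.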

Large OTAs may combine pre-warming with predictive caching, where machine learning models estimate which requests are most likely to occur. Based on these predictions, the OTA pre-fetches real supplier data and stores it in the cache before the user searches.

A related approach is price forecasting. Ihor Protsenko recalls his work at a large OTA, where he helped build predictive price calendars based on a year of historical airfare data.

“Instead of querying suppliers for a wide range of dates, we estimated prices using machine learning,” Ihor says. “By filtering out extreme outliers—unusually low or high fares—we were able to present data that closely reflected reality. This gave users a clear sense of when to expect lower fares and when it was a good time to book.”

AltexSoft, in turn, developed a similar price predictor for FareBoom, an OTA specializing in low-cost international airfare. Built on top of the company's existing travel booking engine, the tool complemented the core search functionality and helped price-sensitive travelers make better purchasing decisions by showing the most favorable periods to buy tickets over the coming months.

While price prediction is not a caching mechanism, it addresses a similar business challenge: reducing the number of expensive real-time supplier requests needed to answer exploratory search questions. Rather than querying airlines for every possible date combination, the system can provide fast, reasonably accurate estimates based on historical patterns and reserve live API calls for the moments when users evaluate or book specific itineraries.

How all the components team up

What does the caching process look like in practice?

The answer depends on how much inaccuracy a particular user experience can accommodate. As Ihor Protsenko explains, “It's fine to show some inaccurate availability or price on the landing page for a user, but it's completely not fine to show that on the listing page where users can click a book button and proceed to booking flow and payment.”

Data types and their accuracy requirements at different stages of the traveler journey

In other words, discovery content—such as landing pages, destination pages, and price calendars—can be populated with recently observed, historical, forecasted, or “starting from” prices that may not reflect currently bookable inventory. Stages closer to the booking transaction require fresher, validated data and stricter cache controls.

At the same time, OTAs must also consider the characteristics of the underlying inventory and pricing data, such as demand, volatility, and the cost of retrieving fresh data from suppliers.

The exact caching strategy, therefore, depends on both the traveler’s position in the booking journey and the nature of the data being served. While real OTA caching architectures are far more complex than what can be fully described in a single article, the overall logic may work roughly as follows.

Pre-warming. The OTA, often using predictive models, proactively fills the cache ahead of expected demand.

TTL. Expiration rules ensure cached data does not remain valid indefinitely. For highly volatile inventory, TTLs are shortened aggressively.

Proactive invalidation. Internal or external signals can invalidate affected cache entries before their TTL expires.

Proactive refresh. For high-demand or highly volatile inventory, OTAs proactively refresh cached data to keep it available and reduce the likelihood of cache misses.

Reactive refresh (SWR). For low-demand and relatively stable inventory, the system waits until the next traveler request. If cached data becomes slightly stale but is still considered temporarily acceptable, the OTA serves it while asynchronously refreshing prices and availability in the background.

Cache miss. If data is invalidated, unavailable, or too stale, the OTA performs a live supplier request, reintroducing latency in exchange for accuracy.
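The request-time part of this logic can be condensed into a single decision function. This is a simplified sketch under stated assumptions: the `Action` names, the parameters, and the rule that volatile inventory never gets a stale window are illustrative choices, not a description of any specific OTA.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    SERVE_CACHED = "serve cached data"
    SERVE_STALE_AND_REFRESH = "serve stale data, refresh in background"
    FETCH_LIVE = "perform a live supplier request"


def decide(age_seconds: Optional[float], ttl: float, stale_window: float,
           invalidated: bool, volatile: bool) -> Action:
    """Pick what to do for one request, given the state of its cache entry.

    age_seconds is None when nothing is cached for the request.
    """
    if age_seconds is None or invalidated:
        return Action.FETCH_LIVE               # nothing usable in the cache
    if age_seconds < ttl:
        return Action.SERVE_CACHED             # still within TTL
    if not volatile and age_seconds < ttl + stale_window:
        return Action.SERVE_STALE_AND_REFRESH  # SWR for stable inventory only
    return Action.FETCH_LIVE                   # too stale to show


fresh = decide(age_seconds=60, ttl=300, stale_window=600, invalidated=False, volatile=False)
stale_stable = decide(age_seconds=400, ttl=300, stale_window=600, invalidated=False, volatile=False)
stale_volatile = decide(age_seconds=400, ttl=300, stale_window=600, invalidated=False, volatile=True)
sold_out = decide(age_seconds=60, ttl=300, stale_window=600, invalidated=True, volatile=False)
nothing_cached = decide(age_seconds=None, ttl=300, stale_window=600, invalidated=False, volatile=False)
```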

In other words, pre-warming, proactive refresh, and SWR help reduce cache misses, while invalidation mechanisms keep stale data under control. Even so, OTAs still recheck price and availability before checkout—even if a live supplier request was already made during search.

So far, we have focused on how OTAs keep cached data fresh and available. But there is another constraint: Cache storage is finite. As new inventory enters the cache, older data must eventually be removed to free up space. Determining which cache entries should be retained and which should be evicted is the role of cache replacement policies.

Cache replacement policies: What happens when the cache is full?

OTAs balance several standard cache replacement policies to maximize the value of limited storage.

Least Recently Used (LRU) removes entries that have not been accessed for the longest time. The assumption is that searches travelers viewed recently are more likely to be requested again.

Most Recently Used (MRU) removes the most recently accessed entries. While less common than LRU, it can be effective in workloads where an item that was just accessed is unlikely to be needed again soon, such as sequential scans over large datasets. Some travel platforms, including Amadeus, support MRU alongside LRU and FIFO as configurable cache replacement strategies.

Least Frequently Used (LFU) removes entries that receive the fewest requests over time. This approach helps retain highly popular searches, routes, and destinations that consistently generate traffic.

First In, First Out (FIFO) removes the oldest entries from the cache, regardless of how often or when they were last accessed. The approach is simple but may result in the eviction of valuable inventory that remains popular among travelers.

These algorithms are often supplemented with travel-specific business logic. Factors such as demand patterns, time to departure, inventory volatility, regeneration cost, and expected booking value can influence which entries are retained and which are evicted when cache capacity becomes constrained.
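As an illustration of the most common of these policies, here is a compact LRU cache built on Python's `collections.OrderedDict`. The `LruCache` class and the route keys are hypothetical, and the sketch deliberately leaves out the travel-specific scoring mentioned above.

```python
from collections import OrderedDict
from typing import Any, Optional


class LruCache:
    """Fixed-capacity cache that evicts the least recently used entry."""

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._entries: "OrderedDict[str, Any]" = OrderedDict()

    def get(self, key: str) -> Optional[Any]:
        if key not in self._entries:
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return self._entries[key]

    def put(self, key: str, value: Any) -> Optional[str]:
        """Store a value; return the evicted key, if any."""
        self._entries[key] = value
        self._entries.move_to_end(key)
        if len(self._entries) > self.capacity:
            evicted_key, _ = self._entries.popitem(last=False)  # oldest first
            return evicted_key
        return None


lru = LruCache(capacity=2)
lru.put("JFK-LHR", {"price": 410})
lru.put("SEA-LAS", {"price": 89})
lru.get("JFK-LHR")                            # JFK-LHR becomes the most recent
evicted = lru.put("KBP-WAW", {"price": 75})   # cache is full, SEA-LAS goes
```

Switching `popitem(last=False)` to `popitem(last=True)` before inserting the new entry would turn the same structure into an MRU cache.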

Cache tiering

Large travel platforms often rely on multiple caching layers, each optimized for a different balance of speed, freshness, storage cost, and traffic volume.

Where different data is cached

Level 1 (hot) cache is a high-speed in-memory system designed for extremely frequent and latency-sensitive searches that can return results within milliseconds. Because this type of storage is expensive, it contains only a relatively small subset of the most popular and recently accessed inventory, such as high-traffic flight routes (e.g., JFK–LHR on a Friday evening) or heavily searched hotel destinations, often using very short TTLs.

This layer commonly relies on in-memory technologies such as Redis, Apache Ignite, or Memcached.

Level 2 cache acts as a larger and slower storage layer for lower-frequency searches, less popular routes, older cache entries, or inventory that does not justify occupying premium in-memory resources. L2 caches in OTAs often use distributed NoSQL databases such as Couchbase, Cassandra, or DynamoDB, which are optimized for scale, durability, and large data volumes rather than ultra-low latency.
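The interplay between the two levels can be sketched as a lookup that falls through from L1 to L2 to the live source and promotes whatever it finds back into L1. Both tiers are plain in-process structures here, standing in for an in-memory store and a distributed database; the `TieredCache` class and its behavior on promotion are assumptions for illustration.

```python
from collections import OrderedDict
from typing import Any, Callable, Dict, Tuple


class TieredCache:
    """Small hot L1 in front of a larger L2, with a live fetch as the last resort."""

    def __init__(self, l1_capacity: int, fetch_live: Callable[[str], Any]) -> None:
        self.l1_capacity = l1_capacity
        self.fetch_live = fetch_live
        self.l1: "OrderedDict[str, Any]" = OrderedDict()  # stand-in for in-memory store
        self.l2: Dict[str, Any] = {}                      # stand-in for distributed store

    def get(self, key: str) -> Tuple[Any, str]:
        if key in self.l1:
            self.l1.move_to_end(key)
            return self.l1[key], "l1"
        if key in self.l2:
            value = self.l2[key]
            self._promote(key, value)   # popular again: move it to the hot tier
            return value, "l2"
        value = self.fetch_live(key)
        self.l2[key] = value
        self._promote(key, value)
        return value, "live"

    def _promote(self, key: str, value: Any) -> None:
        self.l1[key] = value
        self.l1.move_to_end(key)
        while len(self.l1) > self.l1_capacity:
            self.l1.popitem(last=False)  # evicted from L1 only; L2 keeps its copy


live_calls = []

def fake_supplier(key: str) -> dict:
    live_calls.append(key)
    return {"route": key}

tiers = TieredCache(l1_capacity=1, fetch_live=fake_supplier)
sources = [
    tiers.get("JFK-LHR")[1],  # nothing cached yet
    tiers.get("JFK-LHR")[1],  # hot tier
    tiers.get("SEA-LAS")[1],  # new route pushes JFK-LHR out of L1
    tiers.get("JFK-LHR")[1],  # still available in L2
]
```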

Edge (CDN) cache is the outermost caching layer, where content delivery networks (CDNs) store frequently accessed static and semi-static content closer to travelers and serve it from geographically nearby edge locations. This may include hotel images, destination pages, amenities, reviews, and frontend assets such as CSS/JavaScript files. By avoiding repeated round-trips to the primary data center, edge caching significantly reduces latency and infrastructure load.

Common CDN solutions in OTA architectures include Cloudflare, Akamai, Fastly, and Amazon CloudFront.

How major players optimize caching

Large travel platforms operate sophisticated caching infrastructures behind seemingly simple hotel or flight searches. Much of this architecture remains proprietary, competitive know-how. Still, companies occasionally reveal fragments of their approaches, providing insight into how travel platforms balance performance, costs, and data freshness at a massive scale.

Expedia: Reducing flight search latency by 95 percent with Apache Ignite

Flight search is one of the most demanding workloads for online travel agencies. A single route can generate thousands of possible itineraries. For example, 90 outbound flight options combined with 80 return options already produce 7,200 combinations, while popular routes such as Seattle to Las Vegas can exceed 20,000.

To reduce the latency associated with live supplier requests, Expedia, the world’s second-largest OTA, relied on a caching layer powered by Apache Cassandra. The system could retrieve compressed flight combinations for a search in roughly 30 milliseconds. However, unpacking the cached results took more than 3 seconds before the page could be rendered to the user.

To bring flight search response time below 2 seconds, Expedia evaluated Apache Ignite, a distributed in-memory platform designed for super-fast data processing.

Ignite fundamentally changed the caching workflow. Expedia began storing more granular flight solutions across distributed tables. This allowed the system to unpack only the necessary portions of the data instead of processing the entire payload.

Another major optimization came from moving computations closer to the data itself. Ignite executed processing directly on the server grid, reducing unnecessary network traffic.

The results were dramatic: Overall response time dropped from more than 3 seconds to roughly 150 milliseconds—a 95 percent reduction in latency.

Expedia also implemented proactive cache warmup techniques. Before users initiate searches, the system runs preemptive searches for popular routes and stores the results in the cache.

Skyscanner: Supporting cache durations from 36 hours to 10 minutes

Skyscanner, a global travel metasearch platform, uses two separate caching layers for flight searches.

The DayView Cache powers the initial search results page and relies on TTL-based rules depending on departure proximity. Default settings are

36 hours for searches that return no flights,
8 hours for flights leaving more than a month ahead,
4 hours for flights leaving within a month, and
1 hour for flights leaving within a week.

These cache durations can be adjusted for individual partner suppliers depending on factors such as API limitations, price volatility, or look-to-book requirements.
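The default tiers listed above could be expressed as a small lookup function. The tier durations come from the list; the exact boundaries (a week as 7 days, a month as 30 days, both inclusive) and the function name are assumptions, since the source does not spell them out.

```python
from typing import Optional

HOUR = 3600


def dayview_ttl_seconds(days_to_departure: Optional[int]) -> int:
    """Map departure proximity to a default cache duration.

    Pass None when the search returned no flights.
    """
    if days_to_departure is None:
        return 36 * HOUR   # empty results are cached the longest
    if days_to_departure <= 7:
        return 1 * HOUR    # within a week (assumed: 7 days inclusive)
    if days_to_departure <= 30:
        return 4 * HOUR    # within a month (assumed: 30 days inclusive)
    return 8 * HOUR        # more than a month ahead


no_flights_ttl = dayview_ttl_seconds(None)
next_week_ttl = dayview_ttl_seconds(3)
this_month_ttl = dayview_ttl_seconds(20)
far_ahead_ttl = dayview_ttl_seconds(90)
```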

Skyscanner has also started implementing a dynamic caching system that automatically adjusts TTLs based on factors such as price volatility and search relevance.

The booking panel cache operates later in the user journey, when travelers open detailed offers from airlines and OTAs before being redirected to complete the purchase. Because users are much closer to conversion at this stage, the cache TTL is significantly shorter: only 10 minutes. If a cached price is older than that, Skyscanner makes a fresh API call to the supplier. Additionally, if a traveler remains on the booking panel for longer than 10 minutes, the platform refreshes prices from all suppliers displayed in the results.

Amadeus: Absorbing enormous traffic spikes with a multilayered caching mechanism

Caching is essential not only for OTAs but also for the global distribution systems (GDSs) that power much of the travel industry. One of the largest GDS providers, Amadeus, shared how its engineering team handled traffic spikes on its Cars platform.

To absorb these surges, Amadeus introduced a two-layer caching architecture between the application and the core Oracle database.

The first layer is a configurable in-memory cache designed for speed. It stores the most relevant data and queries Oracle only when information is missing or outdated. Engineers can adjust cache size, TTL values, and replacement policies. Because the cache runs entirely in memory, it delivers extremely fast responses at virtually no hardware cost. The tradeoff is limited storage capacity and weaker consistency guarantees.

The second layer relies on Couchbase. Unlike the in-memory cache, it is designed primarily for scalability and stores a much larger volume of data that is periodically synchronized with Oracle. This layer provides stronger consistency but comes with additional infrastructure costs and some network latency.

The combination proved highly effective. Amadeus reported that the multilayer caching system stabilized database traffic and achieved a cache hit ratio of about 85 percent. As a result, CPU utilization remained much more predictable even during demand spikes. Most importantly, the Cars platform was able to handle traffic peaks up to 50 times higher than normal while reducing response times by roughly 50 percent.

How to measure caching success

Caching is not something OTAs configure once and forget. As Ihor Protsenko notes, “Engineering teams who own the caching logic constantly fine-tune it to achieve better performance.” To measure the impact of these changes and identify further optimization opportunities, engineers rely on a broad set of metrics. Here are some of the most important KPIs OTAs monitor on a daily basis.

Cache hit ratio measures how often a system can serve a request from the cache instead of fetching data from the original source, such as a supplier API, database, or GDS. A higher cache hit ratio typically translates into lower latency, reduced infrastructure load, and fewer supplier requests. For example, Amadeus reported a cache hit ratio of roughly 85 percent for the two-layer caching architecture used in its Cars platform.

Search latency is the time it takes to return search results after a traveler submits a query. OTAs may supplement average latency with percentile-based metrics, which show how quickly the vast majority of searches are completed. For example, a P95 latency of 2 seconds means that 95 percent of searches are returned within 2 seconds. This helps uncover slow searches that averages can hide.

Price accuracy assesses how closely the prices shown during search match the prices available at booking or checkout. This metric helps OTAs quantify the trade-off between speed and freshness. Significant discrepancies may indicate overly aggressive caching or insufficient refresh and invalidation mechanisms.

Supplier request costs track how much an OTA spends retrieving inventory and pricing data from external partners. Since search traffic can generate far more supplier requests than bookings, caching plays a critical role in reducing these expenses and improving search economics.
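Three of the metrics above are straightforward to compute from raw counters and logs. The sketch below uses hypothetical function names and sample numbers, the nearest-rank method for percentiles, and a relative tolerance for price matching; each of these is an assumption, and teams may define the metrics differently.

```python
import math
from typing import List, Sequence, Tuple


def cache_hit_ratio(hits: int, misses: int) -> float:
    """Share of requests served from the cache."""
    total = hits + misses
    return hits / total if total else 0.0


def percentile_nearest_rank(samples: Sequence[float], pct: int) -> float:
    """Smallest sample such that at least pct percent of samples are <= it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def price_accuracy(pairs: Sequence[Tuple[float, float]], tolerance: float = 0.0) -> float:
    """Share of (shown, final) price pairs that differ by at most tolerance * shown."""
    if not pairs:
        raise ValueError("pairs must not be empty")
    matches = sum(1 for shown, final in pairs if abs(final - shown) <= tolerance * shown)
    return matches / len(pairs)


hit_ratio = cache_hit_ratio(hits=850, misses=150)

# 20 searches: most are fast, two are slow outliers that a P95 makes visible.
latencies: List[float] = [0.2] * 18 + [1.9, 4.5]
p95 = percentile_nearest_rank(latencies, 95)

# (price shown in search, price at checkout), with a 2 percent tolerance.
accuracy = price_accuracy([(400, 400), (400, 450), (120, 120), (90, 91)], tolerance=0.02)
```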

“Caching is always dynamic,” Ihor sums up. “You need to look at multiple metrics over time, understand how they change under different scenarios, and evaluate them together rather than in isolation. Taking this helicopter view gives you a much clearer picture of how effective your caching strategy is.”

With 25 years of experience, Liudmyla is a seasoned editor and IT journalist. Over the last five years, she has focused on travel tech, travel payments, and the advancements in NDC implementation.

