Caching: System Design Concept

Have you ever experienced a website taking longer to load on the first visit but becoming significantly faster on subsequent visits? This is a common phenomenon related to caching, a critical concept in system design. In this blog, we will discuss how caching helps enhance the performance of a system.

An analogy to understand caching!

Suppose there is a library with thousands of books and a librarian who helps students find and borrow books. When a student arrives and requests a book, the librarian goes to the storeroom to retrieve it, brings it back to the desk, and issues it to the student. When a student returns the book, the librarian puts it back in its place in the storeroom. If another student arrives and requests the same book, the librarian goes to the storeroom again to retrieve it. In this system, the librarian must visit the storeroom every time a student requests a book.

Now, let's consider a scenario where the librarian has a bag that can hold 15 books, serving as a cache for recently returned books. When a student arrives and requests a book, the librarian goes to the storeroom and retrieves it, just as before. But when the student returns the book, the librarian places it in the bag instead of returning it to the storeroom. If another student requests the same book, the librarian checks the bag and finds the book there. This means the librarian does not have to go to the storeroom again, and the student can be served more efficiently.

What is caching?

Caching is a technique for improving the speed of data retrieval by storing data, or the results of a request, in a temporary storage location called a cache. A cache is high-speed data storage that holds a small proportion of the overall data, typically the most frequently requested items.

The purpose of caching is to make data retrieval more efficient by reusing previously retrieved data rather than fetching it again from slower storage (disk or RAM). This improves system performance because the time required to retrieve data is significantly reduced.

How does caching work?

  • When a user requests some data, the system first checks for that data in the cache. If the data is present (a cache hit), the system returns it directly from the cache. This is much faster than accessing the data from the original database.
  • When the requested data is not present in the cache (a cache miss), the system retrieves it from the original source database. This is slower than a cache hit because accessing the source database involves factors such as disk I/O or network latency. In this case, the system also stores the fetched data in the cache so that subsequent requests for the same data can be served more quickly.

Cache hits and misses are critical metrics for measuring the performance and effectiveness of a cache. A high cache hit rate means the cache is effectively storing and serving the most frequently accessed data. On the other hand, a high cache miss rate suggests that the cache is not being used effectively, and we may need to adjust the cache size or use a different replacement policy.

Now there are several questions to think about and explore:

  • What type of data should be stored in the cache?
  • What would be an appropriate size for the cache?
  • What should be done when the cache memory becomes full?
  • Which data must be removed to make space for new data in the cache?
  • What are the various types of caching strategies used in system design?

Real-life examples of caching

  • Web browsers use caching to store frequently accessed HTML, CSS, JavaScript, and images. This allows the browser to quickly retrieve these resources rather than downloading them from the server each time.
  • Content Delivery Networks (CDNs) store static files like images and videos and serve them from locations closer to the user. This reduces retrieval time and latency.
  • DNS caching improves the speed of accessing web pages by storing the IP address of a domain name in a cache. Instead of performing a DNS query every time a user accesses a website, the IP address can be quickly retrieved from the cache.

Different types of caching used in system design

We can use several types of caching to improve the performance of a system.

Browser caching

If you're wondering how websites load quickly and efficiently, one of the reasons is browser caching. This involves temporarily storing resources such as images, HTML, and JavaScript files in a cache within the web browser.

  • When you revisit the same website, the browser retrieves these resources from the cache rather than downloading them from the network again. This results in faster page load times and reduces the amount of data that needs to be transmitted. This type of caching is also known as client-side caching.
  • The browser cache has limited capacity and stores resources for a specific time duration. When the cache reaches its capacity or resources reach their expiration date, the browser automatically clears the cache and retrieves updated resources from the network during the next visit.
  • Users can also manually clear their browser cache. This helps ensure they see the most up-to-date version of the website.
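The expiration behavior described above can be sketched with a simple time-to-live (TTL) cache. This is an illustrative toy, similar in spirit to how a browser honors a resource's `max-age` before re-fetching it; the class name and tiny TTL values are chosen purely for demonstration.

```python
import time

class TTLCache:
    """Cache whose entries expire after a fixed time-to-live."""
    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.store = {}  # key -> (value, expiry_timestamp)

    def put(self, key, value):
        self.store[key] = (value, time.time() + self.ttl)

    def get(self, key):
        entry = self.store.get(key)
        if entry is None:
            return None                  # never cached
        value, expires_at = entry
        if time.time() >= expires_at:    # expired: evict and force a re-fetch
            del self.store[key]
            return None
        return value

cache = TTLCache(ttl_seconds=0.05)
cache.put("/logo.png", b"...image bytes...")
assert cache.get("/logo.png") is not None  # fresh: served from cache
time.sleep(0.1)
assert cache.get("/logo.png") is None      # expired: must re-download
```

A real browser adds many refinements (validation with ETags, conditional requests), but the core idea is the same: serve from the cache while fresh, re-fetch once stale.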

Web server caching

Web server caching improves the performance of a website by storing resources on the server side, which reduces load on the server. There are several ways to implement web server caching, such as a reverse proxy cache or a key-value store like Memcached or Redis.

  • A reverse proxy cache acts as an intermediary between the browser and the web server. When a user makes a request, the reverse proxy cache checks whether it has a copy of the requested content. If it does, it serves the cached version to the user rather than forwarding the request to the origin server.
  • Key-value stores can cache any web content desired by the application. These stores are typically accessed by application code or an application framework. Unlike reverse proxies, which cache HTTP responses for specific requests, key-value stores can cache any user-specific or frequently accessed data that the application developer chooses.
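The key-value-store pattern described above is often called cache-aside. Here is a minimal sketch in which a plain dictionary stands in for a store like Redis or Memcached, and `query_database` is a hypothetical stand-in for a real database call.

```python
kv_store = {}  # stand-in for a key-value store such as Redis or Memcached

def query_database(user_id):
    # pretend this is an expensive SQL query against the origin database
    return {"id": user_id, "name": f"user{user_id}"}

def get_user(user_id):
    key = f"user:{user_id}"
    cached = kv_store.get(key)
    if cached is not None:            # served from the key-value store
        return cached
    row = query_database(user_id)     # fall back to the origin database
    kv_store[key] = row               # populate the cache for next time
    return row

get_user(7)                 # first call hits the database and warms the cache
get_user(7)                 # second call is served entirely from the cache
```

The application code, not the cache, decides what gets cached and under which key, which is exactly the flexibility the bullet above describes.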

Content Delivery Network (CDN)

A Content Delivery Network (CDN) is a system of proxy servers designed to improve the delivery speed of static content, such as web pages, images, videos, and other media files. These proxy servers are placed in strategic locations around the world to reduce the distance between the end user and the origin server, thereby reducing latency.

  • When a user requests content from a website that uses a CDN, the CDN fetches the content from the origin server and stores a copy of it.
  • If a user requests the same content again, the CDN serves it directly rather than fetching it again from the origin server.

Think of a CDN like a chain of grocery stores: instead of going all the way to the farms where food is grown, which could be hundreds of miles away, customers can go to their local grocery store. The store stocks food from faraway farms, allowing customers to get what they need in a matter of minutes rather than days.

Distributed caching

A distributed cache pools together the random-access memory (RAM) of multiple networked machines into a single in-memory data store that is used as a cache. Unlike traditional caches, which are usually limited to the memory of a single machine, a distributed cache can scale beyond those limits by linking multiple machines into a distributed cluster, increasing both capacity and processing power.

Distributed caches are useful in environments with high data volume and load, as the distributed architecture allows incremental expansion by adding more machines to the cluster. This helps the cache grow along with the data and handle large amounts of it efficiently.
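One question a distributed cache must answer is which machine owns a given key. A minimal sketch of the idea, with hypothetical node names, is to hash the key and map it onto the cluster. Real clients typically use consistent hashing instead of simple modulo so that fewer keys move when nodes are added or removed.

```python
import hashlib

# Hypothetical cluster of cache machines.
nodes = ["cache-node-a", "cache-node-b", "cache-node-c"]

def node_for(key):
    """Map a key deterministically onto one node in the cluster."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]

# Every client computes the same mapping, so reads and writes for a
# given key always land on the same machine.
owner = node_for("user:42")
```

Because the mapping is deterministic, the cluster behaves like one large cache even though the data is spread across many machines.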

Database caching

Database caching stores frequently accessed data in a cache to improve the performance of a database. This can significantly increase throughput and reduce data retrieval latency, resulting in better overall performance for the application. A database cache layer can be placed in front of any database, including relational and NoSQL databases.

Cache Eviction Policy

Cache eviction policies are algorithms or strategies that manage the data stored in a cache. When the cache is full, some data must be removed to make room for new data, and the eviction policy determines which data to remove based on certain criteria.

There are several common cache eviction policies:

  • Least Recently Used (LRU): Removes data that was least recently used.
  • Least Frequently Used (LFU): Removes data that was least frequently used.
  • Most Recently Used (MRU): Removes data that was most recently used first.
  • Random Replacement (RR): Randomly selects a data item and removes it to make space.
  • First In First Out (FIFO): This policy maintains a queue of objects in the order they were added to the cache. When the cache is full and space is needed for a new object, one or more objects are evicted from the head of the queue (the oldest entries), and the new object is inserted at the tail.
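LRU, the most widely used of these policies, can be sketched in a few lines using Python's `OrderedDict`, which keeps keys in insertion/usage order so the least recently used entry is simply the front of the dict. This is a minimal illustration, not a thread-safe production cache.

```python
from collections import OrderedDict

class LRUCache:
    """Evicts the least recently used entry once capacity is exceeded."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)         # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict least recently used

cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")        # "a" is now the most recently used
cache.put("c", 3)     # capacity exceeded: evicts "b", not "a"
assert cache.get("b") is None
assert cache.get("a") == 1
```

Swapping the eviction rule (e.g. counting accesses for LFU, or popping from the tail for MRU) changes only the `put`/`get` bookkeeping; the structure stays the same.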

What is Cache Invalidation?

When data in the database is constantly being updated, it is important to ensure that the cache is also updated to reflect these changes. Otherwise, the application may serve outdated or stale data to clients. Cache invalidation techniques maintain consistency between the cache and the latest changes in the database. There are three popular cache invalidation schemes:

Write through cache

In write-through caching, writes are first made to the cache and then to the database. The write operation is considered successful only if both writes succeed. This approach keeps the cache and database consistent and reduces the risk of data loss in case of a crash or system disruption.

  • Write-through caching comes with the trade-off of increased write latency, because data must be written to two separate places. This may not be a problem for applications with few write operations, but it can become an issue for write-heavy applications.
  • Write-through caching is best suited for applications that frequently re-read data. Despite the increased write latency, this approach offers lower read latency and consistent data, which can compensate for the longer write time.
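A minimal write-through sketch, with two plain dicts standing in for the cache and the database: every write updates both, so a read after a write is always a cache hit and the two stores never disagree.

```python
cache, database = {}, {}

def write_through(key, value):
    cache[key] = value       # write the cache...
    database[key] = value    # ...and the database; both must succeed

def read(key):
    if key in cache:         # reads after a write are always cache hits
        return cache[key]
    return database.get(key)

write_through("order:1", "shipped")
assert cache["order:1"] == database["order:1"] == "shipped"
```

A real implementation would wrap the two writes in error handling (or a transaction-like protocol) so a failure of either store fails the whole operation.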

Write around cache

In a write-around cache, we write data directly to the database and bypass the cache. This increases cache misses, resulting in higher read latency for applications that frequently write and then re-read data.

  • We use this approach when data is updated frequently and there is a high overhead for cache updates. In such cases, we only update the cache when the data is read, which reduces the number of write operations and the risk of overloading the cache.
  • This approach may not be suitable for applications that frequently re-read the most recently written data, because cache misses will occur often, causing slower reads.
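The same two-dict sketch adapted for write-around: writes go straight to the database (dropping any stale cached copy), and the cache is only populated when data is actually read.

```python
cache, database = {}, {}

def write_around(key, value):
    database[key] = value    # the write bypasses the cache entirely
    cache.pop(key, None)     # drop any stale cached copy

def read(key):
    if key in cache:
        return cache[key]
    value = database.get(key)  # cache miss: go to the database
    cache[key] = value         # populate the cache only on read
    return value

write_around("page:1", "v1")
assert "page:1" not in cache       # the write did not warm the cache
assert read("page:1") == "v1"      # first read is a miss...
assert cache["page:1"] == "v1"     # ...and fills the cache
```

Note how the first read after every write is guaranteed to miss, which is exactly the trade-off the bullets above describe.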

Write back cache

A write-back cache is used in systems with high write activity to improve write performance. Writes are temporarily stored in a cache layer, acknowledged quickly, and then asynchronously written to the database. This approach results in lower write latency and higher write throughput.

However, this technique carries the risk of data loss if the cache layer fails, since the cache is the only copy of the written data until it is flushed. To minimize this risk, it is recommended to have multiple cache replicas acknowledge each write; if one replica fails, the data can still be recovered from another.
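A minimal write-back sketch: writes are acknowledged once they reach the cache, a "dirty" set tracks entries not yet persisted, and a separate flush step pushes them to the database. A real system would flush asynchronously in the background and replicate the cache, as noted above.

```python
cache, database = {}, {}
dirty = set()                     # keys written but not yet persisted

def write_back(key, value):
    cache[key] = value            # fast: only the cache is updated
    dirty.add(key)

def flush():
    for key in dirty:             # later, persist all dirty entries
        database[key] = cache[key]
    dirty.clear()

write_back("cart:9", ["book"])
assert "cart:9" not in database   # not yet persisted: lost if the cache fails
flush()
assert database["cart:9"] == ["book"]
```

The window between `write_back` and `flush` is precisely where the data-loss risk lives, which is why replicated acknowledgements matter for this scheme.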

Advantages of Caching

There are several benefits to using caching in a system or application:

  • Improved performance: Caching significantly improves the overall performance of a system by reducing the need to fetch data from slower storage each time it is needed.
  • Reduced database cost: Caching can reduce the load on the primary database by offloading traffic to the cache server. This can significantly reduce the overall cost of the system, especially if the primary database charges based on throughput.
  • Reduced backend load: By redirecting significant parts of the read load from backend database to in-memory cache, load on the database can be reduced.
  • Predictable performance: Caching can help mitigate unpredictable performance during high traffic or spikes in usage by providing high throughput in-memory cache to handle the increased load.
  • Increased read throughput: In addition to lower latency, in-memory caching systems also offer much higher request rates compared to a disk-based database. A single cache instance can serve hundreds of thousands of requests per second.

Disadvantages of Caching

Caches are used in modern software to improve performance, but they can also have drawbacks.

  • If a cache becomes corrupted, it can cause problems with the app, such as incorrect data display, glitches, or crashes.
  • Caches can sometimes prevent apps from loading the most recent version of a web page or other data.
  • To minimize the negative effects of caching, it's important to manage cache misses effectively. Cache misses introduce latency that would not be present without caching, so it's important to keep them low relative to cache hits. If cache misses are not well managed, the caching system can become nothing more than overhead.


In summary, caching is a useful technique for improving the performance, reducing cost, and increasing the scalability of a system or application by storing frequently accessed data in a fast and easily accessible location. It can also benefit users by improving the performance and efficiency of the system or application.

Thanks to Chiranjeev and Navtosh for their contributions in creating the first version of this content. If you have any queries/doubts/feedback, please write to us. Enjoy learning, Enjoy system design, Enjoy algorithms!

© 2022 Code Algorithms Pvt. Ltd.

All rights reserved.