In write-through or write-behind caching, if data is unlikely to be read again soon after it is written, the cache may become filled with rarely used data (cache pollution). For such access patterns, we can use the write-around strategy to use the cache more efficiently.
In write-around caching, when the application needs to write new data, it bypasses the cache and writes directly to the underlying data source. As a result, data is not stored in the cache during the write operation. When a read request results in a cache miss, the cache fetches the data from the database, updates the corresponding entry, and then returns the data to the application.
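The write and read paths above can be sketched in a few lines of Python. This is a minimal illustration, using plain dicts as stand-ins for a real cache and database; the class and method names are ours, not from any specific library:

```python
class WriteAroundStore:
    def __init__(self):
        self.cache = {}     # fast but limited cache
        self.database = {}  # durable underlying data source

    def write(self, key, value):
        # Write-around: bypass the cache and write directly to the database.
        self.database[key] = value

    def read(self, key):
        if key in self.cache:           # cache hit
            return self.cache[key]
        value = self.database[key]      # cache miss: fetch from the database
        self.cache[key] = value         # populate the cache for future reads
        return value
```

Note that after `write("a", 1)`, the key `"a"` is only in the database; it enters the cache lazily, on the first read.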
Now let's consider a critical scenario: suppose we want to update data that is already present in the cache. Following the write-around idea, we would update the old data in the database. However, a key issue arises when a user attempts to read the same data again: the application would return the older data from the cache rather than the newly stored data from the database. How can we handle this scenario? Let's think!
We can think about three possible options:
- Tolerate arbitrarily stale data in cache (usually not practical).
- One brute-force idea: on every read, the cache checks with the database whether its copy is stale. However, this approach increases read latency by a huge margin, which is impractical for a read-heavy workload! In this scenario, it is better to use write-through caching and avoid the extra query to the database.
- A more viable solution is to update the database and invalidate the corresponding cache entry. By doing so, we force a cache miss on the next read request. This variation of the write-around cache is called the write-invalidate policy.
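The write-invalidate option can be sketched as follows. Again, this is a toy model with dicts standing in for the cache and database, and the names are illustrative:

```python
class WriteInvalidateStore:
    def __init__(self):
        self.cache = {}
        self.database = {}

    def read(self, key):
        if key in self.cache:           # cache hit
            return self.cache[key]
        value = self.database[key]      # cache miss: fetch from the database
        self.cache[key] = value
        return value

    def update(self, key, value):
        # Write-invalidate: write to the database, then drop the cached
        # entry so the next read misses and fetches the fresh value.
        self.database[key] = value
        self.cache.pop(key, None)
```

After an `update`, the stale copy is gone from the cache, so the next read is guaranteed to return the new value from the database.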
Normally, it doesn't make much sense to implement write-around caching on its own, because it only defines the write path. But if we combine the write-around pattern with the cache-aside or read-through pattern, it becomes useful. Here is the reason: suppose we have some data that will not be read anytime soon. In this situation, we never want writes of such data to go through the cache.
Since the cache generally has a limited size, by not caching unimportant data, we leave more room for the data that actually matters for performance, i.e., frequently accessed or read-heavy data. In other words, we never want rarely read data to fill the cache, because it can evict parts of the active data that we actually want cached. Is there some other way to handle the above scenario of stale data in the write-around cache? Explore and think!
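A toy comparison makes the eviction argument concrete. Assume a tiny 2-entry LRU cache holding two hot keys: if a burst of write-only data is inserted into the cache (write-through style), a hot key gets evicted; with write-around, the cache is untouched and the hot keys survive. The capacity and key names here are ours, chosen only for illustration:

```python
from collections import OrderedDict

class LRUCache:
    """Tiny LRU cache to show how write-around protects hot entries."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)     # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict least recently used

# Write-through style: the write-only key "log" enters the cache
# and evicts the hot key "h1".
through = LRUCache(2)
through.put("h1", 1)
through.put("h2", 2)
through.put("log", 0)

# Write-around style: "log" goes straight to the database (not shown),
# so both hot keys stay cached.
around = LRUCache(2)
around.put("h1", 1)
around.put("h2", 2)
```

Here `through.get("h1")` returns `None` (evicted by the write-only key), while `around.get("h1")` still returns `1`.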
Advantages or benefits of write-around caching
- As we have seen above, write-around caching improves cache efficiency by preventing the cache from being filled with infrequently accessed data, write-only data that may never be reused, or data that is unlikely to be read again soon after it is written.
- Combined with the cache-aside or read-through pattern, write-around caching works best for read-heavy workloads. This approach also helps us implement lazy loading (storing data in the cache on demand).
- By bypassing the cache and writing data directly to the database, the write-around pattern keeps the database consistently up to date. Even in the case of a cache failure or power failure, it reduces the risk of data loss, since the data is already safely stored in the database.
- By bypassing the cache, write-around caching helps prevent unnecessary cache evictions and reduces cache thrashing. Note: cache thrashing occurs when the cache constantly evicts and reloads data due to its limited size.
Limitations or drawbacks of write-around caching
- Read operations for recently written data will experience a cache miss (higher latency) since data must be fetched from the database.
- If data is constantly updated in the database and the corresponding cache entry is not invalidated, subsequent read operations will access the old or stale version of the data. This creates cache coherence issues and leads to inconsistent results. So we need to implement a proper cache invalidation scheme to ensure consistency between the cache and the database.
- Even if we invalidate the cache entry after each database write, constant updates will increase the chances of a cache miss on subsequent reads. So write-around is not a good choice for a write-heavy workload.
- It does not benefit from write coalescing or buffering effects, i.e., techniques that combine or delay multiple write requests to reduce the load on the database.
- Determining which data to cache and which to bypass requires an understanding of the data access patterns. Depending on the accuracy of these predictions, write-around caching may or may not be the optimal choice.
Critical ideas to think and explore further!
- How does write-around caching ensure consistency between the cache and the underlying data source?
- In what scenarios would hybrid caching, combining write-around caching with other caching techniques, be a more effective solution?
- How do data size and access patterns affect the performance of the write-around cache?
- What are the best practices for monitoring performance and optimizing write-around cache usage?
- What are the other limitations of write-around caching?
- How can cache invalidation schemes be implemented effectively to maintain consistency between the write-around cache and the database?
- What are the trade-offs between write-around caching and other caching strategies in terms of data durability and performance under different workloads?
We hope you have enjoyed the blog. If you have any queries or feedback, please write us at firstname.lastname@example.org. Enjoy learning, enjoy algorithms, enjoy system design!