There is one big challenge with write-through caching: high write latency, because data must be updated in both places synchronously, i.e. first in the cache and then in the main database. One solution to improve write performance is to use write-behind (write-back) caching.
The idea of write-behind caching is similar to the write-through cache, but there is one significant difference: the data written to the cache is asynchronously updated in the main database. In other words, the application will first write to the cache, and the cache will write to the database after some delay.
Let's understand this from another perspective! In a write-through cache, when many write requests arrive in a short period, the database can easily become a bottleneck. Write-behind caching allows faster cache writes with eventual consistency between the cache and the database. In the meantime, any read from the cache will still return the latest data.
Therefore, the application does not need to wait for the database update: it only writes data to the cache, and the cache acknowledges the write immediately. The cache keeps track of the changes made and writes them back to the main database at a later time (perhaps during less busy periods). This asynchronous update reduces the latency of write operations.
Now the question is: How can we implement the asynchronous update from the cache to the database? One idea is to use a time-based delay, where the cache waits for a predefined period before performing the update to the database. Another idea is to use an entry-based delay, where the cache waits until a certain number of new data entries accumulate before updating the database.
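To make these two delay strategies concrete, here is a minimal Python sketch of a write-behind cache. The `WriteBehindCache` class and its dict-like backing store are hypothetical illustrations, not a production implementation: dirty entries are flushed to the "database" when either a configurable number of entries accumulates (entry-based delay) or a time interval elapses (time-based delay).

```python
import threading
import time

class WriteBehindCache:
    """Minimal write-behind cache sketch (hypothetical API)."""

    def __init__(self, backing_store, flush_interval=5.0, max_entries=100):
        self.cache = {}                     # in-memory cache
        self.dirty = {}                     # keys pending a database write
        self.backing_store = backing_store  # any dict-like "database"
        self.flush_interval = flush_interval
        self.max_entries = max_entries
        self.last_flush = time.monotonic()
        self.lock = threading.Lock()

    def get(self, key):
        # Reads always see the latest data, even before the DB is updated.
        return self.cache.get(key, self.backing_store.get(key))

    def put(self, key, value):
        with self.lock:
            self.cache[key] = value
            self.dirty[key] = value         # acknowledge now, defer DB write
            # Flush on entry-based OR time-based delay, whichever comes first.
            if (len(self.dirty) >= self.max_entries
                    or time.monotonic() - self.last_flush >= self.flush_interval):
                self._flush()

    def _flush(self):
        # Write all accumulated changes back to the database in one pass.
        self.backing_store.update(self.dirty)
        self.dirty.clear()
        self.last_flush = time.monotonic()
```

For example, with `max_entries=2`, the first `put` only touches the cache; the second one triggers a flush that pushes both pending entries to the backing store in one step.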
Advantages of write-behind pattern
- Write-behind cache improves the performance of write operations, so it is a good choice when dealing with write-heavy workloads. Additionally, in combination with read-through pattern, write-behind is also suitable for mixed workloads involving both read and write operations. Explore and think!
- The cache acknowledges writes immediately and postpones database updates, which reduces strain on the database. Even if the database experiences downtime or failures, the application can still function and serve read and write requests from the cache. The updates that have not yet been written to the database are queued in the cache and can be synchronized with the database once it recovers. This enhances the overall availability of the system and mitigates the impact of database failures.
- Write-behind caching also reduces the workload on the database, as the database is updated less frequently. Now the question is: Can we further reduce the load on the database?
Approaches to offload database
The idea is to use the following strategies in combination with the write-behind approach to optimize the writing process and reduce the workload on the database by handling write spikes more gracefully.
Using rate limiting
When a lot of write requests come in all at once, they can overwhelm the database. To avoid this, we can use rate limiting in a write-behind cache to cap the number of write requests sent to the database per second or per minute.
This will spread out the write workload over a more extended period, ensuring a steady flow of writes to the database rather than a sudden surge during peak periods. It will give the database enough time to catch up and process the write requests without getting overloaded.
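The rate-limited drain described above can be sketched as follows. The `RateLimitedWriter` class is a hypothetical illustration: queued cache updates are written to a dict-like "database" at most `max_per_second` at a time, with the remainder spread over later one-second windows.

```python
import time
from collections import deque

class RateLimitedWriter:
    """Hypothetical sketch: cap flushed writes at max_per_second so queued
    cache updates drain to the database at a steady rate."""

    def __init__(self, database, max_per_second):
        self.database = database        # dict-like "database"
        self.max_per_second = max_per_second
        self.queue = deque()            # pending (key, value) writes

    def enqueue(self, key, value):
        self.queue.append((key, value))

    def drain(self):
        # Write at most max_per_second entries per one-second window,
        # sleeping out the rest of the window if work remains.
        while self.queue:
            start = time.monotonic()
            for _ in range(min(self.max_per_second, len(self.queue))):
                key, value = self.queue.popleft()
                self.database[key] = value
            if self.queue:              # spread the remainder over later windows
                time.sleep(max(0.0, 1.0 - (time.monotonic() - start)))
```

With a limit of 2 writes per second, draining 3 queued updates takes two windows: two writes land immediately and the third waits for the next second.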
Using batching and coalescing technique
We can use batching and coalescing techniques in the write-behind cache to reduce the number of write requests. Batching combines multiple write operations into a single write, while coalescing consolidates multiple updates to the same data into a single update.
This means that instead of immediately writing each change to the database, the cache can group them and write them as a single operation. Consequently, this will decrease the overall number of database writes, effectively reducing the load on the database. The good thing is: It will also save costs when the database provider charges based on the number of requests made.
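Here is a small Python sketch of both ideas together. The `CoalescingBuffer` class is a hypothetical illustration: repeated writes to the same key overwrite each other in the pending buffer (coalescing), and `flush` pushes the whole buffer to a dict-like store in one call (batching).

```python
class CoalescingBuffer:
    """Hypothetical sketch: coalesce repeated updates to the same key,
    then flush the whole batch as a single database call."""

    def __init__(self, database):
        self.database = database        # dict-like store with .update()
        self.pending = {}               # key -> latest value (coalesced)
        self.flush_calls = 0            # how many batched writes were issued

    def write(self, key, value):
        # Later updates to the same key overwrite earlier ones (coalescing).
        self.pending[key] = value

    def flush(self):
        if self.pending:
            self.database.update(self.pending)  # one batched write (batching)
            self.pending.clear()
            self.flush_calls += 1
```

Three logical writes (two of them to the same key) become a single database call, which matters when the provider bills per request.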
Using time shifting
Databases can experience "rush hours" when lots of data is being written or modified simultaneously. Time shifting strategically moves the process of writing data to less busy times or intervals outside peak usage. This allows the system to avoid becoming overwhelmed during high-contention periods.
Disadvantages and limitations of write-behind pattern
- The delay between the cache write and the database write introduces a period during which data in the cache is not yet reflected in the database. During this period, the cache stores the most recent data while the database may store stale data, i.e. the cache may be ahead of the database. So we should use a write-behind cache only when the application is designed to tolerate this temporary inconsistency.
- If there is a cache failure or system crash, recently written data in the cache may be permanently lost. This can affect overall functioning of the application.
- In a write-behind cache, if a failure occurs during the delayed write to the database after the cache write has been acknowledged to the application, there is a chance of data loss or data inconsistency. How can we handle this situation? One idea is to implement retry mechanisms or use delayed writes with an appropriate timeout.
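The retry idea above can be sketched as a small helper that re-attempts a failed deferred write with exponential backoff. The `flush_with_retry` function and its parameters are hypothetical; a real system would also re-queue or alert on final failure.

```python
import time

def flush_with_retry(write_fn, entries, max_retries=3, base_delay=0.01):
    """Hypothetical sketch: retry a failed delayed write with exponential
    backoff so acknowledged cache updates are not silently lost."""
    for attempt in range(max_retries + 1):
        try:
            write_fn(entries)           # attempt the deferred database write
            return True
        except ConnectionError:
            if attempt == max_retries:
                return False            # give up: caller re-queues or alerts
            time.sleep(base_delay * (2 ** attempt))  # exponential backoff
```

If the database is down for two attempts but recovers on the third, the pending entries still reach it; if it stays down past the retry budget, the caller learns about it instead of losing data silently.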
Write-behind vs Write-through cache
The decision between write-through and write-behind caching comes down to a key trade-off: data consistency vs write performance. Here is a critical comparison between both approaches:
- Write-through maintains consistency between the cache and the database. On the other side, write-behind can lead to temporary inconsistency between the cache and the database.
- Since data is written immediately to both cache and database, write-through caching provides strong data durability. In the event of a system failure, the updated data will be available in the database. On the other side, there is a potential risk of data loss in write-behind caching, especially in the event of a sudden power failure or cache failure before the data is updated in the database.
- Write-through introduces higher latency for write operations because data must be written to both the cache and the database. On the other side, write-behind caching usually provides lower write latency because data is first written to the cache and the application can proceed without waiting for the database.
- Write-through is well-suited for highly sensitive data or scenarios where data consistency and integrity are top priorities. On the other side, write-behind caching is suitable for non-critical data or scenarios where write performance for write-heavy workloads is crucial.
Critical ideas to think and explore!
- How does the write-behind caching approach handle conflicts when multiple write requests are received for the same data within the delay period?
- Is there a need to implement TTL or other cache eviction policies in the write-behind caching?
- What are some specific use cases where write-behind caching is not recommended?
- Examples of caching technologies that are well-suited for implementing write-behind pattern.
- Are there any best practices or guidelines for determining the appropriate delay time for the asynchronous updates from the cache to the database?
- How does the write-behind caching approach impact data integrity and consistency in scenarios where multiple caches are distributed across different nodes or regions?
- Can we combine write-behind caching with other caching strategies like cache-aside or read-through caching, to achieve even better performance optimizations?
We hope you enjoyed the blog. If you have any queries or feedback, please write us at firstname.lastname@example.org. Enjoy learning, Enjoy algorithms, Enjoy system design!