The write-through pattern is similar to the read-through pattern, with one key difference: here the cache is also responsible for handling write operations. Whenever the application wants to write some data, it writes to the cache first, and the cache synchronously updates the main database as part of the same operation.
So the cache sits between the application and the database as an abstraction, and every write goes through the cache to the main database. A write is considered complete only when both the cache write and the database write have finished. This keeps the cache and the database consistent.
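The write path described above can be sketched in a few lines. This is a minimal illustration, not a production implementation: the class name `WriteThroughCache`, the `peek` helper, and the use of a plain dict as the "database" are all assumptions made for the example.

```python
class WriteThroughCache:
    """Sits between the application and the database; every write passes through it."""

    def __init__(self, database):
        self._database = database  # assumed: any dict-like backing store
        self._entries = {}         # in-memory cache entries

    def put(self, key, value):
        # Update the cache entry, then synchronously update the database.
        self._entries[key] = value
        self._database[key] = value
        # Only now, with both writes done, is the operation reported as complete.
        return True

    def peek(self, key):
        # Return the cached value without touching the database (None on a miss).
        return self._entries.get(key)


database = {}  # a plain dict stands in for the main database
cache = WriteThroughCache(database)
cache.put("user:1", {"name": "Asha"})
```

After `put` returns, the same value is present in both the cache and the database, which is exactly the consistency guarantee the pattern is built around.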
Now, what about reading the data? Since the architecture is similar to the read-through pattern, we can easily use the read-through strategy. The application first looks for the data in the cache, and if it exists, the cache returns it. If it does not exist, the cache fetches it from the database, updates its entry, and returns it to the application.
This combination brings together the fast reads of the read-through strategy and the data consistency of the write-through strategy. However, there is a significant disadvantage: it introduces extra write latency, because every write must go to the cache and then to the database (two write operations). Is there a way to handle this latency issue? Can we use the write-through pattern with the cache-aside pattern? Explore and think!
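Here is a small sketch of that read path. The class name `ReadThroughCache` and the `db_reads` counter are hypothetical and exist only to make the hit and miss behaviour visible; a dict again stands in for the database.

```python
class ReadThroughCache:
    """On a miss, the cache itself loads the value from the database."""

    def __init__(self, database):
        self._database = database  # assumed: any dict-like backing store
        self._entries = {}
        self.db_reads = 0          # demo counter: how often we had to hit the database

    def get(self, key):
        if key in self._entries:
            return self._entries[key]      # cache hit: no database access
        self.db_reads += 1
        value = self._database.get(key)    # cache miss: fetch from the database
        if value is not None:
            self._entries[key] = value     # update the cache entry for next time
        return value


database = {"user:1": "Asha"}
cache = ReadThroughCache(database)
first = cache.get("user:1")   # miss: loaded from the database
second = cache.get("user:1")  # hit: served from the cache
```

Only the first read touches the database; the second is answered from memory.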
Pros and cons of write-through caching
- Since every write updates both the cache and the database, read requests for the same data will see the up-to-date version as long as all writes go through the cache. This helps to avoid stale data in the cache, and subsequent reads of that data will be very fast. So the write-through strategy is a strong choice when data freshness and read performance are essential.
- As we have seen above, write latency is higher than writing to the database alone. If there are frequent writes, the cache will have to access the database frequently. On the other hand, continuous writes can evict useful data from the cache. Let's understand this! Every write is forced to go through the cache, so the cache may fill up with data that is written often but rarely read. This leaves little room for other data that could benefit from caching. So write-through caching is not a good choice when write operations are frequent.
- If the system uses a write-through cache for all data access (both reads and writes), a cache failure can cause severe issues. For example, if the cache becomes unavailable, application performance suffers because every read and write operation has to go through the cache. So rather than using this strategy for all data access, the application can selectively cache only certain types of data, such as frequently accessed or read-intensive data that doesn’t change frequently.
- Write-through caching reduces the risk of data loss in case of system failures like power outages or crashes. The idea is simple: a write is acknowledged only after it has reached the database, so the cache never holds the only copy of acknowledged data. When the system recovers, it can pick up from where it left off because the cache and the database were always kept synchronized.
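The idea of caching only selected data can be sketched as follows. This is one possible approach under stated assumptions: the `should_cache` predicate and the `product:` / `audit:` key prefixes are made up for the example, and a dict stands in for the database.

```python
class SelectiveWriteThroughCache:
    """Writes always reach the database; only selected keys are kept in the cache."""

    def __init__(self, database, should_cache):
        self._database = database          # assumed: any dict-like backing store
        self._should_cache = should_cache  # hypothetical predicate: key -> bool
        self._entries = {}

    def put(self, key, value):
        self._database[key] = value        # the database is always updated
        if self._should_cache(key):
            self._entries[key] = value     # cache only data worth caching

    def get(self, key):
        if key in self._entries:
            return self._entries[key]
        value = self._database.get(key)
        if value is not None and self._should_cache(key):
            self._entries[key] = value
        return value

    def cached_keys(self):
        return set(self._entries)


database = {}
cache = SelectiveWriteThroughCache(
    database, should_cache=lambda key: key.startswith("product:")
)
cache.put("product:42", "keyboard")     # read-heavy data: cached
cache.put("audit:9001", "login event")  # write-heavy, rarely read: not cached
```

Both values are stored in the database, but only the product entry occupies cache space, which leaves room for data that actually benefits from caching.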
Why is there a need for a cache eviction strategy?
At first glance, it looks like there is no need for a cache eviction strategy in a write-through cache, because the cache is always consistent with the database. In practice, this is not the case! We still need to define a TTL or another eviction strategy (LRU or LFU). Why? There are several reasons:
- Even though the cache is consistent with the database, it doesn't mean that every single piece of data should be kept in the cache indefinitely. Cache memory is limited, and storing all data indefinitely leads to inefficient cache usage.
- As we have seen above, write-through can populate the cache with data that is never read. By adding a time-to-live (TTL) value to each write, we can avoid cluttering the cache with such data and keep it focused on frequently accessed data.
- What happens when some data in the database is updated by an external service that bypasses the cache? This creates an inconsistency between the cache and the database. So we need a proper eviction strategy to make sure stale entries eventually expire and get reloaded with the updated data.
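To make these reasons concrete, here is a rough sketch of a write-through cache that combines a TTL with LRU eviction. The class name, the `max_entries` and `ttl_seconds` parameters, and the injectable `clock` (used so the demo does not have to sleep) are assumptions for illustration; real cache systems offer their own configuration for this.

```python
import time
from collections import OrderedDict


class EvictingWriteThroughCache:
    """Write-through cache with a per-entry TTL and LRU eviction."""

    def __init__(self, database, max_entries=2, ttl_seconds=60.0, clock=time.monotonic):
        self._database = database      # assumed: any dict-like backing store
        self._max_entries = max_entries
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = OrderedDict()  # key -> (value, expires_at), least recently used first

    def _store(self, key, value):
        self._entries[key] = (value, self._clock() + self._ttl)
        self._entries.move_to_end(key)         # mark as most recently used
        while len(self._entries) > self._max_entries:
            self._entries.popitem(last=False)  # evict the least recently used entry

    def put(self, key, value):
        self._database[key] = value            # write-through to the database
        self._store(key, value)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is not None:
            value, expires_at = entry
            if self._clock() < expires_at:
                self._entries.move_to_end(key)
                return value
            del self._entries[key]             # TTL expired: drop the stale entry
        value = self._database.get(key)        # miss or expired: reload from the database
        if value is not None:
            self._store(key, value)
        return value

    def cached_keys(self):
        return list(self._entries)


now = [0.0]  # fake clock so the demo can "advance time" instantly
database = {}
cache = EvictingWriteThroughCache(database, max_entries=2, ttl_seconds=10.0, clock=lambda: now[0])
cache.put("a", 1)
cache.put("b", 2)
cache.put("c", 3)       # cache is full, so "a" is evicted (still safe in the database)
database["b"] = 20      # an external service updates the database directly
stale = cache.get("b")  # TTL not expired yet: the cache still returns the old value
now[0] = 11.0
fresh = cache.get("b")  # TTL expired: the entry is reloaded from the database
```

The TTL bounds how long an externally updated value can stay stale, and the LRU limit keeps the cache from growing without bound.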
Critical ideas to explore!
- If a write succeeds on the cache but fails on the database, how can you handle this inconsistency?
- How does the system handle cache failures or database failures?
- Is there a way to optimize the performance of the write operation?
- If the system uses distributed caching with multiple cache nodes, how is cache coherency maintained across these nodes to ensure consistency?
- How does the write-through cache handle cache synchronization when there are multiple instances of the application running concurrently?
- Write-behind vs. write-through cache.
- Code implementation and real-life systems that use write-through caching.
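As a starting point for the first question, here is one possible way to handle a database write that fails after the cache has already been updated: restore the previous cache entry and report the failure to the caller. This is only a sketch under assumptions. `FlakyDatabase` and `RollbackWriteThroughCache` are hypothetical names, the code is single-threaded, and in a concurrent system a reader could still observe the new value before the rollback, so writing to the database first is another option worth exploring.

```python
class FlakyDatabase:
    """Hypothetical database stand-in that can be told to reject writes."""

    def __init__(self):
        self.rows = {}
        self.fail_writes = False

    def write(self, key, value):
        if self.fail_writes:
            raise ConnectionError("database unavailable")
        self.rows[key] = value


_MISSING = object()  # sentinel: the key had no cache entry before the write


class RollbackWriteThroughCache:
    """Undoes the cache update when the database write fails."""

    def __init__(self, database):
        self._database = database
        self._entries = {}

    def put(self, key, value):
        previous = self._entries.get(key, _MISSING)
        self._entries[key] = value                # write to the cache first
        try:
            self._database.write(key, value)      # then synchronously to the database
        except Exception:
            # The database rejected the write: restore the old cache state.
            if previous is _MISSING:
                del self._entries[key]
            else:
                self._entries[key] = previous
            raise                                 # let the application see the failure

    def peek(self, key):
        return self._entries.get(key)


database = FlakyDatabase()
cache = RollbackWriteThroughCache(database)
cache.put("balance", 100)
database.fail_writes = True
try:
    cache.put("balance", 250)
    write_failed = False
except ConnectionError:
    write_failed = True
```

After the failed write, both the cache and the database still hold the old value, so they remain consistent with each other.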