In the cache-aside and read-through patterns, when a cached entry is evicted because its TTL (time to live) has expired, the entry is reloaded from the database on the next read request for that data. That request is a cache miss, and it increases read latency.
Now the question is: Is there a way to prefetch data from the database into the cache before it expires, so that the request does not need to go to the database? This is where the refresh-ahead pattern comes into the picture.
How does refresh-ahead caching work?
The goal of the refresh-ahead pattern is to configure the cache to asynchronously reload (refresh) the most recent version of the data from the database before it expires (or close to its expiration). This refresh is performed at a regular interval by a background job or a dedicated thread.
So the refresh-ahead pattern ensures that cache entries that are likely to be accessed again soon are already in the cache before they expire, and read requests for that data are served from the cache with low latency. Now here is a dilemma: If we refresh frequently, we get more up-to-date data but increase the load on the database. On the other hand, refreshing infrequently can leave stale data in the cache. So we need to choose a refresh frequency that balances data freshness against database load.
One idea is to define a refresh-ahead factor for cache entries. This is a value between 0 and 1 that represents a fraction of the TTL. For example, suppose the TTL for cached data is 120 seconds and the refresh-ahead factor is 0.5, so the refresh threshold is 120 × 0.5 = 60 seconds after the entry is loaded.
- If the application reads the data before 60 seconds have passed, no refresh is triggered and the cache simply returns the data.
- If the application reads the data after it has expired (after 120 seconds), the cache performs a synchronous read from the database and returns the data to the application. Synchronous means that the cache waits for the reload to complete before serving the requested data, so this read will be slow.
- If the application reads the data after 60 seconds but before 120 seconds, the cache returns the current value and asynchronously fetches the updated version from the database. Because the data is reloaded ahead of the TTL limit, the reload also resets the TTL (extending the entry's stay in the cache). Note: The refresh is asynchronous, so the cache does not wait for the reload to finish before serving the data to the application.
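The three cases above can be sketched in a few lines of Python. This is only a minimal illustration under some assumptions, not the API of any particular cache library: the class name `RefreshAheadCache`, the `loader` callback (standing in for the database read), and the injectable `clock` are all hypothetical names chosen for this example.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class RefreshAheadCache:
    """Illustrative refresh-ahead cache (hypothetical, not a real library API)."""

    def __init__(self, loader, ttl=120.0, refresh_factor=0.5, clock=time.monotonic):
        self._loader = loader                        # key -> value, stands in for the database
        self._ttl = ttl
        self._refresh_after = ttl * refresh_factor   # e.g. 120 * 0.5 = 60 seconds
        self._clock = clock
        self._entries = {}                           # key -> (value, loaded_at)
        self._refreshing = {}                        # key -> in-flight refresh future
        self._lock = threading.Lock()
        self._pool = ThreadPoolExecutor(max_workers=4)

    def _store(self, key):
        value = self._loader(key)                    # database read happens outside the lock
        with self._lock:
            self._entries[key] = (value, self._clock())  # reloading resets the TTL
        return value

    def _refresh(self, key):
        try:
            self._store(key)
        finally:
            with self._lock:
                self._refreshing.pop(key, None)

    def get(self, key):
        now = self._clock()
        with self._lock:
            entry = self._entries.get(key)
            if entry is not None:
                value, loaded_at = entry
                age = now - loaded_at
                if age < self._ttl:
                    # Past the threshold but not expired: refresh in the background,
                    # at most one refresh per key at a time.
                    if age >= self._refresh_after and key not in self._refreshing:
                        self._refreshing[key] = self._pool.submit(self._refresh, key)
                    return value                     # serve the current value immediately
        # Missing or expired: synchronous (slow) reload.
        return self._store(key)

    def close(self):
        """Wait for in-flight refreshes and stop the worker threads."""
        self._pool.shutdown(wait=True)
```

With `ttl=120` and `refresh_factor=0.5`, a read at 30 seconds is a plain hit, a read at 70 seconds returns the current value and schedules a background reload, and a read after 120 seconds blocks on the loader. The sketch never evicts entries and ignores loader failures, both of which a real cache would have to handle.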
What is the benefit of this approach? We mostly use a cache to store frequently accessed data. If we rely on the TTL alone, we may serve stale versions of frequently accessed data until the TTL expires, and the first read after expiration pays the latency of fetching the updated data from the database. With the refresh-ahead pattern, a frequently accessed cache entry is reloaded asynchronously before it expires, so subsequent reads of that entry do not pay the latency of reloading from the database.
So refresh-ahead caching is useful when a large amount of frequently accessed data needs to be updated regularly (or is expected to change frequently) and keeping it up-to-date is crucial for the application’s performance and functionality.
Critical ideas to explore!
- Is there some other way to implement this pattern?
- Rather than setting refresh factor for each cache entry, can we configure cache to asynchronously refresh a batch of frequently updated data on a scheduled interval using some background job?
- How can we monitor the cache to improve the effectiveness of the refresh-ahead pattern? For example, suppose we observe the lifetime of cached data and, based on some pattern or event, configure the cache to load the data even before the application requests it.
- How much complexity will it add to the application code?
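As one way to explore the scheduled-batch idea from the list above, here is a rough Python sketch of a background job that reloads a fixed set of keys at a regular interval. The names `ScheduledRefresher`, `hot_keys` and `loader` are assumptions made for this example, and a plain dictionary stands in for the cache.

```python
import threading


class ScheduledRefresher:
    """Illustrative background job that refreshes a batch of keys on a schedule."""

    def __init__(self, cache, loader, hot_keys, interval_seconds):
        self._cache = cache              # a plain dict standing in for the cache
        self._loader = loader            # key -> value, stands in for the database read
        self._hot_keys = list(hot_keys)  # keys we have decided to keep fresh
        self._interval = interval_seconds
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def refresh_once(self):
        """Reload every hot key; keep the old value if a load fails."""
        for key in self._hot_keys:
            try:
                self._cache[key] = self._loader(key)
            except Exception:
                continue

    def _run(self):
        # Event.wait returns False on timeout, so this loops once per interval
        # until stop() sets the event.
        while not self._stop.wait(self._interval):
            self.refresh_once()

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()
```

Compared with a per-entry refresh factor, this variant refreshes the chosen keys whether or not anyone reads them, so the database load is predictable but some of it may be wasted on keys that are no longer popular.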
Advantages or benefits of refresh-ahead pattern
- Ensures frequently read data is updated before it expires.
- Reduces the time spent waiting for data to be retrieved from the database. This lowers the read latency for frequently accessed data compared to techniques like the read-through pattern.
- Reduces sudden latency spikes caused by TTL expiration. In other words, to some extent, it mitigates the “thundering herd” problem associated with read-through caching!
- Asynchronous refresh is triggered only when a cache entry is accessed while it is close to its expiration time. This focuses refresh effort on actively used data and avoids wasteful reloading of less frequently accessed data.
- Refresh-ahead is especially useful if data is accessed by a large number of users. Values remain fresh in the cache and the latency that could result from excessive reloads from the database is avoided.
Limitations or drawbacks of refresh-ahead pattern
- Implementing the refresh-ahead pattern can add complexity to the application code, making it harder to maintain and debug. This is especially true if the caching logic becomes difficult to manage.
- It can put extra load on the cache and the database if many keys refresh at the same time.
- The cache needs to accurately predict which cache items are likely to be needed in the future because inaccurate predictions can incur unnecessary database reads.
- Refresh-ahead might not be suitable for all access patterns. It may work well for certain read-heavy workloads, but the overhead of pre-fetching might not be justified for write-intensive access patterns.
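One common way to soften the "all keys refresh at the same time" drawback listed above is to add a small random offset (jitter) to each entry's refresh threshold. This technique is not part of the pattern as described in this blog; the Python sketch below, including the function name `jittered_refresh_threshold` and the `jitter` parameter, is a hypothetical illustration.

```python
import random


def jittered_refresh_threshold(ttl, refresh_factor, jitter=0.1, rng=random):
    """Return how many seconds after loading an entry becomes eligible for refresh.

    The base threshold is ttl * refresh_factor. A random offset of up to
    +/- jitter * ttl is added so that entries loaded together do not all
    cross the threshold at the same moment. The result is clamped to [0, ttl].
    """
    base = ttl * refresh_factor
    offset = rng.uniform(-jitter, jitter) * ttl
    return min(max(base + offset, 0.0), ttl)
```

For a TTL of 120 seconds, a factor of 0.5 and a jitter of 0.1, each entry gets a threshold somewhere between 48 and 72 seconds instead of exactly 60, which spreads the refresh reads over a window of about 24 seconds.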
Refresh-Ahead versus Read-Through Cache
- A refresh-ahead cache proactively fetches data from storage before it is explicitly requested. In contrast, a read-through cache fetches data from storage and populates the cache only when the application explicitly requests it.
- A refresh-ahead cache predicts and fetches data that is likely to be accessed in the future. In contrast, a read-through cache fetches data from storage into the cache on demand, when a read request is made.
- A refresh-ahead cache reduces access latency because the data is already in the cache when it is needed. In contrast, a read-through cache may experience higher latency on the first read after a miss.
- Refresh-ahead may introduce overhead if the system spends significant resources on prefetching data, competing with other tasks. A read-through cache generally has lower overhead because it fetches data only in response to actual read requests.
- Refresh-ahead is particularly useful for scenarios with predictable access patterns. A read-through cache has no such limitation, and it can work well in scenarios where future access patterns are less predictable.
We hope you enjoyed the blog. If you have any queries or feedback, please write to us at email@example.com. Enjoy learning, Enjoy algorithms, Enjoy system design!