Design Key-Value Store

What is a Key-Value Store?

A key-value database is a non-relational database that stores data as a collection of key-value pairs, with the key serving as a unique identifier. Both keys and values can be anything from simple strings to complex compound objects. Key-value databases are highly partitionable and can scale horizontally to levels other databases cannot. For example, Amazon DynamoDB allocates additional partitions to a table when an existing partition fills up and more storage is required.

Requirements of the System

Coming up with the features the system should offer is the first stage of any system design interview. As an interviewee, we should try to think of as many features as possible that our system should support.

  • Data size: The values are too large to be held in memory, so we should leverage external storage for them. Let’s assume a few hundred TB in total. However, we can still keep the keys in memory.
  • The store should support updates.
  • Let’s assume an upper cap of 1 GB on the size of a value.

Functional Requirements:

  • Create a value for a key (update the value if it already exists).
  • Get the value for a given key.
  • Remove a key and its value (unset the key).

Non-Functional Requirements:

  • Highly scalable: It should be possible to add instances in the middle of a load spike.
  • Consistent: Every call should return a consistent and correct response.
  • Durable: No data should be lost, even during network partitions or machine failures.
  • Available: There should be no single point of failure. That said, under the CAP theorem this system is CP, meaning consistency takes precedence over availability.


Estimation of Scale

This is usually the second part of a design interview: coming up with estimates of how scalable our system should be. The important parameters to remember for this section are the number of queries per second and the amount of data the system will be required to handle. Let’s assume:

Total Storage: 100 TB.
Total QPS: 100K.

How would we design a simple key-value storage system in a single machine?

The simplest, most straightforward solution is to store key-value pairs in a hash table, which is how most of these systems function. A hash table is a data structure that lets us read and write a key-value pair in near-constant time, and it is very simple to use. Most languages have built-in support for it.
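As a minimal sketch, the three functional requirements above map directly onto a built-in hash table (a Python dict here; the class and method names are illustrative):

```python
class KVStore:
    """Single-machine key-value store backed by a hash table."""

    def __init__(self):
        self._table = {}

    def set(self, key, value):
        # Creates the key, or overwrites the value if the key exists.
        self._table[key] = value

    def get(self, key):
        # Returns None when the key is absent.
        return self._table.get(key)

    def delete(self, key):
        # Unsets the key; a missing key is not an error.
        self._table.pop(key, None)
```

All three operations run in expected O(1) time, which is why the hash table is the natural starting point before we worry about memory limits or distribution.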

However, there is a disadvantage. A hash table normally lives entirely in memory, which is not always practical when the data set is large. Two options are commonly used:

  • Compressing the data. This should be the first thing to consider, as there is frequently a lot of data that can be compressed. For example, we can store references rather than the actual data, or use float32 instead of float64. Compact representations such as bit arrays or integer-encoded vectors can also help.
  • Storing part of the data on disk. If it is difficult to fit everything in memory, we can save a portion of the data on disk and treat the system as a cache: frequently accessed data is kept in memory, while the remainder is stored on disk.
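The second option can be sketched as a two-tier store: an in-memory LRU cache in front of slower storage. This is only an illustration of the caching idea, with a plain dict standing in for the on-disk layer:

```python
from collections import OrderedDict

class TieredStore:
    """Hot keys stay in an in-memory LRU cache; every write also goes
    to a slower backing store (a dict stands in for disk here)."""

    def __init__(self, capacity=2):
        self._hot = OrderedDict()   # in-memory LRU cache
        self._disk = {}             # stand-in for on-disk storage
        self._capacity = capacity

    def set(self, key, value):
        self._disk[key] = value     # durable copy first
        self._hot[key] = value
        self._hot.move_to_end(key)  # mark as most recently used
        if len(self._hot) > self._capacity:
            self._hot.popitem(last=False)   # evict least recently used

    def get(self, key):
        if key in self._hot:        # cache hit: refresh recency
            self._hot.move_to_end(key)
            return self._hot[key]
        if key in self._disk:       # cache miss: promote from disk
            self.set(key, self._disk[key])
            return self._disk[key]
        return None
```

Evicted keys are never lost; they are simply served from the slower tier and promoted back into memory on access.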

Distributed key-value storage

Scaling key-value storage over several machines is undoubtedly the most intriguing issue. If we want to support huge data, we’ll almost certainly need to implement a distributed system. Let’s look at how a distributed key-value storage system could be designed.

Because a single machine cannot store all of the data, the general idea is to divide the data among multiple machines according to some rule, with a coordinator directing clients to the machine that holds the requested data. The challenge is to divide the data among many machines with, most crucially, a decent data partitioning approach.


Let’s revisit our earlier estimate of the amount of data to be stored. A single machine cannot store 100 TB of data. Even if we had a machine that could hold that much, it would have to handle all of the requests (all of the load), resulting in severe performance degradation. As a result, sharding the data and distributing the load across multiple machines is the better course of action.

Consider messaging as an example of an application built to store data per user: each user has their own inbox. If we shard with each user as a row, it is fine to store data in a denormalized manner, because we never have to query data across users. Let’s imagine we decide to store data denormalized in this scenario.

If the data were normalized, we would have to join across tables and rows to retrieve information, and any join across machines is particularly undesirable once the data has been sharded (high latency, weaker indexing support). If we store denormalized data instead, we duplicate the same fields in multiple places, but all information about a row (or a key) lives on the same machine, which reduces latency. However, if the sharding criteria are not chosen properly, consistency issues may arise, since we store the same data in multiple places.

Reads are consistent because there is only one copy of each row, and latency should be acceptable as long as there are enough shards to keep the load on each shard reasonable. Reads and writes work the same as with a single DB, except that there is a row -> shard -> machine IP resolution layer in the middle (given a row, which shard does it belong to; and given the shard, which machine should I query or write to).
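That resolution layer can be sketched in a few lines. The shard count, the static shard-to-machine map, and the IP addresses below are all made-up values for illustration; a real system would keep this mapping in a coordination service:

```python
import hashlib

NUM_SHARDS = 4
SHARD_TO_MACHINE = {
    0: "10.0.0.1",
    1: "10.0.0.2",
    2: "10.0.0.1",   # one machine may host several shards
    3: "10.0.0.3",
}

def shard_for(row_key: str) -> int:
    # A stable hash, so every coordinator maps a key to the same shard.
    digest = hashlib.sha256(row_key.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

def machine_for(row_key: str) -> str:
    # row -> shard -> machine IP resolution.
    return SHARD_TO_MACHINE[shard_for(row_key)]
```

Note that the simple modulo scheme shown here reshuffles most keys when NUM_SHARDS changes; consistent hashing is the usual fix for that, at the cost of a slightly more involved lookup.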

There is one flaw in this model. What happens if a shard’s machine fails? The shard will be down for a while (which is acceptable, as governed by the CAP theorem). But what if the machine fails and its hard disk becomes corrupted? We are suddenly faced with the possibility of losing data, which is unacceptable. Imagine losing all of our messages because a shard failed and its disk was corrupted. This means we need to keep more than one copy of the data we write.

System availability

One important measure to consider when evaluating a distributed system is availability. What happens if one of our machines breaks for some reason (hardware or software issues, for example), and how does this affect our key-value storage system?

If someone requests data from that machine, we won’t be able to return the correct response. When developing a side project, we may overlook this problem. However, if we’re serving millions of people from a large number of servers, failures happen frequently, and we can’t afford to restart a server manually every time. This is why availability is critical in today’s distributed systems. So, how would we deal with this problem?

Of course, we can write more robust code with test cases, but our programs will always have bugs, and hardware issues are even harder to protect against. The most common solution is replication. By setting up machines holding duplicate copies of the data, we can significantly reduce system downtime. If a single machine has a 10% chance of crashing in a given month, then with a single backup machine, the probability that both are down at the same time drops to 1%.
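The 1% figure follows from multiplying the individual failure probabilities, which assumes the two machines fail independently (correlated failures, e.g. a shared rack or power supply, would make this estimate optimistic):

```python
# Probability a single machine is down in a given month.
p_down = 0.10

# With one independent replica, both must be down to lose availability.
p_both_down = p_down * p_down          # 0.10 * 0.10 = 0.01, i.e. 1%

# Each additional independent replica multiplies in another factor.
p_three_down = p_down ** 3             # 0.001, i.e. 0.1%
```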

Replica vs. Sharding

Replication appears very similar to sharding at first glance. So, what is the relationship between the two? When developing a distributed key-value store, how would we decide between replication and sharding?

First and foremost, we must understand the aim of each strategy. Sharding is used to spread data over multiple machines, because a single machine can only store so much. Replication serves as a backup for the system in the event of a failure. With that in mind, a replica won’t help if a single machine can’t store all of the data.


We can strengthen the system by introducing replicas, but consistency then becomes a problem. Let’s imagine we have replica A2 for machine A1. How can we be certain that A1 and A2 hold the same data? When adding a new entry, for example, we must update both machines, but one of them may fail the write operation. As a result, A1 and A2 may accumulate a significant amount of conflicting data over time, which is a serious issue.

There are a few options available here. The first is for the coordinator to keep a local copy: when a resource is updated, the coordinator holds on to the new version, and if the write fails, it can retry the operation.

The commit log is another option. If we’ve used Git, we’re probably already familiar with the concept. Essentially, each node records a commit log entry for every transaction, which acts as a track record of all changes. To update an entry on machine A, we first append the operation to the commit log; a separate process then applies the commit log entries in order (as a queue). If an operation fails, we can recover by replaying the commit log.
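The commit-log idea can be sketched as follows: every mutation is appended to the log before it is applied, so a node that loses its in-memory state can replay the log to rebuild it. In this illustration, in-memory lists stand in for durable log files:

```python
class Node:
    """Node that records every write in an append-only commit log."""

    def __init__(self):
        self.log = []    # append-only commit log (stand-in for a file)
        self.data = {}   # the node's current key-value state

    def set(self, key, value):
        self.log.append(("set", key, value))   # record intent first
        self.data[key] = value                 # then apply the change

    def recover(self):
        # Rebuild state purely by replaying the commit log in order.
        self.data = {}
        for op, key, value in self.log:
            if op == "set":
                self.data[key] = value
        return self.data
```

Because the log is ordered and append-only, replaying it always reproduces the same final state, which is what makes it usable both for crash recovery and for bringing a lagging replica up to date.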

Redis Architecture

Redis is a key-value data store that runs in memory, and it is the most widely used key-value store: all of the world’s major IT companies use it. Redis is also supported by Amazon ElastiCache, making it an extremely powerful, must-know key-value data store. In this post, I’ll give you a quick overview of the Redis architecture.

What Is an In-Memory Key-Value Store?

A key-value store is a type of storage system in which data is kept as key-value pairs. When we say in-memory key-value store, we mean the key-value pairs are kept in primary memory (RAM). So we can say that Redis stores data in RAM as key-value pairs.
In Redis, the key must be a string, while the value can be a string, list, set, sorted set, or hash. Here is an example of a key-value pair in Redis:

categories=["coding-interview", "machine-learning", "system-design"]

Advantages And Disadvantages of Redis over DBMS

In a database management system, everything is stored on secondary storage, which makes read and write operations slow. Redis, however, keeps everything in primary memory, which makes reads and writes extremely fast.

Because primary memory is limited (far smaller and more expensive than secondary memory), Redis cannot store huge files or binary data. It can only hold small amounts of mostly textual data that must be read, changed, and inserted quickly. If we try to write more data than the available memory allows, we will get errors.

Enjoy learning, Enjoy algorithms, Enjoy system design!


© 2022 Code Algorithms Pvt. Ltd.

All rights reserved.