Database replication has evolved over the last two decades as distributed databases have increasingly replaced single-node databases. But its core principles have stayed more or less the same, largely because the fundamental constraints of distributed systems have not changed significantly.
In this blog, we will discuss the definition, types, advantages and disadvantages of database replication.
Database replication is the process of keeping multiple copies of the same data on different servers (replicas) so that if one server goes down, the other servers can continue to serve data with little or no downtime.
Suppose an online store has a single database server that stores all the data related to products, orders, and customers. If this server fails for some reason (hardware issues, software issues, etc.), the whole website will be unavailable.
To solve this problem, we can use database replication. The website owner can set up multiple database servers that replicate data from the primary database server. If the primary server goes down, other servers can take over and continue to serve requests.
It's important to ensure that any updates made to the data on one node are reflected on all other nodes. So, we need efficient techniques that can ensure that all nodes have the most up-to-date version of the data.
There are three types of database replication techniques based on architecture: Single-leader replication, multi-leader replication, and no-leader replication. Each strategy has its advantages and disadvantages.
Single-leader replication is also known as Master-Slave or Active-Passive replication.
In single-leader replication, there is a single leader (master) and several follower replicas (slaves). All write requests are served by the leader node, and read requests can be served by the leader or any of the follower nodes. A common approach is to use the leader only for write requests and distribute read requests across the followers, which removes some load from the leader.
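The read/write split described above can be sketched in a few lines of Python. This is a minimal in-memory illustration, not a real database client: the `Replica` and `SingleLeaderCluster` classes are hypothetical names, and replication to followers is done inline for simplicity.

```python
import itertools


class Replica:
    """A toy in-memory node; a real one would be a database server."""

    def __init__(self, name):
        self.name = name
        self.data = {}


class SingleLeaderCluster:
    def __init__(self, leader, followers):
        self.leader = leader
        self.followers = followers
        # Round-robin iterator used to spread reads across followers.
        self._next_follower = itertools.cycle(followers)

    def write(self, key, value):
        # All writes go to the leader, which then forwards the change.
        self.leader.data[key] = value
        for follower in self.followers:
            follower.data[key] = value

    def read(self, key):
        # Reads are served by followers to take load off the leader.
        follower = next(self._next_follower)
        return follower.name, follower.data.get(key)


cluster = SingleLeaderCluster(Replica("leader"), [Replica("f1"), Replica("f2")])
cluster.write("order:1", "shipped")
print(cluster.read("order:1"))  # ('f1', 'shipped')
print(cluster.read("order:1"))  # ('f2', 'shipped')
```

Round-robin is only one way to pick a follower; a real setup might route by location or replica lag instead.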
We have covered this idea of replication in a separate blog: Complete overview of Master-Slave Replication.
There is a big problem with single-leader replication: if the leader becomes unavailable for some reason, no writes can be performed until one of the followers is promoted to leader. A common solution to this problem is multi-leader replication, also called master-master or active-active replication.
Here, clients can send write requests to any one of the leader nodes, and each leader also acts as a follower of the other leaders. So whenever a leader node performs a write operation, it forwards the stream of data changes to all other leader nodes.
There are some complexities in multi-leader architecture. For example, conflicts can arise when two or more leader nodes receive conflicting write requests simultaneously. So it’s important to have conflict resolution mechanisms in place to ensure data consistency. Note: In the near future, we will discuss multi-leader in detail in a separate blog.
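To make the conflict problem concrete, here is a minimal sketch of one common resolution strategy, last-write-wins (LWW). The article does not prescribe a specific mechanism, so LWW is an assumption chosen for illustration, and the `Leader` class and its timestamps are hypothetical. Each write carries a timestamp, and every leader keeps the version with the highest timestamp, using the node id as a tie-breaker.

```python
class Leader:
    def __init__(self, node_id):
        self.node_id = node_id
        self.store = {}  # key -> (timestamp, node_id, value)

    def write(self, key, value, timestamp):
        """Accept a client write locally and return the change to replicate."""
        change = (key, (timestamp, self.node_id, value))
        self.apply(change)
        return change

    def apply(self, change):
        key, version = change
        current = self.store.get(key)
        # Compare (timestamp, node_id); node_id breaks timestamp ties.
        if current is None or version[:2] > current[:2]:
            self.store[key] = version

    def get(self, key):
        return self.store[key][2]


a, b = Leader("a"), Leader("b")
# Both leaders accept a conflicting write before hearing from each other.
change_a = a.write("cart:42", ["shirt"], timestamp=100)
change_b = b.write("cart:42", ["jeans"], timestamp=101)
# The changes are then exchanged between the leaders.
a.apply(change_b)
b.apply(change_a)
print(a.get("cart:42"), b.get("cart:42"))  # ['jeans'] ['jeans']
```

Both leaders converge on the same value, but the write with the lower timestamp is silently discarded. That trade-off is why some systems prefer merging values or asking the application to resolve the conflict.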
No-leader replication is also known as leaderless replication. In this architecture, clients send write requests to several nodes and read from several nodes in parallel. There is no concept of a leader in this approach, so any replica can directly accept writes from clients.
Leaderless replication can provide high availability and fault tolerance. Because there is no single point of failure, the system can continue to function even if some nodes fail. It can also provide high read and write throughput because requests can be distributed across multiple nodes.
This method also poses challenges for synchronization, as it can be difficult to ensure that all nodes have the same view of the data at all times. In addition, handling conflicts that may arise from concurrent writes can be complex, and careful design is necessary to ensure data consistency.
Note: In the near future, we will discuss leaderless replication in detail in a separate blog.
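One widely used way to keep leaderless reads up to date is quorum reads and writes: with n replicas, a write must be acknowledged by w nodes and a read must hear from r nodes, where w + r > n, so every read overlaps with the latest successful write. The sketch below is an illustration under that assumption; the `Node` class, the explicit version numbers, and the helper functions are hypothetical.

```python
class Node:
    def __init__(self):
        self.store = {}  # key -> (version, value)
        self.up = True

    def put(self, key, version, value):
        if not self.up:
            return False
        current = self.store.get(key)
        if current is None or version > current[0]:
            self.store[key] = (version, value)
        return True

    def get(self, key):
        return self.store.get(key)


def quorum_write(nodes, key, version, value, w):
    """Send the write to every node; succeed if at least w acknowledge."""
    acks = sum(node.put(key, version, value) for node in nodes)
    return acks >= w


def quorum_read(nodes, key, r):
    """Read from every reachable node and return the newest value seen."""
    replies = [node.get(key) for node in nodes if node.up]
    if len(replies) < r:
        raise RuntimeError("not enough replicas reachable")
    found = [reply for reply in replies if reply is not None]
    return max(found, key=lambda reply: reply[0])[1] if found else None


nodes = [Node(), Node(), Node()]          # n = 3, w = 2, r = 2
quorum_write(nodes, "k", 1, "old", w=2)
nodes[2].up = False                       # one node misses the next write
quorum_write(nodes, "k", 2, "new", w=2)
nodes[2].up = True                        # it comes back with stale data
nodes[0].up = False
print(quorum_read(nodes, "k", r=2))       # new
```

The read still returns the latest value even though one of the two responding nodes is stale, because at least one of them took part in the last successful write.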
In asynchronous replication, the leader node responds to the client immediately after updating its own copy of the data, without waiting for the changes to be propagated to the followers. Replication happens in the background: the leader propagates the changes to the followers asynchronously. Because the confirmation is sent before replication completes, there is a risk of data loss without the client's knowledge. If the leader node crashes, any changes that have not yet been propagated are lost permanently.
Despite this disadvantage, asynchronous replication is the default strategy for most data stores because it keeps write latency low: the client is blocked only for as long as the write takes on the leader.
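The window in which data can be lost is easy to see in a small simulation. In this sketch the `AsyncLeader` class is hypothetical, followers are plain dictionaries, and the background replication process is replaced by an explicit `replicate_pending()` call so the behaviour is deterministic.

```python
from collections import deque


class AsyncLeader:
    def __init__(self, followers):
        self.data = {}
        self.followers = followers  # each follower is a plain dict here
        self.backlog = deque()      # changes not yet sent to followers

    def write(self, key, value):
        self.data[key] = value
        self.backlog.append((key, value))
        return "ok"  # acknowledged before any follower has the change

    def replicate_pending(self):
        """Stand-in for the background replication process."""
        while self.backlog:
            key, value = self.backlog.popleft()
            for follower in self.followers:
                follower[key] = value


follower = {}
leader = AsyncLeader([follower])
print(leader.write("stock:7", 3))  # ok
print(follower)                    # {}
leader.replicate_pending()
print(follower)                    # {'stock:7': 3}
```

If the leader crashed between the acknowledgement and `replicate_pending()`, the follower would never see the write, even though the client was told it succeeded.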
In synchronous replication, once the leader node updates its own copy of the data, it initiates the write operation on its followers: Followers receive the update, apply changes to their copy of data, and then send confirmation to the leader. Once the leader receives confirmation from all followers, it responds to the client and completes the operation.
Synchronous replication ensures that followers are always in sync and consistent with the leader, which makes this setup fault-tolerant by default. Even if the leader crashes, all of the data is still available on the followers. So the system can promote any one of the followers to be the new leader and continue to function as usual.
One major disadvantage of synchronous replication is that the client and the leader can remain blocked if a follower becomes non-responsive due to a crash or network partition. In other words, the leader will continue to block all writes until the affected followers become available again. So having a large number of followers in this setup can result in longer block times for the client.
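The blocking behaviour can be sketched as follows. This is a simplified model with hypothetical `Follower` and `SyncLeader` classes: instead of blocking forever, an unresponsive follower raises a `TimeoutError`, which stands in for the leader waiting on a confirmation that never arrives.

```python
class Follower:
    def __init__(self):
        self.data = {}
        self.responsive = True

    def apply(self, key, value):
        if not self.responsive:
            raise TimeoutError("follower did not confirm")
        self.data[key] = value


class SyncLeader:
    def __init__(self, followers):
        self.data = {}
        self.followers = followers

    def write(self, key, value):
        self.data[key] = value
        for follower in self.followers:
            # A real leader would block here until the follower confirms.
            follower.apply(key, value)
        return "ok"  # sent only after every follower has confirmed


f1, f2 = Follower(), Follower()
leader = SyncLeader([f1, f2])
print(leader.write("stock:7", 3))  # ok
f2.responsive = False
try:
    leader.write("stock:7", 2)
except TimeoutError as err:
    print("write not acknowledged:", err)
```

In this toy version, the leader and the first follower have already applied the second write when the timeout occurs. A real system would need a retry or rollback step at that point, which is outside the scope of this sketch.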
There are many trade-offs to consider with replication. For example, when to use synchronous or asynchronous replication and how to handle failed followers. Explore the master-slave replication blog.
In full replication, we copy the entire original database to every replica. This makes data highly available and decreases query execution time, since data can be fetched from the closest replica. But updates can be slow because every change has to be applied at every replica's location.
Full replication is useful when users at different locations need to see the same view of the data. For example, users looking for match scores need to see the same details about the match, regardless of their location.
In partial replication, each replica stores a copy of only a selected part of the data from the original database. The type and importance of the data determine how many replicas hold it. Here, updating each replica is fast because each replica receives only a portion of the entire database.
But there is a problem: if the local replica does not contain some required data, that data needs to be fetched from the original database, which can increase query execution time. So, partial replication is useful when we want to give users an isolated view of the data based on their location.
For example, suppose an online fashion retailer sells clothing items in different regions of the world. The retailer may have different inventories in different regions depending on local demand. By replicating only the relevant data to each location, the retailer can ensure that customers in each region see only the relevant products that are available.
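A rough sketch of the retailer example, assuming a hypothetical `PartialReplica` class and made-up product rows: each regional replica copies only the rows that match its region and falls back to the original database when a key is missing locally.

```python
class PartialReplica:
    def __init__(self, origin, keep):
        self.origin = origin
        # Copy only the rows this region cares about.
        self.local = {k: v for k, v in origin.items() if keep(k, v)}

    def get(self, key):
        if key in self.local:
            return self.local[key], "local"
        # Slower path: the row has to come from the original database.
        return self.origin.get(key), "origin"


origin = {
    "parka": {"region": "eu"},
    "sandals": {"region": "asia"},
}
eu = PartialReplica(origin, keep=lambda key, row: row["region"] == "eu")
print(eu.get("parka"))    # ({'region': 'eu'}, 'local')
print(eu.get("sandals"))  # ({'region': 'asia'}, 'origin')
```

The second element of the returned tuple shows where the row came from, which makes the extra cost of a local miss visible.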
We will keep updating this blog with more insights on database replication. If you have any queries or feedback, please write to us at email@example.com. Enjoy learning, enjoy system design!
©2023 Code Algorithms Pvt. Ltd.
All rights reserved.