Data Partitioning - System Design Concept

Introduction

As services and their user bases grow, it becomes difficult for a single server or database to keep functioning efficiently. Data Partitioning comes to the rescue: it improves scalability, makes data and request management easier, and helps optimise performance. In this blog, we'll build a conceptual understanding of Data Partitioning and look at how to partition data effectively. Without further delay, let's get started with what Data Partitioning is.

What is data partitioning?

As the name suggests, Data Partitioning is the technique of dividing data into smaller, independent pieces (partitions) so that each piece can be accessed and managed on its own, at a much finer level. Database partitioning is the backbone of modern distributed database management systems. Partitioning improves the scalability and performance of a system, enhances the manageability and availability of the service, and can reduce storage costs, for example by keeping rarely accessed partitions on cheaper storage. There are many criteria for partitioning data. Most of them use a partition key and assign each record to a partition based on that key. Common criteria include range partitioning, list partitioning, round-robin partitioning, and hash partitioning.
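
To make these criteria concrete, here is a minimal Python sketch of how each one might map a record to a partition. The function names, boundary values, and region-to-partition mapping are hypothetical and chosen only for illustration; real databases implement these rules internally.

```python
import bisect
import hashlib
from itertools import count

def range_partition(key: int, boundaries: list[int]) -> int:
    """Range partitioning: `boundaries` are sorted split points.
    Partition 0 holds keys below boundaries[0], partition i holds keys in
    [boundaries[i-1], boundaries[i]), and the last partition holds the rest."""
    return bisect.bisect_right(boundaries, key)

def list_partition(value: str, assignments: dict[str, int], default: int) -> int:
    """List partitioning: each partition owns an explicit list of values;
    values not listed anywhere go to a default partition."""
    return assignments.get(value, default)

def make_round_robin(num_partitions: int):
    """Round-robin partitioning: ignores the record's content and simply
    cycles through the partitions in order."""
    counter = count()
    def assign(_record) -> int:
        return next(counter) % num_partitions
    return assign

def hash_partition(key: str, num_partitions: int) -> int:
    """Hash partitioning: a stable hash of the key, modulo the partition count."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions

# Hypothetical example data.
REGION_PARTITIONS = {"IN": 0, "US": 1, "UK": 1, "DE": 2}
print(range_partition(150, [100, 200, 300]))               # 1
print(list_partition("US", REGION_PARTITIONS, default=3))  # 1
round_robin = make_round_robin(4)
print([round_robin(r) for r in "abcde"])                   # [0, 1, 2, 3, 0]
print(hash_partition("user-42", 4))
```

The hash version uses MD5 from `hashlib` rather than Python's built-in `hash()`, because `hash()` of a string is randomised per process and would not route a key to the same partition across servers. MD5 is used here only to spread keys evenly, not for security.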

Figure: The idea of data partitioning in system design

Why do we need data partitioning?

Data Partitioning was a key breakthrough in improving the scalability and performance of distributed systems, and it helps manage and use resources effectively. Data Partitioning helps to:

Improve Availability: Data Partitioning improves the availability of the service by avoiding a single point of failure. Because the data is spread across multiple servers, the failure of one server affects only the partitions it holds, while the rest of the data remains available. When each partition is also replicated, as is common in practice, even the affected data can stay available during a failure.

Increase Scalability: Every piece of hardware has capacity limits, and as traffic grows, the performance of a service running on a single machine degrades. Data Partitioning increases scalability by distributing the data across multiple partitions, which can be placed on separate servers. This lets the service scale out by adding more servers instead of being bounded by the capacity of one machine.

Improve Security: Data Partitioning can also improve the system's security by storing sensitive and non-sensitive data in different partitions. The sensitive partitions can then be managed separately and given stricter security controls.

Increase Performance: Data Partitioning improves the performance of the system. Instead of scanning the whole database, a query only needs to touch the smaller partition (or partitions) holding the relevant data, and queries against different partitions can run in parallel. This makes Data Partitioning a crucial component in making a service efficient.
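
Much of this benefit comes from partition pruning: when data is range-partitioned, a query can skip every partition whose range cannot contain matching rows. Below is a minimal sketch, assuming a hypothetical `orders` table partitioned by month; the partition names and dates are illustrative only.

```python
from datetime import date

# Hypothetical monthly partitions: name -> (start inclusive, end exclusive).
PARTITIONS = {
    "orders_2024_01": (date(2024, 1, 1), date(2024, 2, 1)),
    "orders_2024_02": (date(2024, 2, 1), date(2024, 3, 1)),
    "orders_2024_03": (date(2024, 3, 1), date(2024, 4, 1)),
    "orders_2024_04": (date(2024, 4, 1), date(2024, 5, 1)),
}

def prune(query_start: date, query_end: date) -> list[str]:
    """Return only the partitions whose [start, end) range overlaps the
    query range [query_start, query_end); all others are never scanned."""
    return [
        name
        for name, (start, end) in PARTITIONS.items()
        if start < query_end and query_start < end
    ]

print(prune(date(2024, 2, 10), date(2024, 3, 5)))
# ['orders_2024_02', 'orders_2024_03']
```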

Data Partitioning Strategies

Data Partitioning can be done using various strategies: we can split a database into separate, smaller databases, or split the rows or columns of a single table. Broadly, there are three data partitioning strategies. Let's look at each of them.

  • Horizontal Partitioning: Also referred to as sharding. In this strategy, the data is split into separate, independent data stores that share the same schema, and each partition holds a subset of the rows. These partitions are called shards. Choosing the sharding key is essential, because the key determines how the data and workload are spread across the shards. Requests must be balanced between shards; otherwise some shards become overloaded (hot spots) while others sit idle. Such an uneven distribution hurts the performance of the service and makes the overloaded shards more prone to failure.
  • Vertical Partitioning: Each partition holds a subset of the columns (fields) of the data. Instead of one wide table, the data is split into narrower tables, for example keeping frequently accessed columns in one table and the remaining columns in another. Vertical partitioning operates at the entity level and amounts to partially normalising an entity, breaking it down from one wide item into a set of narrow items. A columnar database can be regarded as a vertically partitioned database. Vertical partitioning helps separate sensitive from non-sensitive data and can reduce the amount of concurrent access to any single table.
  • Functional Partitioning: Data is grouped according to how it is used by each function (or bounded context) of the service. For example, a medical store system might store medicine information in one partition and invoice data in another.
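
The effect of the sharding key choice on balance can be seen in a small simulation. This sketch assumes a hypothetical workload of 10,000 users, 70% of whom are in one country, and compares sharding by `user_id` with sharding by `country`; the field names, shard count, and distribution are illustrative assumptions.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 4

def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    # Stable hash, so every application server routes a key to the same shard.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

# Hypothetical workload: 10,000 distinct users, 70% of them in one country.
requests = []
for i in range(10_000):
    country = "IN" if i % 10 < 7 else ["US", "UK", "DE"][i % 3]
    requests.append({"user_id": f"user-{i}", "country": country})

def load_per_shard(shard_key: str) -> Counter:
    # Count how many requests each shard receives for the given sharding key.
    return Counter(shard_for(r[shard_key]) for r in requests)

def imbalance(load: Counter) -> float:
    # Ratio of the busiest shard's load to a perfectly even load (1.0 is ideal).
    ideal = len(requests) / NUM_SHARDS
    return max(load.values()) / ideal

by_user = load_per_shard("user_id")
by_country = load_per_shard("country")
print("sharded by user_id:", round(imbalance(by_user), 2))
print("sharded by country:", round(imbalance(by_country), 2))
```

Sharding by `user_id` keeps the busiest shard close to the ideal load, while sharding by `country` pushes at least 7,000 of the 10,000 requests onto a single shard. With only four distinct countries, such a low-cardinality key can never use more than four shards, however many are added.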

We can choose a specific type of data partitioning based on the structure of the data and how it is accessed. In some cases, we can also combine horizontal and vertical partitioning to get the benefits of both. Consider a large dataset of customers containing different kinds of data. We can vertically split the customer data, for example separating frequently accessed profile fields from bulky, rarely accessed details, and then horizontally partition each of the resulting tables by customer.
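
A minimal sketch of this combination, assuming hypothetical column names and in-memory dictionaries standing in for each shard: every customer record is first split vertically into a profile part and a details part, and both parts are then sharded horizontally by `customer_id`.

```python
import hashlib

NUM_SHARDS = 3
# Hypothetical set of frequently read ("hot") columns.
HOT_COLUMNS = {"customer_id", "name", "email"}

def shard_for(customer_id: str) -> int:
    digest = hashlib.sha256(customer_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS

def split_vertically(record: dict) -> tuple[dict, dict]:
    """Vertical partitioning: hot columns go to one table, the rest to another.
    Both halves keep customer_id so they can be joined back together."""
    hot = {k: v for k, v in record.items() if k in HOT_COLUMNS}
    cold = {k: v for k, v in record.items() if k not in HOT_COLUMNS}
    cold["customer_id"] = record["customer_id"]
    return hot, cold

# Horizontal partitioning: each vertical table is split into NUM_SHARDS shards.
profile_shards: list[dict[str, dict]] = [{} for _ in range(NUM_SHARDS)]
details_shards: list[dict[str, dict]] = [{} for _ in range(NUM_SHARDS)]

def insert(record: dict) -> None:
    hot, cold = split_vertically(record)
    shard = shard_for(record["customer_id"])
    profile_shards[shard][record["customer_id"]] = hot
    details_shards[shard][record["customer_id"]] = cold

insert({"customer_id": "c-1001", "name": "Asha", "email": "asha@example.com",
        "address": "12 Park Street", "notes": "prefers email contact"})
```

Using the same shard key for both vertical parts keeps a customer's profile and details on the same shard, so joining them back together does not require a cross-shard lookup.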

Effective Data Partitioning Designs

Query Processing: An effective data partitioning strategy improves query performance because each query works on smaller data sets and independent partitions can be queried in parallel. Partitioning also improves the database's manageability, since smaller partitions are easier to back up and restore than the complete database.
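
Parallelism across partitions is often implemented as a scatter-gather query: the same query is sent to every partition and the partial results are merged. The sketch below assumes hypothetical in-memory partitions in place of separate database servers and uses Python's standard `concurrent.futures.ThreadPoolExecutor` to query them concurrently.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory partitions standing in for separate databases/servers.
partitions = [
    [{"order_id": 1, "amount": 120}, {"order_id": 4, "amount": 80}],
    [{"order_id": 2, "amount": 300}],
    [{"order_id": 3, "amount": 45}, {"order_id": 5, "amount": 210}],
]

def query_partition(rows: list[dict], min_amount: int) -> list[dict]:
    # Run the same filter locally on one partition's (smaller) data set.
    return [r for r in rows if r["amount"] >= min_amount]

def scatter_gather(min_amount: int) -> list[dict]:
    # Scatter: send the query to every partition concurrently.
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        partial_results = pool.map(lambda rows: query_partition(rows, min_amount), partitions)
        # Gather: merge the partial results from all partitions.
        merged = [row for part in partial_results for row in part]
    # Sort so the merged output order is deterministic.
    return sorted(merged, key=lambda r: r["order_id"])

print([r["order_id"] for r in scatter_gather(100)])  # [1, 2, 5]
```

Threads suit this pattern because querying a real partition is usually an I/O-bound network call; in this in-memory sketch there is no real speed-up, only the shape of the query. When a query includes the partition key, it can be routed to a single partition instead of being scattered to all of them.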

Application Consideration: Data Partitioning is highly dependent on application requirements. It improves the system's availability, scalability, and performance, but it also adds complexity to the design and development of the service. Based on the application's requirements, it is essential to work out how the data will be accessed, queried, and modified, and which design offers the best performance with minimal latency while remaining resilient.

Rebalancing Partitions: As traffic to the system grows or shifts, some partitions may start receiving a disproportionate share of it, which leads to excessive contention on those partitions. The partitioning then needs to be rebalanced, either by moving partitions to other servers or by defining a new partitioning scheme and migrating data from the old scheme to the new one.
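
Rebalancing is much cheaper when the partitioning scheme is designed for it. This sketch compares two approaches under stated assumptions (10,000 synthetic keys, growing from 4 to 5 nodes): assigning keys directly with hash(key) mod N, which remaps most keys when N changes, and a commonly used alternative where keys map to a fixed, larger number of partitions (64 here, an arbitrary choice) and rebalancing only moves whole partitions to the new node. All names are hypothetical.

```python
import hashlib
from collections import Counter

NUM_PARTITIONS = 64  # fixed up front, deliberately larger than the node count
keys = [f"user-{i}" for i in range(10_000)]

def stable_hash(key: str) -> int:
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")

def moved_fraction_mod_n(old_nodes: int, new_nodes: int) -> float:
    """Naive scheme, node = hash(key) % N: changing N remaps most keys."""
    moved = sum(stable_hash(k) % old_nodes != stable_hash(k) % new_nodes for k in keys)
    return moved / len(keys)

def initial_assignment(num_nodes: int) -> dict[int, int]:
    """Spread the fixed partitions round-robin over nodes 0..num_nodes-1."""
    return {p: p % num_nodes for p in range(NUM_PARTITIONS)}

def add_node(assignment: dict[int, int], new_node: int) -> dict[int, int]:
    """Give a brand-new node its fair share by moving whole partitions to it,
    always taking one from the currently busiest node. The key-to-partition
    mapping never changes; only partition-to-node ownership does."""
    updated = dict(assignment)
    num_nodes = len(set(assignment.values())) + 1  # assumes new_node is unused
    for _ in range(NUM_PARTITIONS // num_nodes):
        load = Counter(updated.values())
        busiest = max(load, key=load.get)
        victim = next(p for p, n in updated.items() if n == busiest)
        updated[victim] = new_node
    return updated

def moved_fraction_fixed(old: dict[int, int], new: dict[int, int]) -> float:
    # A key moves only if its (unchanged) partition changed owner.
    moved = sum(old[stable_hash(k) % NUM_PARTITIONS] != new[stable_hash(k) % NUM_PARTITIONS]
                for k in keys)
    return moved / len(keys)

before = initial_assignment(4)
after = add_node(before, new_node=4)
print(f"hash mod N, 4 -> 5 nodes: {moved_fraction_mod_n(4, 5):.0%} of keys move")
print(f"fixed partitions, 4 -> 5 nodes: {moved_fraction_fixed(before, after):.0%} of keys move")
print(sorted(Counter(after.values()).items()))
```

With hash mod N, roughly 80% of the keys change nodes when going from 4 to 5 nodes, whereas the fixed-partition scheme moves only the keys in the dozen partitions handed to the new node, roughly a fifth of the data.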

Conclusion

Data Partitioning is the backbone of modern distributed data management systems and is very effective in improving the availability, scalability, and performance of a system. In this blog, we presented a conceptual understanding of Data Partitioning. We hope you liked it. Please share your views in the comments below.
