Basic Concepts for System Design Interview Preparation

System design is one of the most important topics that tech companies ask about during the interview process. It is also essential for solving large-scale software problems. After reading this blog, you will be familiar with the basic concepts related to system design.

What is System Design?

System design deals with designing a system that meets functional and non-functional business requirements. While designing such systems, one needs to consider various trade-offs and think about components, complexity, and performance. So it is best to have an excellent conceptual understanding of the essential terms encountered in system design. We need to know about scalability (horizontal and vertical), caching, load balancing, data partitioning, various types of databases, network protocols, database sharding, and many more.

Similarly, we need to weigh various trade-offs while designing the core functionalities of an application. As per our requirements, we need to decide upon: latency vs throughput, performance vs scalability, consistency vs availability, etc. Overall, system design is an open-ended discussion topic. That's why most top tech firms prefer to have one or two system design interview rounds.

So let’s dive in to get familiar with the essential topics or concepts used in system design.

Availability

Availability is one way of ensuring the reliability of a system. It means the system should always remain up and respond to client requests: whenever a user wants to use the service, the system must be available to satisfy the query.

Availability can be quantified as the percentage of time the system remains operational in a given time window. We usually define the availability of a system in terms of the number of 9s (as shown in the table below).

  • 99% (two nines): up to ~3.65 days of downtime per year
  • 99.9% (three nines): up to ~8.76 hours of downtime per year
  • 99.99% (four nines): up to ~52.6 minutes of downtime per year
  • 99.999% (five nines): up to ~5.26 minutes of downtime per year

The availability of a system can be increased through redundancy. Replication offers redundancy by deploying copies of the same server, so that if one server goes down, the availability of the system is not affected.
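As a quick sanity check on the numbers above, the allowed downtime for a given availability percentage can be computed directly. A minimal sketch in Python:

```python
def downtime_per_year(availability_pct):
    """Return the maximum allowed downtime per year (in minutes)
    for a given availability percentage."""
    minutes_per_year = 365 * 24 * 60  # 525,600 minutes
    return minutes_per_year * (1 - availability_pct / 100)

# Three nines (99.9%) allows roughly 525.6 minutes (~8.76 hours) per year
budget = downtime_per_year(99.9)
```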

Throughput

Throughput is the maximum rate of transfer or capacity of the system. It is used as a metric to determine how much work a system can do in a given time frame. One promising way to increase throughput of the system is by splitting up the requests and distributing them to various resources.

Latency

Latency is the time taken to produce a result: the delay between making a request and receiving the response. Latency is one way to measure the performance of a system: the lower the latency, the higher the performance.
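A simple way to observe latency in practice is to time an operation. A minimal sketch using Python's `time.perf_counter`; the function being measured here is just a stand-in:

```python
import time

def measure_latency(fn, *args):
    """Run fn once and return (result, latency in seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    latency = time.perf_counter() - start
    return result, latency

# Example: time a simple computation standing in for a request handler
result, latency = measure_latency(sum, range(1000))
```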

Network Protocols

Almost every system depends heavily on the network, which acts as a platform for communication between users and servers, or between different servers. Protocols, in turn, are the rules governing how servers or machines communicate over the network. Some of the most common network protocols are HTTP, TCP/IP, etc.


Load Balancing

Load balancers are machines that balance the load among various servers. To scale the system, we can add more and more servers, so there must be a way to direct requests to these servers such that no single server is overloaded. Load balancers also help prevent a single point of failure.

  • Load balancers distribute the traffic, prevent the service from breaking down, and contribute to the system's reliability.
  • Load balancers act as traffic managers and help us maintain system throughput and availability.
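The simplest strategy a load balancer can use is round robin: hand each incoming request to the next server in circular order. A minimal sketch (the server names are hypothetical):

```python
import itertools

class RoundRobinBalancer:
    """Distributes incoming requests across servers in circular order."""
    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def next_server(self):
        return next(self._cycle)

lb = RoundRobinBalancer(["server-a", "server-b", "server-c"])
picked = [lb.next_server() for _ in range(6)]
# Each server receives an equal share of the six requests
```

Real load balancers use richer strategies (least connections, weighted round robin, health checks), but the routing idea is the same.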

Proxies

A proxy is a middleman that sits between the client and the server. When a client sends a request, it passes through the proxy before reaching the server. Proxies are of two types: forward proxies and reverse proxies.

A forward proxy acts as a mask for clients and hides the client's identity from the server (e.g., a VPN). Similarly, a reverse proxy acts as a mask for servers and hides the server's identity from the client (e.g., a load balancer).

Databases

Every system works with some amount of data that needs to be stored somewhere, so that it can be fetched easily whenever required. Databases are broadly classified into relational and non-relational databases.

Relational databases strictly enforce strong relationships among the data and are highly structured in nature. Examples: MySQL, PostgreSQL, etc. On the other side, non-relational databases have flexible structures, i.e., they are less rigid. Data is stored in an unstructured way, and these databases are generally used in systems that are highly distributed and require high speed. Examples: Cassandra, Redis, etc.

Database partitioning is a way of dividing a database into smaller chunks to increase the performance of the system. It is another critical concept used to improve latency and throughput so that more requests can be served. Some of the most popular databases according to the Stack Overflow Developer Survey 2021 include MySQL, PostgreSQL, SQLite, MongoDB, and Redis.
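One common partitioning scheme is hash-based: a stable hash of a record's key decides which partition it lives in. A minimal sketch (the key format and partition count are illustrative):

```python
import hashlib

def partition_for(key, num_partitions):
    """Map a record key to one of num_partitions database partitions.

    A stable hash (MD5 here) is used instead of Python's built-in hash(),
    so the mapping stays the same across process restarts and machines.
    """
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % num_partitions

# The same key always lands in the same partition
p = partition_for("user:42", 4)
```

Note that with plain modulo hashing, changing `num_partitions` remaps most keys; consistent hashing (covered below) addresses exactly that problem.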


ACID vs BASE

Relational and non-relational databases ensure different types of compliance. Relational databases are associated with ACID compliance, while non-relational databases are associated with BASE compliance.

ACID: Atomicity, Consistency, Isolation, Durability

ACID compliance preserves the relational nature of a database and ensures that transactions execute in a guided manner. A transaction is a single unit of interaction with the database.

  • Atomicity ensures that a transaction, which may comprise one or more operations, either completes entirely or not at all: if any operation fails, the entire transaction fails. It's "all or nothing," which is vital in the case of transactions.
  • Consistency ensures that each transaction must be valid according to a given set of rules. Whenever the database state changes, the data is not corrupted: transactions move the database from one valid state to another.
  • Isolation means that all transactions occur independently without affecting one another, which makes safe concurrency in database operations possible.
  • Durability ensures that whatever is written to the database remains there, i.e., it ensures the persistence of the database.
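Atomicity is easiest to see with a real transaction. The sketch below uses Python's built-in sqlite3 module, whose connection context manager commits on success and rolls back on error; the account names and the simulated failure are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 0)")
conn.commit()

try:
    with conn:  # opens a transaction; commits on success, rolls back on error
        conn.execute("UPDATE accounts SET balance = balance - 50 "
                     "WHERE name = 'alice'")
        raise RuntimeError("simulated crash before crediting bob")
except RuntimeError:
    pass

# The failed transaction was rolled back: alice still has her full balance
balance = conn.execute("SELECT balance FROM accounts "
                       "WHERE name = 'alice'").fetchone()[0]
```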

BASE: Basically Available, Soft State, Eventual Consistency

BASE compliance maintains the integrity of NoSQL databases, ensures their proper functioning, and is the main reason behind their scalability.

  • Basically Available means the system guarantees availability, even during partial failures.
  • Soft state gives flexibility to the system and allows its state to change over time to provide faster access.
  • Eventual consistency means the system may take some time to reach a consistent state, but it eventually becomes consistent.

SQL vs NoSQL

While designing any application, one needs to be clear about the type of storage that matches the system requirements. If the system is distributed in nature and scalability is essential, then NoSQL databases are the better choice. NoSQL databases are also preferred when the amount of data is huge.

On the other hand, SQL databases are favourable when the structure of the data is important, queries are complex, and the database requires fewer updates. However, there is always a trade-off when choosing between a NoSQL and a SQL database. Sometimes, according to the business needs, a polyglot architecture comprising both SQL and NoSQL databases is used to ensure the application's performance.

Scalability in distributed systems

As your service grows and more and more requests come to the system, it will become slow, hurting performance. The best way to mitigate this is scaling, i.e., increasing the capacity of the system. There are two ways of doing so: horizontal scaling and vertical scaling.

Horizontal scaling means scaling the service by adding more servers to your application to distribute the requests among them. Vertical scaling, in contrast, increases the power or capacity of the same machine by upgrading it to handle more traffic.

Caching

Caching improves the performance of the system and helps reduce its latency. To make an application faster, it is convenient to store frequently used data where it can be accessed in less time.

A cache stores these pieces of data so that, instead of querying the database, they can be fetched quickly from the cache. However, adding a cache increases the system's complexity: it is of utmost necessity to keep the data stored on disk and in the cache synchronized so that the system remains consistent.

Moreover, cache memory is expensive, so a cache cannot grow beyond a certain limit. Hence various data eviction algorithms like FIFO, LIFO, LRU, LFU, etc., are used to decide which entries to remove while preserving the service's performance.
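A minimal sketch of one such eviction policy, LRU (least recently used), built on Python's `OrderedDict`:

```python
from collections import OrderedDict

class LRUCache:
    """Evicts the least recently used entry when capacity is exceeded."""
    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict least recently used

cache = LRUCache(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")     # "a" is now the most recently used entry
cache.put("c", 3)  # evicts "b", the least recently used entry
```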

Consistent Hashing

Consistent hashing is a widely used concept in distributed systems, as it offers considerable flexibility in scaling the application. It is an improvement over normal hashing: hashing keys modulo the number of servers requires remapping almost all keys whenever a server is added or removed, which is ineffective for handling requests over a network. In consistent hashing, servers and keys are located virtually on a circular structure called the hash ring, with positions assigned by a hash function. Each key is served by the first server encountered moving clockwise around the ring, so when a server joins or leaves, only the keys in its neighbourhood need to move. This allows requests or data to be distributed and mapped to servers efficiently; it helps achieve horizontal scaling, increases throughput, and reduces latency.
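The hash ring described above can be sketched in a few lines: servers are placed on the ring by hashing their names, and a key is assigned to the first server found walking clockwise from the key's position. This toy version omits virtual nodes, which real implementations add to spread load more evenly; the server names are hypothetical:

```python
import bisect
import hashlib

class HashRing:
    """Minimal consistent hash ring (no virtual nodes, for illustration)."""
    def __init__(self, servers):
        # Place each server on the ring at a position given by its hash
        self._ring = sorted((self._hash(s), s) for s in servers)
        self._keys = [h for h, _ in self._ring]

    @staticmethod
    def _hash(value):
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def server_for(self, key):
        """Walk clockwise from the key's position to the next server."""
        idx = bisect.bisect(self._keys, self._hash(key)) % len(self._keys)
        return self._ring[idx][1]

ring = HashRing(["server-a", "server-b", "server-c"])
owner = ring.server_for("user:42")  # always maps to the same server
```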

CAP Theorem

The CAP theorem is an essential concept for designing networked shared-data systems. It states that a distributed database system can provide only two of these three properties: consistency, availability, and partition tolerance. We can make trade-offs between the three properties based on the use cases of our database system.

  • Consistency: Consistency means that every read operation returns the result of the most recent write, so all nodes see the same data at the same time, in a well-coordinated and properly synchronized manner.
  • Availability: Availability means that the system is always ready whenever any request is made to it. Whenever a client requests a result from the server, the system should remain available and give a response irrespective of the failure of one or more nodes. Replication is used to ensure redundancy, which directly contributes to the availability of the system.
  • Partition Tolerance: Partition tolerance means the system keeps working despite any harm or breakdown of communication between nodes. Because distributed systems depend heavily on network calls, it is very common for them to fall into the trap of network failures, so partition tolerance is essential for any distributed system. In that case, we have to choose judiciously between consistency and availability as per our requirements.

Conclusion

System design is an essential skill to have and is equally important from the interview point of view at most top tech companies. One needs to be well aware of all the trade-offs while designing any system. In this blog, we tried to cover the basic concepts necessary for getting started. We hope that after reading it, you'll be familiar with the basics of system design.

Enjoy learning, Enjoy system design!


© 2022 Code Algorithms Pvt. Ltd.

All rights reserved.