The concept of database replication has evolved significantly in the last two decades due to the increasing use of distributed databases instead of databases with just a single node. In this blog, we will discuss the definition, types, advantages, disadvantages, and various tradeoffs related to database replication in distributed systems.
In database replication, master-slave replication is also known as active/passive or leader-based replication. There are two types of nodes in this architecture: master and slave. The single master (leader) node works as the primary database, while one or more slave (follower) nodes maintain copies of the master’s data.
Pastebin is an online content hosting service where users can store and share content in the form of text or images over the internet. The service generates a unique URL for each paste so that anyone can access the content via that URL. Logged-in users can also update their content. This blog focuses on Pastebin system design and discusses its various components.
In system design, graph databases help us efficiently traverse complex hierarchies, identify hidden connections, and uncover inter-relationships between elements. That’s why they are used in applications where data is highly interconnected.
In the common client-server architecture, multiple clients communicate with a central server. A peer-to-peer (P2P) architecture instead consists of a decentralized network of peers, which are nodes that can act as both clients and servers. Without the need for a central server, P2P networks distribute the workload among peers, and all peers both contribute and consume resources within the network.
Load balancing is essential for building high-performance and scalable applications. It is also a popular topic in system design interviews. By definition, a load balancer is a software or hardware component that sits between clients and servers to balance the workload. It prevents servers from being overloaded and increases system throughput.
As users type a search query, the typeahead feature guesses the rest of the word and offers the top suggestions that begin with whatever they have typed so far. Suggestions are ranked by the frequency and recency of past queries. Search autocomplete is used by many platforms, such as Facebook, Google, and Instagram.
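To make the ranking idea concrete, here is a minimal Python sketch of prefix suggestions backed by a trie. It assumes suggestions are ranked by query frequency alone (recency is ignored for brevity), and the class names `TrieNode` and `Typeahead` are hypothetical, not taken from the blog.

```python
class TrieNode:
    def __init__(self):
        self.children = {}  # next character -> child node
        self.count = 0      # times the query ending at this node was recorded


class Typeahead:
    """Hypothetical frequency-only autocomplete (recency is not modeled)."""

    def __init__(self):
        self.root = TrieNode()

    def record(self, query):
        """Insert a completed search query, bumping its frequency."""
        node = self.root
        for ch in query:
            node = node.children.setdefault(ch, TrieNode())
        node.count += 1

    def suggest(self, prefix, k=3):
        """Return up to k recorded queries starting with prefix, most frequent first."""
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        # Collect every completed query in the subtree under the prefix node.
        candidates = []
        stack = [(node, prefix)]
        while stack:
            current, text = stack.pop()
            if current.count:
                candidates.append((current.count, text))
            for ch, child in current.children.items():
                stack.append((child, text + ch))
        # Highest count first; ties broken alphabetically for stable output.
        candidates.sort(key=lambda c: (-c[0], c[1]))
        return [text for _, text in candidates[:k]]


t = Typeahead()
for q in ["tree", "trie", "trie", "try", "tree", "trie", "train"]:
    t.record(q)
print(t.suggest("tr"))  # ['trie', 'tree', 'train']
```

Scanning the whole subtree on every keystroke is fine for a sketch; a common refinement at scale is to precompute and cache the top-k list at each trie node.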
A key-value database is a non-relational database that stores data using a simple key-value mechanism. It stores data as a collection of key-value pairs, with each key serving as a unique identifier. Both keys and values can be any object, simple or complex. This blog focuses on the system design of a key-value store and discusses its various components.
Google Docs is an online word processor that is part of Google’s free, web-based Google Docs Editors suite. It is a massive system with tons of features. If we think about how Google Docs works, we realize that it is much more complex than it seems. This blog focuses on the system design of Google Docs and discusses its various components.
WhatsApp is a messaging platform that allows users to send messages to each other, and it is widely used throughout the globe. In this blog, we discuss WhatsApp’s generic architecture, which could also serve as a base for designing any similar chat application. Let’s get started by discussing the key requirements of our service.
A web crawler is a system for downloading, storing, and analyzing web pages. It is one of the main components of search engines, which compile collections of web pages, index them, and allow users to query the index and find web pages that match their queries. This blog focuses on the system design of a web crawler and discusses its various components.
Twitter is a social media platform where users can post and interact with tweets. Users can also subscribe to the feeds of other users and receive notifications of tweets from those they follow. Tweets are short messages, originally limited to 140 characters and later to 280. This blog focuses on the system design of Twitter and discusses its various components.
Instagram is a photo and video-sharing social media platform that allows users to share their creations with others. The original poster can set the visibility of these posts (photos/videos) to private or public. Users can like and comment on posts. Users can also follow other users and see a news feed, which is a collection of posts from the users they follow.
Caching is the process of storing the results of a request in a cache or a temporary storage location so they can be accessed more quickly. In system design, a cache is a high-speed data storage that stores a subset of data so that future requests for that data are served up faster. In other words, caching allows us to reuse previously retrieved data efficiently.
MapReduce is a batch processing programming model that enables massive scalability across a large number of servers in a Hadoop cluster. Google first published it in 2004, and it has been called the “algorithm that makes Google so massively scalable.” It is a relatively low-level programming model compared to older parallel processing systems.
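The classic way to see the programming model is word count. The single-process Python sketch below only imitates the map, shuffle, and reduce phases; a real framework such as Hadoop runs them in parallel across machines and handles the shuffle itself. The function names are hypothetical.

```python
from collections import defaultdict


def map_phase(document):
    # Emit an intermediate (word, 1) pair for every word in one input split.
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    # Group all intermediate values by key, as the framework does between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Combine every value emitted for one key into a single result.
    return key, sum(values)


def word_count(documents):
    intermediate = [pair for doc in documents for pair in map_phase(doc)]
    grouped = shuffle(intermediate)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


print(word_count(["the cat sat", "the dog sat down"]))
# {'the': 2, 'cat': 1, 'sat': 2, 'dog': 1, 'down': 1}
```

The user only writes the map and reduce functions; partitioning the input, moving intermediate data, and retrying failed tasks are the framework's job.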
In computer networking, a network is a group of devices connected in some way to exchange data. Network protocols define a common format and set of rules for exchanging messages between those devices. Some common networking protocols are Hypertext Transfer Protocol (HTTP), Transmission Control Protocol (TCP), and Internet Protocol (IP).
Load balancers distribute requests to servers based on various load balancing techniques. These techniques use different algorithms to select servers. There are two types of load balancing algorithms: 1) Static load balancing: round robin, URL hash, etc. 2) Dynamic load balancing: least connection method, least response time method, etc.
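As a rough illustration of the static/dynamic split above, here is a minimal Python sketch of one algorithm from each group: round robin, which ignores server state, and least connection, which tracks in-flight requests. The class names and server labels are hypothetical.

```python
import itertools


class RoundRobin:
    """Static: hands out servers in a fixed rotating order."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self):
        return next(self._cycle)


class LeastConnections:
    """Dynamic: picks the server with the fewest in-flight requests."""

    def __init__(self, servers):
        self.active = {server: 0 for server in servers}

    def pick(self):
        # min() returns the first server in insertion order on ties.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        # Call when a request finishes so the count reflects current load.
        self.active[server] -= 1


rr = RoundRobin(["s1", "s2", "s3"])
print([rr.pick() for _ in range(4)])  # ['s1', 's2', 's3', 's1']

lc = LeastConnections(["s1", "s2"])
first, second = lc.pick(), lc.pick()  # 's1', then 's2'
lc.release(first)
print(lc.pick())  # 's1' again, since it now has no active requests
```

The dynamic variant needs feedback from the servers (here, `release` calls), which is exactly the extra state a static algorithm avoids.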
The least recently used (LRU) cache is one of the most popular caching strategies. When the cache is full, it discards the least recently used items first to make room for new elements. It organizes items in order of their use, which makes it easy to identify items that have not been used for a long time.
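A compact way to express the LRU policy in Python is `collections.OrderedDict`, which keeps keys in order and can move a key to the end in O(1). The sketch below is a minimal illustration, not the blog's implementation; returning `None` for a miss is an assumption.

```python
from collections import OrderedDict


class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()  # oldest (least recently used) key first

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        if key in self.items:
            self.items.move_to_end(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict the least recently used key


cache = LRUCache(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")         # touch "a", so "b" becomes least recently used
cache.put("c", 3)      # cache is over capacity: evicts "b"
print(cache.get("b"))  # None
```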
Least frequently used (LFU) is a cache eviction algorithm used to manage memory within a computer. The system keeps track of how many times each block is referenced, and when the cache is full, it removes the item with the lowest reference frequency. With the right data structures, LFU get and put operations can run in O(1) average time.
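One common way to reach O(1) average time is to keep a bucket of keys per access count plus the current minimum count. The Python sketch below assumes that design, and it breaks ties within the lowest count by evicting the least recently used key; that tie-break rule is an assumption, not something the blog states.

```python
from collections import OrderedDict, defaultdict


class LFUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.values = {}                         # key -> value
        self.freq = {}                           # key -> access count
        self.buckets = defaultdict(OrderedDict)  # count -> keys, oldest first
        self.min_freq = 0                        # lowest count currently in use

    def _touch(self, key):
        # Move key from bucket f to bucket f + 1 in O(1).
        f = self.freq[key]
        del self.buckets[f][key]
        if not self.buckets[f]:
            del self.buckets[f]
            if self.min_freq == f:
                self.min_freq = f + 1
        self.freq[key] = f + 1
        self.buckets[f + 1][key] = None

    def get(self, key):
        if key not in self.values:
            return None
        self._touch(key)
        return self.values[key]

    def put(self, key, value):
        if self.capacity <= 0:
            return
        if key in self.values:
            self.values[key] = value
            self._touch(key)
            return
        if len(self.values) >= self.capacity:
            # Evict the oldest key among those with the lowest count.
            old, _ = self.buckets[self.min_freq].popitem(last=False)
            if not self.buckets[self.min_freq]:
                del self.buckets[self.min_freq]
            del self.values[old]
            del self.freq[old]
        self.values[key] = value
        self.freq[key] = 1
        self.buckets[1][key] = None
        self.min_freq = 1  # a brand-new key always has the lowest count


lfu = LFUCache(2)
lfu.put(1, "a")
lfu.put(2, "b")
lfu.get(1)           # key 1 now has count 2
lfu.put(3, "c")      # evicts key 2, the only key with count 1
print(lfu.get(2))    # None
```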
Data partitioning or sharding is a technique of dividing data into independent components. It is a way of splitting data into smaller pieces so that data can be efficiently accessed and managed. Database partitioning is the backbone of modern system design, which helps to improve scalability, manageability, and availability.
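Hash partitioning is one widely used way to split data into independent pieces. The minimal Python sketch below routes each key to a partition by hashing it modulo the partition count; the function names and the choice of MD5 are assumptions for illustration. A stable digest is used because Python's built-in `hash()` is randomized per process for strings.

```python
import hashlib
from collections import defaultdict


def partition_for(key, num_partitions):
    """Map a key to a partition index in [0, num_partitions)."""
    # Stable across processes, unlike the built-in hash() for str.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def partition_rows(rows, num_partitions):
    """Group (key, value) rows by the partition that owns each key."""
    shards = defaultdict(list)
    for key, value in rows:
        shards[partition_for(key, num_partitions)].append((key, value))
    return dict(shards)


rows = [(f"user:{i}", {"id": i}) for i in range(10)]
for shard, items in sorted(partition_rows(rows, 3).items()):
    print(shard, [k for k, _ in items])
```

A known drawback of plain modulo routing is that changing the number of partitions remaps most keys, which is one reason schemes such as consistent hashing exist.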
In this blog, we'll be discussing the key differences and advantages of forward and reverse proxies. A forward proxy is placed in front of a client to ensure that the client never communicates directly with the origin server. On the other hand, a reverse proxy is placed in front of an origin server to ensure that no client ever communicates directly with the server.
WebSocket is a communications protocol that provides full-duplex communication channels over a single TCP connection. It is a vital component behind multiplayer games and other applications that rely on real-time data transfer. In this blog, we explain: 1) What is WebSocket in system design? 2) How does it work?
What is the latency of a system? Latency is an essential system design concept that measures how long it takes for data to travel from the client to the server and back to the client. Lower latency means better system performance. This blog focuses on how latency impacts system performance and on measures to reduce it.
Polling is a technique in which a client repeatedly asks the server whether new information is available. Long polling is a variation of traditional polling that lets the server send data to a client as soon as it becomes available. The client requests information from the server just as in standard polling, but the server may not respond right away: it holds the request open and delivers a complete response to the client once the data is available.
Nowadays, almost every large-scale application is based on distributed computing, so understanding distributed system concepts is crucial for designing fault-tolerant, highly scalable, and low-latency services. This blog introduces the concepts of distributed systems, how they apply in real-world scenarios, and their key characteristics.
A key-value database is a non-relational (NoSQL) database that stores data using a simple key-value mechanism. The structure of a key-value store is similar to a map or dictionary, where each key is associated with exactly one value. The simplicity of this model makes key-value databases fast, easy to use, scalable, portable, and flexible.
A rate limiter restricts the number of events that a certain user or device can perform within a given time range. It limits the number of requests a sender can send in a particular period of time. Once the upper limit is reached, the rate limiter blocks further requests. In this blog, we discuss the components and algorithms involved in designing a rate limiter.
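The token bucket is one well-known rate limiting algorithm. In the minimal Python sketch below, each sender would get its own bucket that holds up to `capacity` tokens and refills at `refill_rate` tokens per second; a request is allowed only if it can take a token. The class name and parameters are hypothetical, and the clock is injectable so the behavior can be simulated.

```python
import time


class TokenBucket:
    def __init__(self, capacity, refill_rate, clock=time.monotonic):
        self.capacity = capacity          # maximum burst size
        self.refill_rate = refill_rate    # tokens added per second
        self.tokens = float(capacity)     # start with a full bucket
        self.clock = clock                # injectable so the example is testable
        self.last = clock()

    def allow(self):
        """Consume one token if available; return False once the limit is hit."""
        now = self.clock()
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# Simulated clock: 2-request burst, refilling at 1 token per second.
fake_now = [0.0]
bucket = TokenBucket(capacity=2, refill_rate=1.0, clock=lambda: fake_now[0])
print([bucket.allow() for _ in range(3)])  # [True, True, False]
fake_now[0] = 1.0
print(bucket.allow())  # True: one token refilled after one second
```

In a multi-user service, a dictionary from user or IP to `TokenBucket` is the simplest extension; distributed deployments typically keep the counters in a shared store instead.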
In an operating system, process management covers tasks such as creating processes, scheduling them, handling deadlocks, and terminating processes. It is the responsibility of the OS to manage all running processes on the system. In this blog, we will learn about process management in an OS and its related algorithms.
The client-server architecture is a system consisting of clients and servers, in which servers host, manage, and deliver services to clients. Clients connect to a central server and communicate with it over a computer network. There are four key components in a client-server model: clients, load balancers, servers, and network protocols.
YouTube is a widely used social platform for video sharing and advertising. Because the service is so heavily used, our system should be cost-efficient, highly available, scalable, and reliable, and it should offer low latency and high throughput. This blog focuses on the system design of YouTube and discusses its various components.
The availability of a system is measured as the percentage of time, over a given period, during which the system is up and able to perform its tasks and functions under normal conditions. One way to look at availability is how resistant a system is to failures. High-availability systems come with tradeoffs, such as higher latency or lower throughput.
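Availability percentages are easier to reason about once converted into downtime, and component availabilities combine with simple arithmetic. The Python sketch below assumes a 365-day year and, for the parallel case, independent failures; the function names are hypothetical.

```python
HOURS_PER_YEAR = 365 * 24  # assumes a 365-day year


def downtime_hours_per_year(availability):
    """Expected downtime implied by an availability fraction such as 0.999."""
    return (1 - availability) * HOURS_PER_YEAR


def in_series(*availabilities):
    """All components must be up (e.g. app server -> database)."""
    result = 1.0
    for a in availabilities:
        result *= a
    return result


def in_parallel(*availabilities):
    """Redundant replicas: the system is down only if every replica is down.
    Assumes replica failures are independent."""
    all_down = 1.0
    for a in availabilities:
        all_down *= 1 - a
    return 1 - all_down


print(round(downtime_hours_per_year(0.999), 2))        # 8.76 hours
print(round(downtime_hours_per_year(0.9999) * 60, 1))  # 52.6 minutes
print(round(in_series(0.999, 0.999), 6))               # 0.998001
print(round(in_parallel(0.99, 0.99), 6))               # 0.9999
```

Chaining components in series lowers overall availability, while adding redundant replicas in parallel raises it, which is why highly available designs remove single points of failure.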
The throughput of a system is defined as the total units of information a system can process in a given amount of time. It is generally represented as the number of bits transmitted per second or HTTP operations per day. This blog will discuss the important factors that affect throughput in designing distributed systems.
System design is an important topic that tech companies ask about during interviews. It is also essential for solving large-scale software problems. This blog goes through the fundamental system design concepts needed for solving system design questions, learning system design at an advanced level, and preparing for interviews.
Dropbox is a cloud storage service that allows users to store their data on remote servers. These servers store files durably and securely, and the files can be accessed from anywhere over the internet. The servers are maintained by cloud storage providers. In this blog, we focus on the system design of Dropbox and discuss its various components.
Notification services are used in almost every product. They are helpful when we want to be alerted about a price change or a new product feature, or notified when a new job posting matching our search becomes available. This blog focuses on the system design of a notification service and discusses its various components.
Yelp is a proximity service where users can search for nearby places, such as restaurants and shopping malls, and add or view reviews of those places. The service must store location information so that it can return a list of places near the user after a query is made. This blog focuses on the system design of Yelp and discusses its various components.
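At its core, a "places near me" query filters stored coordinates by distance from the user. The Python sketch below uses the haversine great-circle formula with a brute-force scan; the function names and sample places are hypothetical, and a production service would instead use a spatial index (for example geohashes or a quadtree) so it does not scan every place.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearby(places, lat, lon, radius_km):
    """Names of places within radius_km of (lat, lon), closest first."""
    hits = []
    for name, plat, plon in places:
        d = haversine_km(lat, lon, plat, plon)
        if d <= radius_km:
            hits.append((d, name))
    return [name for _, name in sorted(hits)]


places = [("cafe", 0.0, 0.01), ("mall", 0.0, 0.05), ("museum", 1.0, 1.0)]
print(nearby(places, 0.0, 0.0, radius_km=10))  # ['cafe', 'mall']
```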
QR code payment is a contactless payment method for transferring funds from the buyer’s wallet to the seller’s wallet or account. A QR code is a matrix barcode that can be read by any mobile device with a camera. This blog focuses on how to design a QR code payment system and discusses its various components.
Uber continues to improve its operations by developing and deploying new services to meet market demand, finding the most efficient routes, detecting potential fraud, and monitoring data to provide efficient real-time services. This blog focuses on Uber system design and discusses its various components.
In distributed systems, a leader election algorithm is a process for designating a single process as the organizer of some task distributed among several computers (nodes). The elected leader may take on special responsibilities, such as delegating tasks, editing data, or even handling all system requests.
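The bully algorithm is one classic leader election algorithm: the live node with the highest ID wins. The Python sketch below is a single-process simulation that replaces real network messages with a set of live node IDs; the function name and parameters are hypothetical, and it is not claimed to be the algorithm the blog uses.

```python
def bully_election(initiator, node_ids, alive):
    """Simulate a bully election started by `initiator`.

    node_ids: all node IDs in the cluster (higher ID = higher priority).
    alive: the set of nodes that currently respond to messages.
    """
    # The initiator sends ELECTION messages to every node with a higher ID.
    responders = [n for n in node_ids if n > initiator and n in alive]
    if not responders:
        # Nobody higher answered, so the initiator wins and would broadcast
        # a COORDINATOR message announcing itself as leader.
        return initiator
    # A higher live node replies and takes over the election. Following the
    # lowest responder is enough here, because the recursion keeps climbing
    # until it reaches the highest live ID.
    return bully_election(min(responders), node_ids, alive)


nodes = [1, 2, 3, 4, 5]
print(bully_election(2, nodes, alive={1, 2, 3, 4}))  # 4: node 5 has crashed
```

A real implementation also needs timeouts to decide that a higher node is unresponsive, which is where much of the practical difficulty lies.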
Have you ever been in a dilemma while choosing the most appropriate database for your application? Which storage type could best meet the business expectations and offer efficient service? To select a database, we need to understand the structure and functionality of each kind of database.
At its most basic level, a rate limiter restricts the number of events a certain object (person, device, IP, etc.) can perform in a given time range. In general, a rate limiter caps how many requests a sender can issue in a specific time window. Once the cap is reached, the rate limiter blocks further requests.
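The sliding window log is another common rate limiting algorithm: it stores the timestamp of each accepted request per sender and counts only those inside the last window. The Python sketch below is a minimal, in-memory illustration; the class name, parameters, and injectable clock are assumptions for the example.

```python
import time
from collections import deque


class SlidingWindowLog:
    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self.limit = limit              # max requests per window, per sender
        self.window = window_seconds
        self.clock = clock              # injectable so the example is testable
        self.log = {}                   # sender id -> deque of request timestamps

    def allow(self, sender):
        now = self.clock()
        q = self.log.setdefault(sender, deque())
        # Drop timestamps that have fallen out of the current window.
        while q and q[0] <= now - self.window:
            q.popleft()
        if len(q) < self.limit:
            q.append(now)
            return True
        return False  # cap reached: block the request


fake_now = [0.0]
limiter = SlidingWindowLog(limit=2, window_seconds=10, clock=lambda: fake_now[0])
print([limiter.allow("10.0.0.1") for _ in range(3)])  # [True, True, False]
print(limiter.allow("10.0.0.2"))  # True: each sender has its own window
fake_now[0] = 10.0
print(limiter.allow("10.0.0.1"))  # True: requests from t=0 have expired
```

Compared with a token bucket, this approach is exact over any window but stores one timestamp per accepted request, so memory grows with the limit.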
The CAP theorem is an essential concept in system design for networked shared-data systems. It states that a distributed database system can provide at most two of these three properties: consistency, availability, and partition tolerance. Depending on the use case, we make trade-offs among these three properties in our database system.
Databases are a critical component of most complex systems, and how they are used has a significant impact on performance, scalability, and consistency. Because this is an essential topic with many components, we have outlined the most crucial database topics you’ll need to know for a system design interview.
Whenever we build a web application that deals with real-time data, we need to choose the best mechanism for delivering data to the client. In this blog, we focus on server-sent events (SSE) and take a detailed look at their internal workings and underlying features.
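Server-sent events travel over an ordinary HTTP response with the `text/event-stream` content type, as plain-text fields terminated by a blank line. The Python sketch below only serializes one event in that wire format; the function name is hypothetical, and it leaves out the HTTP server that would keep the connection open and write these strings to it.

```python
def format_sse(data, event=None, event_id=None):
    """Serialize one server-sent event in the text/event-stream format."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")   # custom event type
    if event_id is not None:
        lines.append(f"id: {event_id}")   # lets the client resume via Last-Event-ID
    # Multi-line payloads become one data: line per line of text.
    for line in data.splitlines() or [""]:
        lines.append(f"data: {line}")
    # A blank line terminates the event.
    return "\n".join(lines) + "\n\n"


print(repr(format_sse("price=42", event="tick", event_id="7")))
# 'event: tick\nid: 7\ndata: price=42\n\n'
```

On the browser side, the `EventSource` API parses this stream and dispatches each event to listeners, reconnecting automatically if the connection drops.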
TinyURL is a URL-shortening system that creates shorter aliases for long URLs. Whenever a user visits the short URL, they are redirected to the original URL. Our goal is to design a highly scalable service that allows users to create short URLs from long ones. This blog focuses on TinyURL system design and discusses its various components.
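One common way to build short aliases is to assign each long URL a unique numeric ID and encode that ID in base 62 (digits plus lowercase and uppercase letters). The Python sketch below assumes such an ID source exists; the alphabet order and function names are hypothetical. Seven base-62 characters cover 62^7, roughly 3.5 trillion IDs.

```python
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
BASE = len(ALPHABET)  # 62


def encode_base62(n):
    """Turn a non-negative integer ID into a short alias string."""
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n > 0:
        n, rem = divmod(n, BASE)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))  # most significant digit first


def decode_base62(s):
    """Recover the integer ID from an alias, e.g. when handling a redirect."""
    n = 0
    for ch in s:
        n = n * BASE + ALPHABET.index(ch)
    return n


print(encode_base62(125))   # 21
print(decode_base62("21"))  # 125
```

Because the alias is derived from a unique ID, two long URLs never collide, but the IDs must be generated without duplicates across servers.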
Consistent hashing is one of the most widely used concepts in system design, as it offers considerable flexibility in scaling an application. This blog discusses the key concepts and approaches that come in handy when scaling out a distributed system. Consistent hashing is frequently applied to solve a variety of system-related challenges.
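The core idea can be shown with a hash ring: nodes and keys are hashed onto the same circular space, and each key belongs to the first node position at or after its hash, wrapping around. The Python sketch below is a minimal illustration with virtual nodes; the class name, the `vnodes=100` default, and the use of MD5 are assumptions, not the blog's design.

```python
import bisect
import hashlib


def _hash(key):
    # Stable 64-bit position on the ring.
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class ConsistentHashRing:
    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes   # virtual nodes per server, to spread load evenly
        self._hashes = []      # sorted ring positions
        self._owner = {}       # ring position -> physical node
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        for i in range(self.vnodes):
            h = _hash(f"{node}#{i}")
            self._owner[h] = node
            bisect.insort(self._hashes, h)

    def remove_node(self, node):
        self._hashes = [h for h in self._hashes if self._owner[h] != node]
        self._owner = {h: n for h, n in self._owner.items() if n != node}

    def get_node(self, key):
        if not self._hashes:
            raise ValueError("ring is empty")
        # First position clockwise from the key's hash, wrapping to the start.
        idx = bisect.bisect_left(self._hashes, _hash(key)) % len(self._hashes)
        return self._owner[self._hashes[idx]]


ring = ConsistentHashRing(["A", "B", "C"])
keys = [f"key{i}" for i in range(1000)]
before = {k: ring.get_node(k) for k in keys}
ring.add_node("D")
moved = [k for k in keys if ring.get_node(k) != before[k]]
print(f"{len(moved) / len(keys):.0%} of keys moved, all to D")
```

Adding a fourth node moves only the keys that now fall on D's positions (roughly a quarter of them), whereas modulo-based hashing would remap most keys.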
The system design interview is an open-ended conversation. So as a candidate, we need to follow well-defined steps to solve system design questions: 1) Requirements clarifications 2) Capacity estimation 3) Database design 4) Creating high-level design 5) Designing core components 6) Scaling the design 7) Resolving key bottlenecks.