Whether NoSQL or SQL databases are more appropriate for a given use case depends on a variety of factors, including the nature of the data, the application requirements, and the available resources and expertise. In this blog, we will discuss the key differences between SQL and NoSQL databases that software developers must consider when making a decision.
The concept of database replication has evolved significantly over the last two decades as distributed databases have increasingly replaced single-node databases. In this blog, we will discuss the definition, types, advantages, disadvantages, and various tradeoffs related to database replication in distributed systems.
In database replication, master-slave replication is also known as active/passive or leader-based replication. There are two types of nodes in this architecture: master and slave. A single master (leader) node works as the primary database, while one or more slave (follower) nodes maintain copies of the master's data.
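To make the flow concrete, here is a minimal in-memory sketch of leader-based replication (the class and method names are illustrative, not from any real database): writes go to the leader, which forwards each change to its followers, while reads can be served by any node.

```python
class Node:
    """A node that holds a copy of the data in memory."""
    def __init__(self, name):
        self.name = name
        self.data = {}

class Leader(Node):
    """The master: accepts writes and replicates them to followers."""
    def __init__(self, name, followers):
        super().__init__(name)
        self.followers = followers

    def write(self, key, value):
        self.data[key] = value
        # Replicate the change to every follower (synchronously here,
        # for simplicity; real systems often replicate asynchronously).
        for follower in self.followers:
            follower.data[key] = value

# Usage: one leader, two followers.
followers = [Node("follower-1"), Node("follower-2")]
leader = Leader("leader", followers)
leader.write("user:1", "Alice")
print(followers[0].data["user:1"])  # Alice -- reads can hit any replica
```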
Hadoop is an open-source framework that addresses the analytical and operational needs of Big Data by overcoming the limitations of traditional data analysis methods. It provides a highly scalable and fault-tolerant distributed file system and allows for parallel processing. It comprises four main components: HDFS, YARN, MapReduce, and Hadoop Common.
According to industry reports, more than 40% of data science jobs list SQL as an essential skill. So, to analyze datasets effectively in data science, one should master RDBMS concepts, data cleaning processes, and SQL commands. A major advantage of SQL is that queries can be executed easily from Python by establishing a connection to the database.
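As a minimal illustration of running SQL from Python, the standard library's sqlite3 module follows the same connect/execute/fetch pattern as drivers for full RDBMSs (the table and column names below are made up for the example):

```python
import sqlite3

# Connect to an in-memory SQLite database (a real project would connect
# to its RDBMS instead, e.g. via a PostgreSQL or MySQL driver).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.execute("CREATE TABLE sales (region TEXT, amount REAL)")
cur.executemany("INSERT INTO sales VALUES (?, ?)",
                [("north", 120.0), ("south", 80.0), ("north", 50.0)])

# Run an analytical query and fetch the results into Python.
cur.execute("SELECT region, SUM(amount) FROM sales GROUP BY region")
for region, total in cur.fetchall():
    print(region, total)
conn.close()
```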
In system design, graph databases help us efficiently traverse complex hierarchies, identify hidden connections, and uncover inter-relationships between elements. That's why they are used in various applications where entities are highly interconnected.
In the common client-server architecture, multiple clients communicate with a central server. A peer-to-peer (P2P) architecture, by contrast, consists of a decentralized network of peers: nodes that can act as both clients and servers. Because no central server is needed, P2P networks distribute the workload among peers, and all peers both contribute and consume resources within the network.
Load balancing is essential for building high-performance and scalable applications, and it is also a popular topic in system design interviews. By definition, a load balancer is a software or hardware device that sits between clients and servers to balance the workload. It prevents servers from being overloaded and increases system throughput.
Caching is the process of storing the results of a request in a cache, a temporary storage location, so they can be accessed more quickly. In this blog, we discuss: 1) how caching works, 2) real-life examples of caching, 3) types of caching, 4) cache eviction policies, 5) the idea of cache invalidation, and 6) advantages and disadvantages of caching.
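As a rough sketch of the idea (all names here are illustrative), a cache sits in front of a slow data source and answers repeated requests from memory, with a time-to-live (TTL) as a simple invalidation policy:

```python
import time

class SimpleCache:
    """A tiny in-memory cache with per-entry time-to-live (TTL)."""
    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.store = {}  # key -> (value, expiry_timestamp)

    def get(self, key, compute):
        entry = self.store.get(key)
        if entry and entry[1] > time.time():
            return entry[0]           # cache hit
        value = compute()             # cache miss: do the slow work
        self.store[key] = (value, time.time() + self.ttl)
        return value

def slow_database_lookup():
    time.sleep(0.5)                   # simulates a slow request
    return "result"

cache = SimpleCache(ttl_seconds=60)
print(cache.get("user:42", slow_database_lookup))  # slow, fills the cache
print(cache.get("user:42", slow_database_lookup))  # fast, served from cache
```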
MapReduce is a batch processing programming paradigm that enables massive scalability across a large number of servers in a Hadoop cluster. It was first published in 2004 and was described as "the algorithm that makes Google so massively scalable." It is a relatively low-level programming model compared to the parallel processing systems developed years earlier.
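Word count is the classic illustration of the MapReduce style. The sketch below simulates the map, shuffle, and reduce phases in a single process; a real Hadoop job distributes these phases across the cluster:

```python
from collections import defaultdict

documents = ["the quick brown fox", "the lazy dog", "the quick dog"]

# Map phase: emit (word, 1) pairs from each input record.
mapped = [(word, 1) for doc in documents for word in doc.split()]

# Shuffle phase: group all emitted values by key.
groups = defaultdict(list)
for word, count in mapped:
    groups[word].append(count)

# Reduce phase: aggregate the values for each key.
word_counts = {word: sum(counts) for word, counts in groups.items()}
print(word_counts)  # {'the': 3, 'quick': 2, ...}
```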
In computer networking, a network is a group of devices connected in some way to exchange data. Similarly, network protocols define a common format and set of rules for exchanging messages between devices. Some common networking protocols are Hypertext Transfer Protocol (HTTP), Transmission Control Protocol (TCP), and Internet Protocol (IP).
Load balancers distribute requests to servers based on various load balancing techniques. These techniques use different algorithms to select servers. There are two types of load balancing algorithms: 1) Static load balancing: round robin, URL hash, etc. 2) Dynamic load balancing: least connection method, least response time method, etc.
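As a rough sketch of the two families (server names are illustrative): round robin, a static technique, cycles through servers in a fixed order, while least-connections, a dynamic technique, picks the server with the fewest active connections at the moment a request arrives:

```python
import itertools

servers = ["server-a", "server-b", "server-c"]

# Static: round robin cycles through the servers in a fixed order.
round_robin = itertools.cycle(servers)
for _ in range(4):
    print("round robin ->", next(round_robin))

# Dynamic: least-connections picks the server with the fewest
# active connections at the moment the request arrives.
active_connections = {"server-a": 12, "server-b": 3, "server-c": 7}
target = min(active_connections, key=active_connections.get)
print("least connections ->", target)  # server-b
```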
A Bloom filter is a space-efficient data structure that tells whether an element may be in a set (a true positive or a false positive) or is definitely not in the set (a true negative). It uses a fixed amount of space regardless of the number of items inserted; however, its accuracy decreases as more elements are added.
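Here is a minimal sketch of the idea, using a fixed-size bit array and k hash positions derived from SHA-256 with different salts (a production filter would use faster hash functions and a size tuned to the expected number of items):

```python
import hashlib

class BloomFilter:
    def __init__(self, size=1024, num_hashes=3):
        self.size = size
        self.num_hashes = num_hashes
        self.bits = [False] * size   # fixed-size bit array

    def _positions(self, item):
        # Derive k positions by hashing the item with different salts.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = True

    def might_contain(self, item):
        # True may be a false positive; False is always correct.
        return all(self.bits[pos] for pos in self._positions(item))

bf = BloomFilter()
bf.add("alice@example.com")
print(bf.might_contain("alice@example.com"))  # True (was added above)
print(bf.might_contain("bob@example.com"))    # almost certainly False
```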
Data partitioning, or sharding, is a technique for dividing data into independent components: splitting it into smaller pieces so that it can be accessed and managed efficiently. Partitioning is a backbone of modern system design because it helps improve scalability, manageability, and availability.
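One common approach is hash-based partitioning: hash the partition key and take the result modulo the number of shards. A minimal sketch (the shard stores and keys are illustrative):

```python
import hashlib

NUM_SHARDS = 4

def shard_for(key):
    """Route a record to a shard by hashing its partition key."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

shards = {i: {} for i in range(NUM_SHARDS)}  # each shard is its own store

for user_id in ["u1001", "u1002", "u1003", "u1004"]:
    shards[shard_for(user_id)][user_id] = {"name": "..."}

print({i: list(s) for i, s in shards.items()})
```

One caveat with plain modulo hashing: changing the number of shards remaps almost every key, which is the problem consistent hashing (discussed later in this list) is designed to solve.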
In this blog, we'll discuss the key differences between forward and reverse proxies and the advantages of each. A forward proxy sits in front of clients and ensures that no client communicates directly with the origin server. A reverse proxy, on the other hand, sits in front of an origin server and ensures that no client communicates directly with that server.
What is the latency of a system? Latency is an essential system design concept that determines how fast data transfers from the client to the server and back. Lower latency means better system performance. This blog will focus on how latency impacts a system's performance and on techniques for reducing it.
Polling is a technique in which a client repeatedly requests information from a server. Long polling is a variation of traditional polling that allows the server to send data to a client whenever it becomes available. The client requests information from the server just as in standard polling, but with the caveat that the server may not respond right away: a complete answer is delivered to the client once the data is accessible.
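The sketch below simulates the mechanism in a single process using a blocking queue in place of a real HTTP connection (names and timeouts are illustrative): the server-side handler holds each request open until data arrives or a timeout expires, and the client immediately re-polls after each response.

```python
import queue
import threading
import time

events = queue.Queue()

def long_poll(timeout):
    """Server-side handler: hold the 'request' open until data
    arrives or the timeout expires, then respond."""
    try:
        return events.get(timeout=timeout)   # blocks until data is available
    except queue.Empty:
        return None                          # timed out: client will re-poll

def client():
    while True:
        data = long_poll(timeout=5)          # one long-poll round trip
        if data is not None:
            print("client received:", data)
            break                            # a real client would re-poll

threading.Thread(target=client).start()
time.sleep(2)                                # data becomes available later
events.put({"order": 42, "status": "shipped"})
```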
Nowadays, almost every large-scale application is based on distributed computing, so understanding distributed system concepts is crucial for designing fault-tolerant, highly scalable, and low-latency services. This blog will introduce the concepts of distributed systems, how they apply in real-world scenarios, and their key characteristics.
A key-value database is a non-relational (NoSQL) database that stores data using a simple key-value mechanism. The structure of a key-value store is similar to maps or dictionaries, where each key is associated with exactly one value. The simplicity of this model makes key-value databases fast, easy to use, scalable, portable, and flexible.
In an operating system, process management involves tasks such as creating processes, scheduling processes, managing deadlocks, and terminating processes. It is the responsibility of the OS to manage all running processes in the system. In this blog, we will learn about process management in the OS and its various related algorithms.
The client-server architecture is a system consisting of clients and servers in which the server hosts, manages, and delivers services to clients. Clients are connected to a central server and communicate over a computer network. There are four key components in a client-server model: clients, a load balancer, servers, and network protocols.
The availability of a system is measured as the percentage of time in a given period that the system is up and able to perform its tasks and functions under normal conditions. One way to look at availability is how resistant a system is to failures. High-availability systems come with tradeoffs, such as higher latency or lower throughput.
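The arithmetic behind this is simple: availability = uptime / (uptime + downtime). A quick calculation shows how little downtime each extra "nine" of availability allows per year:

```python
MINUTES_PER_YEAR = 365 * 24 * 60

for nines, availability in [("two nines", 0.99), ("three nines", 0.999),
                            ("four nines", 0.9999), ("five nines", 0.99999)]:
    allowed_downtime = MINUTES_PER_YEAR * (1 - availability)
    print(f"{nines} ({availability:.3%}): "
          f"{allowed_downtime:.1f} minutes of downtime per year")
```

For example, four nines (99.99%) allows only about 53 minutes of downtime per year, while two nines (99%) allows over 5,000.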
The throughput of a system is defined as the total units of information a system can process in a given amount of time. It is generally represented as the number of bits transmitted per second or HTTP operations per day. This blog will discuss the important factors that affect throughput in designing distributed systems.
System design is an important topic that tech companies ask about during interviews, and it is equally important for solving large-scale software problems. This blog will go through fundamental system design concepts that are important for solving system design questions, learning system design at an advanced level, and preparing for interviews.
The publish-subscribe pattern, sometimes known as the pub-sub pattern, is an architectural design pattern that enables publishers and subscribers to communicate with one another. This pattern relies on a message broker to relay messages from publishers to subscribers. Publishers send messages to a channel that subscribers can join.
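Here is a minimal in-memory sketch of the pattern (the broker class and channel names are illustrative): publishers and subscribers never reference each other directly; the broker routes each message by channel.

```python
from collections import defaultdict

class MessageBroker:
    """A minimal in-memory broker: routes messages by channel (topic)."""
    def __init__(self):
        self.subscribers = defaultdict(list)  # channel -> callbacks

    def subscribe(self, channel, callback):
        self.subscribers[channel].append(callback)

    def publish(self, channel, message):
        # The publisher only knows the channel; the broker delivers
        # the message to everyone subscribed to that channel.
        for callback in self.subscribers[channel]:
            callback(message)

broker = MessageBroker()
broker.subscribe("orders", lambda msg: print("billing got:", msg))
broker.subscribe("orders", lambda msg: print("shipping got:", msg))
broker.publish("orders", {"order_id": 7, "status": "placed"})
```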
In distributed systems, leader election is the process of designating a single process as the organizer of some task distributed among several computers (nodes). Such a process may have specific abilities, including the capacity to delegate tasks, the ability to edit data, or the responsibility for managing all system requests.
Have you ever been in a dilemma while choosing the most appropriate database for your application? What would be the most viable storage type to meet business expectations and offer efficient services? To select a database, we should understand the structure and functionality of each kind of database.
At its most basic level, a rate limiter restricts the number of events a certain object (person, device, IP, etc.) can perform in a given time range. In general, a rate limiter caps how many requests a sender can issue in a specific time window and blocks requests once the cap is reached.
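A minimal sketch of one common approach, the fixed-window counter (the class name and limits are illustrative; other designs include sliding windows and token buckets):

```python
import time

class FixedWindowRateLimiter:
    """Allow at most `limit` requests per sender per `window` seconds."""
    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.counters = {}  # sender -> (window_start, request_count)

    def allow(self, sender):
        now = time.time()
        start, count = self.counters.get(sender, (now, 0))
        if now - start >= self.window:
            start, count = now, 0       # a new window begins
        if count >= self.limit:
            return False                # cap reached: block the request
        self.counters[sender] = (start, count + 1)
        return True

limiter = FixedWindowRateLimiter(limit=3, window_seconds=60)
for i in range(5):
    allowed = limiter.allow("10.0.0.1")
    print(f"request {i + 1}:", "allowed" if allowed else "blocked")
```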
A storage device is a piece of hardware used for data storage. Storage is a mechanism that allows a computer to preserve data, either temporarily or permanently. Storage devices such as flash drives and hard drives are fundamental components of most digital devices, allowing users to store data such as videos, documents, and photographs.
How does 1-click-buy work on Amazon? How do e-commerce platforms show the status of an order? What happens when we cancel an order after placing it, or after the item is shipped or delivered? How is all the activity related to an order tied to just one order ID? This blog will lay out key insights on designing a workflow for such distributed systems.
The CAP theorem is an essential concept in system design for networked shared-data systems. It states that a distributed database system can provide only two of these three properties: consistency, availability, and partition tolerance. We can make trade-offs among the three properties based on the use cases for our database system.
Whenever we build a web application dealing with real-time data, we need to consider how to deliver that data to the client, and choosing the best delivery mechanism is a key design decision. In this blog, we focus on server-sent events and give a complete insight into their internal workings and underlying features.
Consistent hashing is one of the most widely used concepts in system design, as it offers considerable flexibility in scaling an application. This blog discusses the key concepts and approaches that come in handy while scaling out a distributed system, where consistent hashing is frequently applied to solve various challenges.
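The core idea is to place both servers and keys on a hash ring, so that each key belongs to the first server found walking clockwise from its position. A minimal sketch (the class and server names are illustrative):

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Map keys to servers on a hash ring so that adding or removing
    a server only remaps a small fraction of the keys."""
    def __init__(self, servers):
        self.ring = sorted((self._hash(s), s) for s in servers)

    @staticmethod
    def _hash(value):
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def server_for(self, key):
        # Walk clockwise from the key's position to the first server.
        position = self._hash(key)
        index = bisect.bisect(self.ring, (position,)) % len(self.ring)
        return self.ring[index][1]

ring = ConsistentHashRing(["cache-a", "cache-b", "cache-c"])
print(ring.server_for("user:42"))
print(ring.server_for("user:43"))
```

Production implementations typically also place multiple virtual nodes per server on the ring to spread the load more evenly, a refinement this sketch omits.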