Twitter is a microblogging and social networking platform where users post and interact with short messages called “tweets.” Users can subscribe to other users’ feeds and receive notifications about new tweets from the accounts they follow. Tweets are short messages of up to 280 characters (originally 140).
Almost everyone is familiar with Twitter, one of the most popular social networking platforms. In this blog, we’ll discuss how to design a Twitter-like system. So without further delay, let’s start by looking at the key requirements for our system!
Before jumping into actually designing a system like Twitter, it is necessary to list the essential features and the requirements we’re looking to satisfy with our design. In this blog, we’ll cover these features:
Twitter has roughly 100 million active users and handles about 500 million tweets per day on average. Because Twitter is far more read-heavy than write-heavy, we should optimize for fast reads of tweets. It serves about 250 billion read requests per month and roughly 10 billion searches per month. Let’s use these numbers as our working assumptions.
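To sanity-check the read-heavy claim, here is a quick back-of-the-envelope estimate based on the numbers above (assuming a 30-day month):

```python
# Back-of-the-envelope load estimate from the figures above:
# 500M tweets/day (writes), 250B reads/month, month taken as 30 days.
SECONDS_PER_DAY = 24 * 60 * 60

tweets_per_day = 500_000_000
reads_per_month = 250_000_000_000

write_qps = tweets_per_day / SECONDS_PER_DAY
read_qps = reads_per_month / (30 * SECONDS_PER_DAY)

print(f"write QPS: {write_qps:,.0f}")                  # → write QPS: 5,787
print(f"read QPS:  {read_qps:,.0f}")                   # → read QPS:  96,451
print(f"read/write ratio: {read_qps / write_qps:.0f}:1")  # → read/write ratio: 17:1
```

A roughly 17:1 read/write ratio confirms that optimizing the read path (timeline delivery) matters far more than the write path.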
The system requires multiple application servers, with load balancers in front of them to distribute traffic, to serve client requests. On the backend, database servers are needed to store data efficiently. Object storage is also required to hold images and videos in the case of a Twitter-like service. Our platform’s general high-level architecture is as follows:
The platform offers a variety of services, each of which serves a specific purpose. To cover all of the use cases in scope, each of these services exposes a set of APIs that the client app or other services may use. Furthermore, each service has its own dedicated data store. When a service receives a request, such as a new tweet or a request to follow a user, it validates the request and stores the data in its data store. The service also publishes an event to the messaging service, which other services can consume to update their own state.
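As a rough sketch of this validate-store-publish flow, here is a minimal Python example; the names (`post_tweet`, `message_bus`, `tweet_store`) are illustrative, and an in-memory queue and dict stand in for the real messaging service and data store:

```python
import queue
from dataclasses import dataclass

# In-memory stand-ins for the service's own data store and the
# messaging service; in production these would be a database and a
# message broker such as Kafka.
tweet_store: dict = {}
message_bus: "queue.Queue[dict]" = queue.Queue()

@dataclass
class Tweet:
    tweet_id: str
    user_id: str
    text: str

def post_tweet(tweet: Tweet) -> None:
    """Validate the request, persist it in the service's data store,
    then publish an event for other services (e.g. Feeds) to consume."""
    if not tweet.text or len(tweet.text) > 280:
        raise ValueError("tweet must be 1-280 characters")
    tweet_store[tweet.tweet_id] = tweet            # store in own data store
    message_bus.put({"type": "tweet_created",      # emit event for consumers
                     "tweet_id": tweet.tweet_id,
                     "user_id": tweet.user_id})

post_tweet(Tweet("t1", "alice", "hello world"))
event = message_bus.get_nowait()
print(event["type"])   # → tweet_created
```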
Data model and Storage
Our platform’s major system components are as follows:
User Service: Provides an API for managing users and interactions, such as when one user follows or unfollows another. Instead of making the follow/unfollow use case part of the User Service, a separate User Follower Service may be created.
Tweet Service: Provides an API for creating and storing tweets in the data store. It also manages users’ ability to comment on and like tweets. If necessary, the Tweet Service can be further split into separate Tweet Comment and Tweet Like Services.
Feeds Service: The feed service calculates the tweets that will appear on the user’s timeline. This service also has an API that returns tweets so that the timeline may be shown. Additionally, this API enables pagination, allowing users to obtain tweets in chunks as they browse through their timeline.
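A minimal sketch of how the Feeds Service’s paginated API might work, assuming an offset-based cursor over a precomputed, newest-first timeline (all names are hypothetical):

```python
from typing import Optional

# Stand-in for precomputed timelines (newest first); 95 tweets for "alice".
timelines = {"alice": [f"tweet-{i}" for i in range(95)]}

def get_timeline(user_id: str, cursor: int = 0, page_size: int = 20) -> dict:
    """Return one page of the user's timeline plus the cursor for the
    next page (None when the timeline is exhausted)."""
    feed = timelines.get(user_id, [])
    end = cursor + page_size
    next_cursor: Optional[int] = end if end < len(feed) else None
    return {"tweets": feed[cursor:end], "next_cursor": next_cursor}

first = get_timeline("alice")
print(len(first["tweets"]), first["next_cursor"])   # → 20 20
```

The client keeps passing `next_cursor` back as it scrolls, receiving the timeline in chunks until the cursor comes back as `None`.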
Timeline Generation: In the simplest approach, the timeline shows the most recent tweets from all of the accounts a user follows. For users who follow many accounts, generating the timeline on demand is extremely slow, since the algorithm has to query, merge, and rank a large number of tweets. As a result, instead of producing the timeline when the user loads the page, the system should pre-generate it.
There should be dedicated servers that continuously generate users’ timelines and store them in memory. Whenever a user loads the app, the system can serve the pre-generated timeline from the cache. With this approach, users’ timelines are not compiled on demand; they are generated regularly and returned whenever requested.
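As an illustration, a pre-generated timeline can be built by merging the followed accounts’ recent tweets, newest first. This sketch assumes per-user tweet lists that are already sorted by recency; all names and data are hypothetical:

```python
import heapq

# Hypothetical data: recent (timestamp, tweet_id) pairs per author,
# each list already sorted newest-first.
recent_tweets = {
    "bob":   [(103, "b2"), (100, "b1")],
    "carol": [(105, "c2"), (101, "c1")],
}
follows = {"alice": ["bob", "carol"]}   # user -> accounts they follow

def generate_timeline(user_id: str, limit: int = 3) -> list[str]:
    """Merge the followed accounts' recent tweets into one newest-first
    timeline; heapq.merge keeps this O(n log k) for k followed accounts."""
    streams = [recent_tweets.get(u, []) for u in follows.get(user_id, [])]
    merged = heapq.merge(*streams, key=lambda t: -t[0])
    return [tweet_id for _, tweet_id in merged][:limit]

print(generate_timeline("alice"))   # → ['c2', 'b2', 'c1']
```

A ranking step (engagement, relevance) would normally follow the merge; it is omitted here for brevity.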
Timeline Updates: If the system treats all users the same, the timeline regeneration interval will be considerable, and users will experience a significant delay in receiving new tweets. One approach to address this is to prioritize users with new updates: new tweets are added to a message queue and picked up by timeline generator services, which regenerate the timeline for all of the author’s followers.
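A minimal fan-out-on-write sketch of this flow, with an in-memory queue standing in for the message queue and a dict for the timeline cache (names are hypothetical):

```python
import queue

# Stand-ins: new tweets land on a queue; a timeline-generator worker
# prepends each one to every follower's cached timeline.
new_tweet_queue: "queue.Queue[dict]" = queue.Queue()
followers = {"bob": ["alice", "carol"]}          # author -> followers
cached_timelines = {"alice": [], "carol": []}

def fan_out_worker() -> None:
    """Drain the queue, pushing each new tweet to the cached timeline
    of every follower of its author."""
    while not new_tweet_queue.empty():
        tweet = new_tweet_queue.get_nowait()
        for follower in followers.get(tweet["user_id"], []):
            cached_timelines[follower].insert(0, tweet["tweet_id"])

new_tweet_queue.put({"tweet_id": "t42", "user_id": "bob"})
fan_out_worker()
print(cached_timelines["alice"])   # → ['t42']
```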
For publishing new tweets to users, we can use these approaches:
- Pull model: clients fetch recent tweets from the server whenever they load or refresh their timeline.
- Push model: the server fans out each new tweet to all of the author’s followers as soon as it is posted.
- Hybrid model: push for most users, but fall back to pull for accounts with very large follower counts, where fan-out becomes too expensive.
In the next part, we’ll look at several optimization approaches for scaling our architecture to meet performance, scalability, redundancy, and other non-functional needs.
Because there are so many users and tweets, it’s impossible to keep all of a service’s data on a single machine; thus, it’s preferable to define a partitioning strategy early. The partitioning strategy divides the data for a particular service into several partitions, and each partition is replicated to different cluster hosts. Data sharding is a method of distributing data based on predetermined criteria. Sharding improves query speed because each node only has to search within a smaller subset of the data. Let’s take a look at some of the different ways we may split our data.
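One common partitioning scheme is hash-based sharding on the user ID, so that all of a user’s data lands on the same partition. A minimal sketch (the shard count is an assumption; real deployments typically use many more partitions and consistent hashing to ease resharding):

```python
import hashlib

NUM_SHARDS = 8  # assumption for illustration

def shard_for(user_id: str) -> int:
    """Map a user ID to a shard with a stable hash, so the same user
    always routes to the same partition. (Python's built-in hash() is
    randomized per process, so a stable digest is used instead.)"""
    digest = hashlib.sha256(user_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS

# The same key always routes to the same shard:
print(shard_for("alice") == shard_for("alice"))   # → True
```

The trade-off with plain modulo hashing is that changing `NUM_SHARDS` remaps almost every key, which is why consistent hashing is usually preferred when shards are added or removed.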
Caching is a critical technique that will aid in scaling our system. It improves performance by returning precomputed data from the cache rather than recomputing it for each request.
The Feed Service is the natural place to introduce caching and improve system speed. To store each user’s timeline, we may use any caching solution, such as Redis or Memcached. Because the cached data on Twitter consists primarily of short tweets, it may be feasible to store a user’s whole timeline, or most of it. Finally, as the user scrolls through the feed, we can preemptively fetch the next batch of feed data from the store and add it to the cache, decreasing latency and enhancing the user experience.
We may use DNS load balancing to evenly divide the load (requests) from diverse geographic regions across many data centers. Each service will have a load balancer in front of it to distribute incoming requests to various service nodes based on capacity estimates. Load balancing methods such as round-robin, least connection, and others can be employed. We can also add or remove nodes from the cluster without losing any state because our services are stateless.
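A round-robin dispatcher over stateless nodes can be sketched in a few lines (node names are hypothetical); because the services hold no session state, any node can serve any request:

```python
import itertools

# Stateless service nodes behind the load balancer (hypothetical names).
nodes = ["app-1", "app-2", "app-3"]
_rr = itertools.cycle(nodes)

def route_request() -> str:
    """Round-robin: hand each incoming request to the next node in turn."""
    return next(_rr)

demo = [route_request() for _ in range(4)]
print(demo)   # → ['app-1', 'app-2', 'app-3', 'app-1']
```

A least-connection policy would instead track in-flight requests per node and pick the minimum; round-robin is shown here because it needs no shared state.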
In this blog, we discussed how to design a Twitter-like system. I hope you enjoyed reading this blog. Please share your views :)