Twitter is a social media platform where users may post and interact with “tweets.” Users submit and engage with “tweets” on Twitter, a microblogging and social networking site. Users can subscribe to other users’ feeds and receive tweet notifications from those they follow. Tweets are almost 140–280 character communications.
Almost everyone must be familiar with Twitter. It’s an incredible social networking platform. In this blog, we’ll be discussing how to design a Twitter-like system. So without any delay, let’s start by looking at the key requirements for our system!
Before jumping to actual designing a system like Twitter, it is necessary to list down the essential features and what are requirements we’re looking to satisfy using our design. In this blog, we’ll be covering these features:
- User able to post a tweet
- Service able to push tweets to followers, send push notifications
- User able to view the user timeline and the home timeline
- User able to search the keywords
- Service should be highly available
Twitter roughly has 100 million active users and handles 500 million tweets per day on average. Twitter is more read-heavy than write-heavy, we should optimize for fast reads of tweets. It has 250 billion read requests per month and roughly 10 billion searches per month. Let's assume:
- Size per tweet = 10 KB
- Size of tweet per month = 10 KB * 500 million * 30 = 0.15 PB(storage)
- Read requests per second = 100k (250B*400/1B)
- Tweets per second = 6,000 (15B*400/1B)
The system requires many application servers with load balancers in front of them for traffic distribution to serve client requests. Database servers that can store data effectively are required on the backend. The object file storage is also required to store images and videos in the case of a Twitter-like service. Our platform’s general high-level architecture is as follows:
The platform offers a variety of services, each of which serves a specific purpose. To cover all of the use cases in scope, each of these services provides a set of APIs that the client app or other services may use. Furthermore, each of these services has its specialized data store for storing its data. When a service receives a request, such as a new tweet or a request to follow a user, it validates the request and stores the data in its data store. The service also sends a message to the messaging service in the form of an event, which other services can consume and use to change their state.
Data model and Storage
- Both SQL and NoSQL options might be considered for the data storage. However, due to the scalability issues with SQL solutions and the vast number of users to serve, we will go for a NoSQL solution for the data storage. We can utilize a broad column data store like Cassandra for other things like tweets, comments, and likes.
- We may utilize a graph-based data store solution like NeoJ or Cassandra to store entities like the user and their followers.
Detailed Components Design
Our platform’s major system components are as follows:
User Service: Provides an API for managing users and interactions, such as when one user follows or unfollows another. Instead of making the follow/unfollow use case part of the User Service, a separate User Follower Service may be created.
Tweet Service: Provides an API for creating and storing tweets in the data store. It also manages a user’s ability to leave comments on tweets and like them. If necessary, the Tweet Service can be further split down into Tweet Comments and Tweet Like Services.
Feeds Service: The feed service calculates the tweets that will appear on the user’s timeline. This service also has an API that returns tweets so that the timeline may be shown. Additionally, this API enables pagination, allowing users to obtain tweets in chunks as they browse through their timeline.
Timeline Generation: The timeline should show all of the followers’ most recent posts in a simple example. The timeline will be extremely slow to create for users with many followers since the algorithm will have to query, merge, and rank a large number of tweets. As a result, instead of producing the timeline when the user loads the page, the system should pre-generate it.
There should be dedicated servers that are constantly creating and storing users’ timelines in memory. The system may provide the pre-generated timeline from the cache whenever users load the app. Users’ timelines are not compiled on demand using this approach but rather regularly and returned to them whenever they want it.
Timeline Updates: If the system treats all users the same, a user’s timeline generation interval will be considerable, and he will experience a significant delay in receiving new postings. Prioritizing users with new updates is one approach to address this. New tweets are added to the message queue, then picked up by timeline generator services, which re-generates the timeline for all followers.
For publishing new posts to users, we can use these approaches:
- Pull model: Clients can get data regularly or whenever they need it manually. The difficulty with this method is that new information is not displayed until the client submits a pull request. Furthermore, most pull requests will return an empty response since no new data has been added, wasting resources.
- Push model: When a user posts a tweet, the system may send a notification to all of their followers right away. One potential drawback of this technique is that when a person has millions of followers, the server must simultaneously transmit changes to a large number of individuals.
- Hybrid: Both the pull and push models can be combined. Only those with a few hundred (or thousands) followers get info from the system. For celebrities, we may hand over the updates to their fans.
In the next part, we’ll look at several optimizations approaches for scaling our architecture for performance, scalability, redundancy, and other non-functional needs.
Because there are so many users and tweets, it’s impossible to keep all of the data from a datastore on a single computer; thus, it’s preferable to start defining a partitioning strategy now. The partition strategy aids in the division of data for a particular service into several divisions. Each partition is replicated to a different cluster host. Data sharing is a method of spreading data based on predetermined criteria. Data sharding improves a data node’s speed by requiring it to search for a document within a smaller group of nodes. Let’s take a look at some of the different ways we may split our data.
- Sharding based on UserID: We link each user to a server based on UserID, which keeps all of the user’s tweets, likes, and followers, among other things. When users (such as celebrities) are popular, this strategy fails, and we end up with more data and access on a subset of servers than on others.
- Sharding based on TweetID: We map each tweet to a server that holds the tweet information using TweetID. We must query all servers to find tweets, and each server will return a collection of tweets. This technique overcomes the problem of hot users, but it increases latency because all servers must be queried.
Caching is a critical feature that will aid in the scaling of our system. It improves performance by returning precomputed data from the cache rather than computing data for each request.
The Feed Service is the appropriate component for introducing caching and improving system speed. To save the chronology for each user, we may use any caching solution such as Redis, Memcache, or others. Because the cached data on Twitter is primarily brief tweets, it may be feasible to store the whole user timeline or its majority. Finally, when the user browses the feed and adds additional tweets to the cache, we can preemptively request the next batch of feed data for that user from the store and cache it, decreasing latency and enhancing the user experience.
We may use DNS load balancing to evenly divide the load (requests) from diverse geographic regions across many data centers. Each service will have a load balancer in front of it to distribute incoming requests to various service nodes based on capacity estimates. Load balancing methods such as round-robin, least connection, and others can be employed. We can also add or remove nodes from the cluster without losing any state because our services are stateless.
In this blog, we discussed how to design a Twitter-like system. I hope you enjoyed reading this blog. Please share your views :)