Design Instagram

What is Instagram?

Instagram is a photo and video-sharing social media platform that allows users to share their creations with others. The original poster can set the visibility of these posts (photos/videos) to private or public. Posts can be liked and commented on by users. Users can follow and see the news feeds of other users (a collection of posts from the users they are following).

Users can also search for content across the entire platform. Image editing, location tagging, private messaging, push alerts, group messaging, hashtags, filters, and more are all available on Instagram.

Requirements of the system

Functional Requirements

  • Users can upload and view photos.
  • Users can search for photos by title.
  • Users can follow other users.
  • The system generates a custom news feed for each user, containing the best photos from all of the individuals and accounts that user follows.

Non-Functional Requirements

  • Read-heavy: the read-to-write ratio is very high.
  • Low latency: photos should load quickly when viewed.
  • Access pattern for posts: media content should be quickest to access while a post is receiving the most interaction.
  • Globally available: works on a wide range of devices, supports many languages, and tolerates a wide range of internet bandwidths.
  • Our system should be highly scalable and reliable.

Capacity Estimation

The crucial point to keep in mind is that the number of read requests will be about 100 times higher than the number of upload (write) requests. Assume we have 500 million users registered on our platform, with 1 million of them active per day. If 5 million images are posted every day, the number of photos uploaded per second is:

Photos per second = 5M / (24*60*60) ≈ 58

If the average photo size is 150 KB, the daily storage requirement (taking 1 GB = 1024 * 1024 KB) is:

5M * 150KB ≈ 715 GB

If we assume our service will run for ten years, the space required will be:

715GB * 365 * 10 ≈ 2550TB ≈ 2.5PB
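The arithmetic above can be reproduced with a few lines of Python. This is only a sketch: the use of binary units (1 GB = 1024 * 1024 KB) is an assumption made to match the rounding in the text, and the variable names are illustrative.

```python
# Back-of-the-envelope capacity estimate using the figures from the text.
SECONDS_PER_DAY = 24 * 60 * 60
KB_PER_GB = 1024 ** 2  # assumption: binary units
GB_PER_TB = 1024

photos_per_day = 5_000_000
avg_photo_kb = 150
retention_years = 10
read_write_ratio = 100  # reads are assumed to be 100x uploads, as stated above

uploads_per_second = photos_per_day / SECONDS_PER_DAY
reads_per_second = uploads_per_second * read_write_ratio
daily_storage_gb = photos_per_day * avg_photo_kb / KB_PER_GB
total_storage_tb = daily_storage_gb * 365 * retention_years / GB_PER_TB

print(f"uploads per second: {uploads_per_second:.1f}")
print(f"reads per second:   {reads_per_second:.0f}")
print(f"daily storage:      {daily_storage_gb:.0f} GB")
print(f"10-year storage:    {total_storage_tb:.0f} TB")
```

Changing `avg_photo_kb` or `retention_years` shows how sensitive the total is to each assumption.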

High-Level Design


Figure: High-level design of Instagram

A user service manages user onboarding, login, and profile-related actions. The user service runs on a MySQL database, chosen because the data is structured and largely relational. User data is also read-heavy rather than write-heavy, and MySQL suffices for such a query pattern. The user service is additionally backed by a Redis cache that holds user data. When the user service receives a request, it first looks the data up in Redis. If Redis contains the information, it is returned to the user directly; otherwise, the user service reads it from the MySQL DB, inserts it into Redis for future use, and then returns it to the user. Likewise, whenever a new user or new information is introduced, it is written to MySQL and cached in Redis.
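The lookup order described here is commonly called the cache-aside pattern. A minimal sketch follows, with plain dictionaries standing in for Redis and MySQL; the class and method names are hypothetical, and the write path in `save_user` (writing to both stores) is an assumption rather than something the text spells out.

```python
class UserService:
    """Cache-aside lookup: try the cache first, fall back to the database."""

    def __init__(self, cache, db):
        self.cache = cache  # stand-in for Redis
        self.db = db        # stand-in for MySQL

    def get_user(self, user_id):
        user = self.cache.get(user_id)
        if user is not None:
            return user  # cache hit
        user = self.db.get(user_id)  # cache miss: read from the database
        if user is not None:
            self.cache[user_id] = user  # populate the cache for next time
        return user

    def save_user(self, user_id, profile):
        # Assumed write path: persist first, then refresh the cached copy.
        self.db[user_id] = profile
        self.cache[user_id] = profile


db = {1: {"username": "alice"}}
cache = {}
service = UserService(cache, db)

first = service.get_user(1)   # miss: served from db, then cached
second = service.get_user(1)  # hit: served from cache
```

A production cache would also need expiry and invalidation rules, which this sketch leaves out.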

System Components

The system will be made up of multiple microservices, each of which performs a different task. The data will be stored in a graph database such as Neo4j. We've chosen a graph data model because our data contains complex relationships between entities: users, posts, and comments become nodes of the graph, and the graph's edges record relationships such as follows, likes, and comments. In addition, a wide-column database such as Cassandra can be used to store information such as user feeds, activities, and counters.
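To make the graph model concrete, here is a small in-memory sketch of nodes and typed edges. It does not use Neo4j or its API; the `SocialGraph` class, the relation names (`FOLLOWS`, `POSTED`, `LIKES`), and the helper `followed_posts` are all hypothetical.

```python
from collections import defaultdict


class SocialGraph:
    """Toy property graph: labelled nodes plus typed, directed edges."""

    def __init__(self):
        self.nodes = {}                 # node_id -> {"label": ..., properties}
        self.edges = defaultdict(set)   # (source_id, relation) -> target ids

    def add_node(self, node_id, label, **props):
        self.nodes[node_id] = {"label": label, **props}

    def add_edge(self, source_id, relation, target_id):
        self.edges[(source_id, relation)].add(target_id)

    def neighbors(self, source_id, relation):
        return set(self.edges.get((source_id, relation), set()))


def followed_posts(graph, user_id):
    """Posts created by the accounts that user_id follows."""
    posts = set()
    for followee in graph.neighbors(user_id, "FOLLOWS"):
        posts |= graph.neighbors(followee, "POSTED")
    return posts


graph = SocialGraph()
graph.add_node("alice", "User")
graph.add_node("bob", "User")
graph.add_node("p1", "Post", caption="sunset")
graph.add_edge("alice", "FOLLOWS", "bob")
graph.add_edge("bob", "POSTED", "p1")
graph.add_edge("alice", "LIKES", "p1")
```

Here `followed_posts(graph, "alice")` walks two edge types, which is the kind of traversal a graph database is meant to make cheap.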

Figure: Components of the Instagram system design

Overall Data Flow and API Design

Data Flow

  1. The user sends an API request.
  2. The load balancer receives the request and forwards it to an app server.
  3. The app server validates and sanitizes the input.
  4. The app server tries to fulfill the request.
  5. If everything went well, the app server returns an OK response, with data where required; otherwise, it returns a specific error response.
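The app-server part of this flow can be sketched as a small dispatcher. Everything here is illustrative: the request shape, the status codes, and the `HANDLERS` table are assumptions, and the load balancer is left out.

```python
def handle_request(request, handlers):
    """Validate the request, run the matching handler, and build a response."""
    action = request.get("action")
    params = request.get("params", {})

    handler = handlers.get(action)
    if handler is None:
        return {"status": 404, "error": f"unknown action: {action}"}
    if not isinstance(params, dict):
        return {"status": 400, "error": "params must be an object"}

    try:
        data = handler(**params)
    except (TypeError, ValueError) as exc:
        # Missing/extra arguments or failed validation inside the handler.
        return {"status": 400, "error": str(exc)}
    return {"status": 200, "data": data}


def get_user_by_id(user_id):
    # Hypothetical handler with its own input validation.
    if not isinstance(user_id, int) or user_id <= 0:
        raise ValueError("user_id must be a positive integer")
    return {"user_id": user_id, "username": f"user{user_id}"}


HANDLERS = {"get_user_by_id": get_user_by_id}

ok = handle_request({"action": "get_user_by_id", "params": {"user_id": 7}}, HANDLERS)
bad = handle_request({"action": "get_user_by_id", "params": {"user_id": -1}}, HANDLERS)
```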

API Design

  • signup(username, first_name, last_name, salted_password_hash, phone_number, email, bio, photo)
    • adds the user to the user table
  • login(username, salted_password_hash)
    • logs the user in and updates the last login time
  • search_user(search_string, auth_token)
    • returns public user data for a given search string (matched against first name, last name, and username)
  • get_user_by_id(user_id, auth_token)
    • returns public user data for the given user id
  • follow_user(user_id, target_user_id, auth_token)
    • adds the follow relationship to the DB
  • add_post(file, caption, user_id, auth_token)
    • uploads the file to the file storage server
  • delete_post(user_id, post_id, auth_token)
    • deletes the given user's given post along with its metadata (use soft delete)
  • get_feed(user_id, count, offset, timestamp, auth_token)
    • returns the top posts, created after the given timestamp, of the users followed by the given user, according to count and offset
  • get_user_posts(user_id, count, offset, auth_token)
    • returns posts of the given user according to count and offset
  • post_like(user_id, post_id, auth_token)
    • adds the given post id to the given user's likes
  • post_unlike(user_id, post_id, auth_token)
    • removes the given post id from the given user's likes
  • add_comment(user_id, post_id, comment)
    • adds the given user's comment to the given post
  • delete_comment(user_id, comment_id)
    • deletes the given user's comment with the given comment id
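Two of the calls above, soft delete and count/offset pagination, are easy to get subtly wrong, so a small sketch may help. It uses an in-memory dictionary instead of a real table, omits the auth token check entirely, and the `deleted_at` field name is an assumption.

```python
import time

posts = {}  # post_id -> post record (hypothetical in-memory table)


def add_post(post_id, user_id, caption, created_at):
    posts[post_id] = {
        "post_id": post_id,
        "user_id": user_id,
        "caption": caption,
        "created_at": created_at,
        "deleted_at": None,  # None means the post is live
    }


def delete_post(user_id, post_id, now=None):
    """Soft delete: mark the row instead of removing it."""
    post = posts.get(post_id)
    if post is None or post["user_id"] != user_id or post["deleted_at"] is not None:
        return False
    post["deleted_at"] = time.time() if now is None else now
    return True


def get_user_posts(user_id, count, offset):
    """Newest-first page of a user's posts, skipping soft-deleted ones."""
    visible = [
        p for p in posts.values()
        if p["user_id"] == user_id and p["deleted_at"] is None
    ]
    visible.sort(key=lambda p: p["created_at"], reverse=True)
    return visible[offset:offset + count]


add_post(1, "u1", "first", created_at=100)
add_post(2, "u1", "second", created_at=200)
add_post(3, "u2", "other", created_at=300)
```

Because the row is only marked, a later cleanup job (or an undelete) can still find it.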

Database Design

Early in the interview, define the database structure to aid in understanding the data flow between various components and, eventually, data segmentation.

Data about users, their posted images, and the people they follow must be stored. We require an index on (PhotoID, CreationDate) since we need to obtain recent photos first from the photo table, which will store all data connected to a photo.

Figure: Database design of the Instagram system

Because we need joins, a simple option for storing the aforementioned structure would be to utilize an RDBMS like MySQL. However, relational databases have their own set of issues, particularly when it comes to scaling. Photos can be stored in a distributed file system such as HDFS or S3.
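As a sketch of the relational option, the snippet below builds the three tables in an in-memory SQLite database (standing in for MySQL) and fetches recent photos from followed users with a join. Table and column names are assumptions, since the schema figure is not reproduced here; the index follows the (PhotoID, CreationDate) pair named in the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user (
    user_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL
);
CREATE TABLE photo (
    photo_id      INTEGER PRIMARY KEY,
    user_id       INTEGER NOT NULL REFERENCES user(user_id),
    photo_path    TEXT NOT NULL,       -- location in the file store
    creation_date INTEGER NOT NULL     -- Unix timestamp
);
CREATE TABLE user_follow (
    follower_id INTEGER NOT NULL REFERENCES user(user_id),
    followee_id INTEGER NOT NULL REFERENCES user(user_id),
    PRIMARY KEY (follower_id, followee_id)
);
CREATE INDEX idx_photo_id_date ON photo (photo_id, creation_date);
""")

conn.executemany("INSERT INTO user VALUES (?, ?)",
                 [(1, "alice"), (2, "bob"), (3, "carol")])
conn.executemany("INSERT INTO photo VALUES (?, ?, ?, ?)",
                 [(10, 2, "/p/10.jpg", 100),
                  (11, 2, "/p/11.jpg", 200),
                  (12, 3, "/p/12.jpg", 300)])
conn.execute("INSERT INTO user_follow VALUES (?, ?)", (1, 2))


def recent_photos(conn, follower_id, limit):
    """Newest photos posted by the accounts that follower_id follows."""
    rows = conn.execute(
        """
        SELECT p.photo_id
        FROM photo AS p
        JOIN user_follow AS f ON f.followee_id = p.user_id
        WHERE f.follower_id = ?
        ORDER BY p.creation_date DESC
        LIMIT ?
        """,
        (follower_id, limit),
    ).fetchall()
    return [row[0] for row in rows]
```

Whether an index led by the owner's id would serve this query better depends on the real workload; the sketch does not try to settle that.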

To make use of NoSQL’s features, we can store the aforementioned schema in a distributed key-value store. All photo metadata can be stored in a table with a ‘key’ of ‘PhotoID’ and a ‘value’ of an object including PhotoLocation, UserLocation, CreationTimestamp, and so on.

To know who owns which photo, we need to store relationships between users and photos. We also need to keep track of who a user follows. We can use a wide-column datastore like Cassandra for both of these tables. The ‘key’ for the ‘UserPhoto’ table would be ‘UserID,’ and the ‘value’ would be the user’s list of ‘PhotoIDs,’ kept in distinct columns. The ‘UserFollow’ table will follow a similar pattern.
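The key/value layout described in the last two paragraphs can be pictured with nested dictionaries, where the outer key is the row key and each inner key is one column. This is a sketch of the data shape only, not of Cassandra's API; the function names and the stored timestamp values are assumptions.

```python
from collections import defaultdict

photo_metadata = {}              # PhotoID -> metadata object
user_photo = defaultdict(dict)   # UserID -> {PhotoID column: upload timestamp}
user_follow = defaultdict(dict)  # UserID -> {followed UserID column: follow timestamp}


def record_photo(user_id, photo_id, photo_location, user_location, created_at):
    photo_metadata[photo_id] = {
        "PhotoLocation": photo_location,
        "UserLocation": user_location,
        "CreationTimestamp": created_at,
    }
    user_photo[user_id][photo_id] = created_at  # one column per photo


def follow(user_id, target_user_id, followed_at):
    user_follow[user_id][target_user_id] = followed_at  # one column per followee


def photos_of(user_id):
    """PhotoIDs owned by a user, newest first."""
    columns = user_photo.get(user_id, {})
    return sorted(columns, key=columns.get, reverse=True)


record_photo("u1", "p1", "s3://bucket/p1.jpg", "Berlin", created_at=100)
record_photo("u1", "p2", "s3://bucket/p2.jpg", "Paris", created_at=200)
follow("u2", "u1", followed_at=150)
```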

Cassandra, like most distributed key-value stores, keeps a configurable number of replicas of the data to ensure reliability. Deletes are also not applied immediately in such data stores; data is retained for a specified number of days (to allow for undeleting) before being permanently erased from the system.

News Feed Generation

Generating News Feed

Designing a customized news feed for each user, featuring the most recent posts from the users he or she is following, is one of the most important requirements of an Instagram-like service. For the sake of simplicity, imagine that the accounts a user follows upload 200 new unique photos per day in total. As a result, the user's news feed will consist of a combination of these 200 unique photos, followed by a repetition of previous posts.

So, in order to generate a news feed for a user, we will first acquire the metadata (likes, comments, time, location, and so on) of the most recent 200 photographs and give it to the ranking algorithm, which will determine how the photos should be placed in the newsfeed based on the metadata.

The major disadvantage of the above newsfeed generation approach is that it necessitates simultaneously querying a large number of tables and then ranking them based on predefined criteria. As a result, this approach will result in higher latency, i.e. it will take a long time to generate a newsfeed.
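A sketch of this on-demand approach shows where the cost comes from: every request gathers photos from all followed accounts, trims to the most recent 200, and ranks them. The `score` formula is a hypothetical placeholder for the ranking algorithm, and the dictionaries stand in for the tables being queried.

```python
def score(photo):
    # Hypothetical ranking signal; a real ranker would use more metadata.
    return photo["likes"] + 2 * photo["comments"]


def generate_feed(user_id, follows, photos_by_user, limit=200):
    """Build a feed at request time: gather, trim to most recent, then rank."""
    candidates = []
    for followee in follows.get(user_id, ()):
        candidates.extend(photos_by_user.get(followee, ()))  # one lookup per followee

    candidates.sort(key=lambda p: p["created_at"], reverse=True)
    recent = candidates[:limit]
    return sorted(recent, key=score, reverse=True)


follows = {"alice": {"bob", "carol"}}
photos_by_user = {
    "bob": [{"photo_id": 1, "created_at": 100, "likes": 5, "comments": 0}],
    "carol": [
        {"photo_id": 2, "created_at": 200, "likes": 1, "comments": 1},
        {"photo_id": 3, "created_at": 50, "likes": 50, "comments": 10},
    ],
}

feed = generate_feed("alice", follows, photos_by_user)
```

The work grows with the number of followed accounts, which is the latency problem described above.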

Figure: News feed generation in the Instagram system

Pregenerating News Feed: To avoid the latency of the on-demand approach above, we'll set up a server that generates a unique news feed for each user ahead of time and stores it in a separate news feed table. With this method, we simply query this table whenever the user wants to see the most recent news feed.
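In sketch form, pregeneration splits the work into a background job that fills a feed table and a read path that is a single lookup. The names `feed_table`, `pregenerate_feeds`, and `read_feed` are hypothetical, and the job here orders photos by recency only, leaving ranking aside.

```python
feed_table = {}  # user_id -> list of photo ids, newest first


def pregenerate_feeds(follows, photos_by_user, limit=200):
    """Background job: rebuild every user's stored feed."""
    for user_id, followees in follows.items():
        photos = [p for f in followees for p in photos_by_user.get(f, ())]
        photos.sort(key=lambda p: p["created_at"], reverse=True)
        feed_table[user_id] = [p["photo_id"] for p in photos[:limit]]


def read_feed(user_id, count=20):
    """Request path: one lookup, no joins or ranking at read time."""
    return feed_table.get(user_id, [])[:count]


follows = {"alice": {"bob", "carol"}}
photos_by_user = {
    "bob": [{"photo_id": 1, "created_at": 100}],
    "carol": [
        {"photo_id": 2, "created_at": 200},
        {"photo_id": 3, "created_at": 50},
    ],
}

pregenerate_feeds(follows, photos_by_user)
```

The trade-off is freshness: a stored feed is only as recent as the last run of the job.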

Serving the News Feed

We have now seen how to create a news feed. The next big challenge in Instagram architecture design is determining how the user will get the generated newsfeed.

Push: One method is to notify all of a user’s followers whenever he or she uploads a new photo. We can do this by using long polling.

A potential issue with this strategy is that if a user follows a large number of people or celebrities, the server will have to push updates and deliver notifications very frequently.

Pull: When users want to see new content, they refresh their news feeds (send a pull request to the server). The difficulty with this strategy is that new posts will not appear until users refresh, and most refreshes will return empty results.

Hybrid Approach: The hybrid strategy employs the pull-based approach for all users with a large number of followers (celebrities) and the push-based approach for all other users.
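The hybrid approach can be sketched as a publish step that chooses between push and pull by follower count. The threshold value, the class, and its method names are hypothetical.

```python
from collections import defaultdict

CELEBRITY_THRESHOLD = 10_000  # hypothetical cut-off for "large number of followers"


class FeedService:
    def __init__(self, followers, threshold=CELEBRITY_THRESHOLD):
        self.followers = followers                 # author_id -> set of follower ids
        self.threshold = threshold
        self.inboxes = defaultdict(list)           # posts pushed to each follower
        self.celebrity_posts = defaultdict(list)   # posts fetched at read time

    def publish(self, author_id, post_id):
        audience = self.followers.get(author_id, set())
        if len(audience) >= self.threshold:
            # Pull model: store once, let followers fetch it when they refresh.
            self.celebrity_posts[author_id].append(post_id)
        else:
            # Push model: fan the post out to every follower's inbox.
            for follower_id in audience:
                self.inboxes[follower_id].append(post_id)

    def read_feed(self, user_id, following):
        feed = list(self.inboxes.get(user_id, []))
        for author_id in following:
            feed.extend(self.celebrity_posts.get(author_id, []))
        return feed


service = FeedService({"celeb": {"a", "b", "c"}, "bob": {"a"}}, threshold=3)
service.publish("bob", "p1")    # pushed to a's inbox
service.publish("celeb", "p2")  # stored for pull
feed_for_a = service.read_feed("a", following=["bob", "celeb"])
```

Publishing from the celebrity account costs one write instead of one write per follower.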

Load Balancing

For user requests, we require a load balancer. To distribute requests among app servers, we can use the round-robin technique. However, plain round-robin may send a request to a server that is unavailable. As a solution, we can employ a heartbeat system, in which each server pings the LB at a set interval to signal that it is still up. Load balancers are also required for the DB and cache servers because they are distributed as well. Since both are user-specific, we can use consistent hashing to decide which request should go to which server.
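Round-robin with heartbeats might look like the sketch below. Time is passed in explicitly so the behaviour is easy to follow; the class name, the timeout value, and the server names are assumptions.

```python
class RoundRobinBalancer:
    """Round-robin selection that skips servers with stale heartbeats."""

    def __init__(self, servers, heartbeat_timeout=5.0):
        self.servers = list(servers)
        self.timeout = heartbeat_timeout
        self.last_heartbeat = {}  # server -> time of last ping
        self._next = 0

    def heartbeat(self, server, now):
        self.last_heartbeat[server] = now

    def is_alive(self, server, now):
        seen = self.last_heartbeat.get(server)
        return seen is not None and now - seen <= self.timeout

    def pick(self, now):
        # Try each server at most once, starting from the rotation pointer.
        for _ in range(len(self.servers)):
            server = self.servers[self._next]
            self._next = (self._next + 1) % len(self.servers)
            if self.is_alive(server, now):
                return server
        return None  # no healthy server available


lb = RoundRobinBalancer(["app-1", "app-2", "app-3"])
for name in ("app-1", "app-2", "app-3"):
    lb.heartbeat(name, now=0)

healthy_round = [lb.pick(now=1) for _ in range(3)]

# app-2 stops pinging; the others keep sending heartbeats.
lb.heartbeat("app-1", now=9)
lb.heartbeat("app-3", now=9)
degraded_round = [lb.pick(now=10) for _ in range(3)]
```

In the second round `app-2` is skipped because its last heartbeat is older than the timeout.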

Figure: Load balancing in the Instagram system design

Alternatively, the Least Bandwidth Method can be used to spread the load among the servers. This algorithm selects the server currently serving the least amount of traffic, measured in megabits per second (Mbps).
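Least-bandwidth selection reduces to picking the minimum over current traffic readings. In this sketch the readings are supplied as a dictionary; how they are measured is outside its scope, and the server names are made up.

```python
def pick_least_bandwidth(traffic_mbps):
    """Return the server currently serving the least traffic (in Mbps)."""
    if not traffic_mbps:
        raise ValueError("no servers available")
    return min(traffic_mbps, key=traffic_mbps.get)


current_traffic = {"app-1": 420.0, "app-2": 135.5, "app-3": 310.0}
chosen = pick_least_bandwidth(current_traffic)
```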

The Load Balancers can be placed between:

  • The client and the server.
  • The database and the server.
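For the DB and cache tier, the consistent hashing mentioned earlier can be sketched as a hash ring with virtual nodes. The class, the choice of SHA-256, the replica count, and the node names are all assumptions for illustration.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Hash ring with virtual nodes; keys map to the next node clockwise."""

    def __init__(self, nodes=(), replicas=100):
        self.replicas = replicas  # virtual nodes per physical node
        self._keys = []           # sorted ring positions
        self._ring = {}           # ring position -> node name
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(value):
        return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)

    def add_node(self, node):
        for i in range(self.replicas):
            position = self._hash(f"{node}#{i}")
            if position not in self._ring:
                bisect.insort(self._keys, position)
            self._ring[position] = node

    def remove_node(self, node):
        for i in range(self.replicas):
            position = self._hash(f"{node}#{i}")
            if self._ring.get(position) == node:
                del self._ring[position]
                del self._keys[bisect.bisect_left(self._keys, position)]

    def get_node(self, key):
        if not self._keys:
            return None
        index = bisect.bisect(self._keys, self._hash(key)) % len(self._keys)
        return self._ring[self._keys[index]]


ring = ConsistentHashRing(["cache-a", "cache-b", "cache-c"])
user_keys = [f"user:{i}" for i in range(200)]
before = {key: ring.get_node(key) for key in user_keys}

ring.remove_node("cache-b")
after = {key: ring.get_node(key) for key in user_keys}
```

Only the keys that lived on `cache-b` move when it is removed; every other key keeps its server, which is the property that makes this scheme attractive for user-specific data.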

© 2022 Code Algorithms Pvt. Ltd.

All rights reserved.