You are probably familiar with YouTube, the first place many of us turn to for video content. But have you ever thought about how YouTube works: the principles behind its features, how it interacts with its users, and how it serves personalized videos? In this blog, we’ll give you an overview of how to design a highly scalable video-sharing service such as YouTube. So, let’s get started!
What is YouTube?
YouTube is a video-sharing service that allows users to upload video content to the platform. It is also a major advertising platform for many businesses. YouTube lets its users upload, watch, and search for videos, and offers features such as liking, disliking, and commenting on videos.
Key Requirements of the Service
Before designing such a service, it is necessary to consider its requirements and system boundaries. The system should be reliable, resilient, consistent, and highly available. It should also offer low latency and high throughput. YouTube is a vast and complicated system, so in this article we focus only on the critical features. Our system should support these features:
- Users should be able to create an account.
- Users should be able to watch videos and search for videos/users/groups.
- Users should be able to log in and out of the system.
- Users should be able to upload/delete videos and comment on/like/dislike videos when they are logged in.
- Users should be able to follow/unfollow any channel when they are logged in.
Along with these critical requirements, the system should also keep counts of likes, dislikes, comments, and views so that it can present them to users. The system should also support private videos and private accounts.
Once the system boundaries and functional requirements are defined, we need to choose between cloud and on-premise hosting. Cloud services are popular today because of advantages such as cost efficiency, security, backup solutions, elastic storage capacity, reliability, durability, and resiliency.
So we have defined our system requirements; now let’s dive in and estimate our service’s capacity.
It is necessary to know the approximate traffic of the service so that it can be designed and scaled accordingly. Since this service would be read-heavy, we assume a read-to-write ratio of 25:1.
- Let's assume that we have 100 million total users.
- Let the average size of a video be 250 MB.
- Assuming each user uploads one video per year, video storage over 10 years = 10 * 100M * 1 * 250 MB = 250 PB.
- If replication and backup are also considered (three copies): 3 * 250 PB = 750 PB.
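The arithmetic above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the constant names are made up for illustration, the factor of 1 is read as one upload per user per year (an assumption), and decimal units (1 PB = 10^9 MB) are used.

```python
# Back-of-the-envelope storage estimate using the figures listed above.
YEARS = 10
TOTAL_USERS = 100_000_000        # 100 million users
UPLOADS_PER_USER_PER_YEAR = 1    # assumption: meaning of the "1" in the estimate
AVG_VIDEO_SIZE_MB = 250
REPLICATION_FACTOR = 3           # copies kept for replication and backup

MB_PER_PB = 1_000_000_000        # decimal units: 1 PB = 10^9 MB


def storage_pb(years, users, uploads_per_user_per_year, avg_size_mb):
    """Return the total video storage in petabytes."""
    total_mb = years * users * uploads_per_user_per_year * avg_size_mb
    return total_mb / MB_PER_PB


raw_pb = storage_pb(YEARS, TOTAL_USERS, UPLOADS_PER_USER_PER_YEAR, AVG_VIDEO_SIZE_MB)
replicated_pb = raw_pb * REPLICATION_FACTOR

print(f"Raw video storage over {YEARS} years: {raw_pb:.0f} PB")
print(f"With {REPLICATION_FACTOR}x replication: {replicated_pb:.0f} PB")
```

Changing any one constant (for example, the average video size) shows how sensitive the total is to that assumption.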
YouTube is going to be exposed to heavy traffic. A massive number of requests can reach the system simultaneously, so it must respond with minimal latency. Replication, sharding, and load balancing help keep the system highly available. Since lots of users will watch videos, the service will be read-heavy, and it should therefore also be consistent, durable, and reliable.
As YouTube is a heavily loaded service, it exposes various APIs to perform its operations smoothly. A video-sharing service needs many APIs, such as video, addComment, search, recommendation, and more. We focus on three high-level APIs here, and we can use either SOAP or REST to implement them.
UploadVideo(key, title, description)
The upload video API is used for uploading content. It takes three parameters; the API key (key) is used for identification. It returns an HTTP response that indicates whether the video was uploaded successfully.
DeleteVideo(key, videoID)
The delete video API is used for deleting a video. It first checks whether the user has permission to delete the video, and then returns an HTTP response indicating the success or failure of the call.
SearchVideo(key, query)
The search video API is used to query videos. It returns a list of matching videos and channels.
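To make the three calls more concrete, here is a minimal in-memory sketch in Python. It is not how YouTube implements them: the class name `VideoService`, the status codes chosen, and the idea that `key` identifies the caller who owns a video are all assumptions made for illustration, and the search only covers videos, not channels.

```python
import itertools
from dataclasses import dataclass


@dataclass
class Video:
    video_id: int
    owner_key: str      # assumption: the API key identifies the uploader
    title: str
    description: str


class VideoService:
    """Toy in-memory stand-in for UploadVideo, DeleteVideo, and SearchVideo."""

    def __init__(self):
        self._videos = {}
        self._ids = itertools.count(1)

    def upload_video(self, key, title, description):
        # Returns an HTTP-style status code and the new video's ID.
        video_id = next(self._ids)
        self._videos[video_id] = Video(video_id, key, title, description)
        return 201, video_id

    def delete_video(self, key, video_id):
        video = self._videos.get(video_id)
        if video is None:
            return 404                      # no such video
        if video.owner_key != key:
            return 403                      # caller lacks permission
        del self._videos[video_id]
        return 200

    def search_video(self, key, query):
        # `key` is accepted for symmetry with the other calls but unused here.
        q = query.lower()
        return [v for v in self._videos.values()
                if q in v.title.lower() or q in v.description.lower()]


service = VideoService()
status, video_id = service.upload_video("alice-key", "Cat video", "A cat plays piano")
print(status, video_id)
print(service.delete_video("bob-key", video_id))    # different key, so refused
print([v.title for v in service.search_video("alice-key", "cat")])
```

In a real service each method would sit behind an HTTP endpoint, and the video bytes would be sent separately from the metadata.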
We need to store the large amount of data that is uploaded to the service: the video content itself and information about the users accessing the service. We can use MySQL to keep data about users, information about videos, and their metadata. The video files themselves are better kept in object storage such as AWS S3 than in a relational database. We have two main tables to keep data.
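As a rough illustration of what two such tables might look like, the sketch below creates a users table and a videos table. Everything in it is an assumption: the table and column names are hypothetical, and Python's built-in `sqlite3` module stands in for MySQL so the example runs without a database server.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (
    user_id     INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    email       TEXT NOT NULL UNIQUE
);
CREATE TABLE videos (
    video_id    INTEGER PRIMARY KEY,
    uploader_id INTEGER NOT NULL REFERENCES users(user_id),
    title       TEXT NOT NULL,
    description TEXT,
    storage_key TEXT NOT NULL,              -- where the file lives in object storage
    likes       INTEGER NOT NULL DEFAULT 0,
    dislikes    INTEGER NOT NULL DEFAULT 0,
    views       INTEGER NOT NULL DEFAULT 0
);
""")

conn.execute("INSERT INTO users (name, email) VALUES (?, ?)",
             ("alice", "alice@example.com"))
conn.execute(
    "INSERT INTO videos (uploader_id, title, description, storage_key) "
    "VALUES (?, ?, ?, ?)",
    (1, "Cat video", "A cat plays piano", "videos/1.mp4"),
)

# Join video metadata with its uploader.
row = conn.execute(
    "SELECT u.name, v.title, v.views "
    "FROM videos v JOIN users u ON u.user_id = v.uploader_id"
).fetchone()
print(row)
```

Only the metadata and a pointer (`storage_key`) are stored in the relational tables; the video bytes stay in object storage.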
Various components are required for this system, and they must work in a well-coordinated manner to meet all its expectations. Every component should be available and run in sufficient numbers to keep the service up irrespective of the traffic.
A YouTube-like service is read-heavy, so the system should be designed to handle read requests efficiently. We can use a cache to deliver frequently requested content with minimal latency. The video content should also be replicated across different servers, and the system should keep replicas of the metadata and user databases as well. Traffic should be distributed using load balancers, which can be placed at various layers of the system, such as between the application servers and the metadata database. Whenever a user requests content, the system looks it up in the database and returns it to the user.
To build such a large, scalable system, we have to use caching. We can use a distributed cache such as Redis or Memcached to store frequently accessed metadata. When the cache is full, we can use LRU (Least Recently Used) as the eviction policy, so that the entries that have gone unused the longest are discarded first.
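The LRU idea can be sketched in a few lines of Python. This is a single-process illustration built on `collections.OrderedDict`, not a distributed cache; the class name `LRUCache` and the sample keys are made up for the example.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used entry."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._items = OrderedDict()   # oldest entry first, newest last

    def get(self, key):
        if key not in self._items:
            return None               # cache miss
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)   # drop the least recently used


cache = LRUCache(capacity=2)
cache.put("video:1", {"title": "Cat video"})
cache.put("video:2", {"title": "Dog video"})
cache.get("video:1")                            # video:1 is now most recent
cache.put("video:3", {"title": "Bird video"})   # evicts video:2
print(cache.get("video:2"))
print(cache.get("video:1"))
```

Reading an entry counts as a use, which is why `video:1` survives here and `video:2` is the one evicted.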
We can use a CDN as a cache for video content. The CDN fetches media content from AWS S3 and serves it from locations close to the user. If the service is on AWS, it is convenient to use CloudFront as the content cache and ElastiCache as the metadata cache.
Videos are large files, so uploading them is an expensive operation. It is therefore useful to split each video into chunks and upload the chunks one by one. If a failure occurs, the upload should resume from the point of failure instead of starting over. Once uploaded, videos are placed on a queue to be processed by the video encoding step.
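A resumable chunked upload can be sketched as follows. This is a simplified, in-memory illustration: the names `ResumableUpload` and `flaky_send`, the tiny chunk size, and the simulated network failure are all assumptions made for the example, and a plain `deque` stands in for a real encoding queue.

```python
from collections import deque


def split_into_chunks(data, chunk_size):
    """Split raw bytes into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


class ResumableUpload:
    def __init__(self, data, chunk_size):
        self.chunks = split_into_chunks(data, chunk_size)
        self.next_index = 0   # index of the first chunk not yet stored
        self.stored = []      # stands in for server-side chunk storage

    def run(self, send):
        """Send the remaining chunks; `send` may raise ConnectionError."""
        while self.next_index < len(self.chunks):
            chunk = self.chunks[self.next_index]
            send(self.next_index, chunk)   # if this raises, progress so far is kept
            self.stored.append(chunk)
            self.next_index += 1
        return b"".join(self.stored)


failures_left = {"count": 1}


def flaky_send(index, chunk):
    # Simulated transport: fails exactly once, on the third chunk.
    if index == 2 and failures_left["count"] > 0:
        failures_left["count"] -= 1
        raise ConnectionError("simulated network drop")


video_bytes = b"0123456789" * 3
upload = ResumableUpload(video_bytes, chunk_size=8)
try:
    upload.run(flaky_send)
except ConnectionError:
    resume_point = upload.next_index   # chunks 0 and 1 are already stored

assembled = upload.run(flaky_send)     # resumes from chunk 2, not from the start

encoding_queue = deque()               # uploaded videos wait here for encoding
if assembled == video_bytes:
    encoding_queue.append("video-1")
print(resume_point, len(upload.chunks), list(encoding_queue))
```

The key design point is that the upload state (`next_index`) outlives a failed attempt, so a retry only sends what is still missing.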
A load balancer acts as a traffic manager: it distributes incoming requests across servers so that no single server is overloaded. We can place a load balancer at every layer of the system and use the Round Robin method to balance the load.
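Round Robin simply hands each new request to the next server in a fixed rotation. A minimal sketch, with a hypothetical `RoundRobinBalancer` class and made-up server names:

```python
import itertools


class RoundRobinBalancer:
    """Hand out servers in a fixed rotation, one per request."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("at least one server is required")
        self._servers = list(servers)
        self._cycle = itertools.cycle(self._servers)

    def next_server(self):
        return next(self._cycle)


balancer = RoundRobinBalancer(["app-1", "app-2", "app-3"])
assigned = [balancer.next_server() for _ in range(5)]
print(assigned)
```

Plain Round Robin ignores how busy each server is; it works best when servers are similar and requests cost roughly the same.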
The service should be fault-tolerant and reliable. Replication is the core mechanism for this: if one server fails, a replica can take over. Because our service is read-heavy, replication also helps reduce response time, since incoming read requests can be spread across the replicas.
YouTube is a very complex and highly scalable service to design. In this blog, we have covered only the fundamental concepts necessary for building such a video-sharing service. We have limited our discussion to the system’s generic and critical requirements, but it could be expanded further to include other functionality such as personalized recommendations.