At most basic level, rate limiter restricts the number of events that a certain user or device can perform within a given time range. It helps us to limit the number of requests a sender can send in a particular period of time and is implemented as a protective mechanism to prevent excessive use of services. Once the upper limit is reached, the rate limiter blocks further requests.
In this blog, we will discuss components and algorithms related to design rate limiter.
It’s critical to ask questions and clarify needs and restrictions with the interviewer during a system design interview. Listed below are a few examples:
Let’s keep things simple and communicate using a basic client-server approach. Where should the rate limiter be placed? A rate limiter can be implemented intuitively on either the client or server side.
There is an alternative to the client and server-side implementations. We construct a rate limiter middleware, which throttles requests to your APIs, rather than establishing a rate limiter on the API servers, as indicated in the diagram below.
The underlying concept of rate-limiting algorithms is straightforward. We need a counter at the highest level to track how many requests are submitted from the same user, IP address, etc. The request is rejected if the counter exceeds the limit.
Where should we keep the counters? Due to the slowness of disc access, using the database is not a smart option. Because it is quick and supports a time-based expiration technique, an in-memory cache can be chosen. Redis, for example, is a popular choice for rate-limiting. “INCR” and “EXPIRE” are two commands that can be used to access in-memory storage.
There are numerous rate-limiting algorithms available. The use case, we suppose, is to limit the number of queries sent to a server.
Consider the case where we need to create an API Rate Limiter that limits API calls based on the user (We can change implementation according to our criteria). A fixed number of requests can be sent per user in a fixed time range. Let’s look at the algorithms that are utilised for API rate limiting:
Assume a bucket has a few tokens. When a request comes in, a token from the bucket must be taken and processed. The request will be refused if no token is available in the bucket, and the requester will have to try again later. As a result, the token bucket gets refreshed every time unit.
By allocating a bucket with a predetermined number of tokens to each user, we may limit the number of requests per user per time unit. When a user uses up all of his tokens in a certain amount of time, we know he or she has reached their limit, and we deny his/her requests until bucket is refilled.
The leaky bucket algorithm is based on the idea that if the average rate at which water is poured exceeds the rate at which the bucket leaks, the bucket will overflow. This algorithm uses a queue, which can be thought of as a bucket that holds the requests, to provide a simple and obvious way to rate limiting. When a request is submitted, it is added to the queue’s end. The first item in the queue is then processed at a regular frequency. Additional requests are discarded if the queue is full (or leaked).
As you can see, if 3/Request/User is allowed and the Queue size is 3, the API Rate Limiter will block the 4th Request.
We will keep a bucket per unit time window in this algorithm and update the count requests made in that time window. We will cancel the request if it exceeds the limit. A rate limiter for a service that only allows 3 requests per minute, for example, will have the data model below. The buckets are one-minute windows, with values holding the counts of requests seen during that minute.
If a new request is seen at 00:00:10, we get the count from the current bucket (00:00-00:01), which is 0, and check if processing one more request exceeds the permissible limit of 3, in which case an exception is raised; if not (which is the case here), the bucket's count is incremented by unit 1 and the request is allowed to be processed. When a request arrives at 00:00:40 and the window count is three, no request is authorised and an exception is triggered. When a new window is created, just the counts of the current window are saved, and older windows are erased (i.e in the above case, if the min changes, the older bucket is deleted)
Space complexity: O(1) for storing the count for the current window Time complexity: O(1) for get and simple atomic increment operation
Sliding Log rate limitation keeps note of each consumer’s request in a time-stamped log. These logs are normally stored in a time-sorted hash set or table. Logs with timestamps that exceed a certain limit are discarded.
Space complexity: O(Max requests seen in a window time) — In a time window, all of the requests’ timestamps are saved.
Time complexity: O(Max requests seen in a window time) — A portion of timestamps is deleted.
The low processing cost of the fixed window algorithm is combined with the increased boundary constraints of the sliding log in this hybrid approach. To begin, we track a counter for each fixed window, similar to the fixed window technique. To reduce surges of traffic, we account for a weighted value of the previous window’s request rate based on the current date.
We keep track of each user’s request counts by using various fixed time windows, such as 1/60th the size of our rate limit’s time frame. If we have a minute rate limit, for example, we can record a count for each second and calculate the sum of all counters in the previous minute when we get a new request to determine the throttling limit.
Space Complexity: O(number of buckets)
Time Complexity: O(1)
Thanks to Navtosh Kumar for his contribution to creating the first draft of content. Please write in the message below if you find anything incorrect, or if you want to share more insight. Enjoy learning, Enjoy algorithms!
Subscribe to get well designed content on data structure and algorithms, machine learning, system design, object orientd programming and math.