At its most basic level, a rate limiter restricts how many operations a given actor (a person, device, IP address, etc.) can perform in a given time range. In other words, it caps how many requests a sender can issue in a specific time window, and it blocks further requests once the cap is reached.
Rate limiting is typically implemented as a protective mechanism for services. To preserve availability, shared services must be protected against excessive use, whether intentional or accidental. Even highly scalable systems must impose consumption limits at some point, and clients must be built with those limits in mind if the system is to work well and avoid cascading failures. To increase throughput and minimize end-to-end latency across large distributed systems, rate limiting on both the client and server sides is critical.
Let’s start with a definition of rate limiting. After that, we’ll look at how rate limiting works and why it is important.
If you grant unlimited access to your API, you’re essentially handing over the keys to the kingdom: anyone can use it as much as they want, at any time.
While it’s great that people want to use your API and find it useful, open access can lower its value and limit your company’s growth. How an API service rate-limits often determines how well it scales.
The most common unit of measurement for API owners is Transactions Per Second (TPS). Some systems also restrict data transfer due to physical constraints. Both of these fall under backend rate limiting.
To prevent an API from becoming overburdened, API owners frequently restrict the number of requests or the amount of data a client can consume. This is known as application rate limiting.
If a user submits too many requests, API rate limiting can throttle the client’s connection instead of disconnecting it outright. Throttling lets clients keep using your services while your API stays protected.
Keep in mind, however, that API requests can time out at any point, and open connections increase the risk of DoS (Denial of Service) attacks.
There are numerous ways to rate-limit an API. Here are three of the most popular approaches.
Many request queue libraries are available, with implementations for most programming languages and development environments, which means much of the work has already been done for us. There are even curated directories of queue libraries that make finding pre-written code simple, and several request-rate-limiter libraries already exist.
One such library, for example, limits throughput to two requests per second and places the remainder in a queue. Ready-to-use request queue libraries are about as close to plug-and-play as API development gets.
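The queue-then-release behavior described above can be sketched with a minimal, hypothetical `RequestQueue` class (the class name and `per_tick` parameter are illustrative, not taken from any particular library). Each "tick" represents one second, and at most `per_tick` queued requests are released per tick:

```python
from collections import deque

class RequestQueue:
    """Illustrative sketch: queue incoming requests and release at most
    `per_tick` of them each tick; the rest stay queued for later ticks."""
    def __init__(self, per_tick=2):
        self.per_tick = per_tick
        self.pending = deque()

    def submit(self, request):
        self.pending.append(request)

    def drain_tick(self):
        """Release up to `per_tick` queued requests in FIFO order."""
        released = []
        while self.pending and len(released) < self.per_tick:
            released.append(self.pending.popleft())
        return released

q = RequestQueue(per_tick=2)
for i in range(5):
    q.submit(f"req-{i}")
print(q.drain_tick())   # ['req-0', 'req-1']
print(len(q.pending))   # 3
```

A real library would drive `drain_tick` from a timer or event loop; here it is called manually so the example stays deterministic.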
Throttling is another common way to implement rate limiting in practice. It lets API developers keep control over how their API is used by establishing a temporary state in which each request is evaluated. When the throttle is triggered, a user may be disconnected or have their bandwidth reduced.
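One simple form of throttling is to slow callers down rather than reject them. The sketch below, assuming a hypothetical `throttled` decorator (the name and one-second window are illustrative), sleeps until the current window rolls over once the per-second limit is exceeded:

```python
import time

def throttled(limit_per_sec):
    """Illustrative throttle: once calls in the current one-second window
    exceed the limit, sleep until the window rolls over instead of
    rejecting the call outright."""
    def wrap(fn):
        state = {"start": time.monotonic(), "count": 0}
        def inner(*args, **kwargs):
            now = time.monotonic()
            if now - state["start"] >= 1.0:
                state["start"], state["count"] = now, 0
            if state["count"] >= limit_per_sec:
                # Throttle: wait out the remainder of the window.
                time.sleep(max(0.0, state["start"] + 1.0 - now))
                state["start"], state["count"] = time.monotonic(), 0
            state["count"] += 1
            return fn(*args, **kwargs)
        return inner
    return wrap

@throttled(limit_per_sec=1000)
def handle(request_id):
    return f"handled {request_id}"

print(handle(1))  # handled 1
```

Delaying instead of rejecting keeps clients connected, but as noted below it also keeps connections open longer, which has its own risks.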
Throttling can be done at the application, API, or user level, and it is a popular way to rate-limit APIs. As a result, several ready-to-use commercial products are on the market. For example, the Hybrid Data Pipeline from Progress provides throttled API access through built-in functions such as $count, $top, and $skip, which limit and filter the query results returned to the client.
For proprietary APIs, Progress also offers the OpenAccess SDK, which provides a conventional SQL interface such as ODBC, JDBC, ADO.NET, or OLE DB. Because it integrates readily with most security and authorization systems, the OpenAccess SDK makes a handy firewall between APIs and back-end systems.
Using algorithms is another technique for building scalable rate-limited APIs. As with request queue libraries and throttling services, many rate-limiting algorithms are already available.
The leaky bucket algorithm is a simple rate-limiting approach that is straightforward to implement. It places requests in a First In, First Out (FIFO) queue and processes them at a constant rate.
The leaky bucket smooths out traffic bursts and is easy to set up on a single server or load balancer. Because the queue is small, it is also compact and memory-efficient.
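A common way to model the leaky bucket is as a level that rises with each request and drains at a constant rate; a request is admitted only if the bucket has room. This is a minimal sketch (class name, `capacity`, and `leak_per_sec` are illustrative), with an injected clock so the example is deterministic:

```python
import time

class LeakyBucket:
    """Illustrative leaky bucket: each request adds one unit to the bucket,
    which drains at `leak_per_sec` units per second. A request is allowed
    only if the bucket has room after draining."""
    def __init__(self, capacity, leak_per_sec, clock=time.monotonic):
        self.capacity = capacity
        self.leak_per_sec = leak_per_sec
        self.clock = clock
        self.level = 0.0
        self.last = clock()

    def allow(self):
        now = self.clock()
        # Drain the bucket in proportion to the elapsed time.
        self.level = max(0.0, self.level - (now - self.last) * self.leak_per_sec)
        self.last = now
        if self.level + 1 <= self.capacity:
            self.level += 1
            return True
        return False

t = [0.0]  # fake clock for a deterministic demo
bucket = LeakyBucket(capacity=2, leak_per_sec=1, clock=lambda: t[0])
print([bucket.allow() for _ in range(3)])  # [True, True, False]
t[0] = 1.0  # one second later, one unit has leaked out
print(bucket.allow())  # True
```

In production, the queue-based variant described above would hold the overflow requests for later processing rather than returning `False`.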
Fixed window techniques use a basic incremental counter to track requests within a fixed window of time. The window is defined in seconds (for example, 3600 for one hour). Once the counter exceeds the limit for that window, additional requests are dropped.
The fixed window technique is a straightforward way to keep an API from becoming clogged with outdated requests. However, it can still be overburdened: a flood of requests arriving just as the window refreshes can still stampede the API.
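The counter-reset behavior can be sketched as follows (the class name and parameters are illustrative; the clock is injected so the demo is deterministic):

```python
import time

class FixedWindowLimiter:
    """Illustrative fixed window counter: the counter resets at each
    window boundary, and requests beyond the limit are rejected."""
    def __init__(self, limit, window_secs, clock=time.monotonic):
        self.limit = limit
        self.window_secs = window_secs
        self.clock = clock
        self.window = None
        self.count = 0

    def allow(self):
        window = int(self.clock() // self.window_secs)
        if window != self.window:
            self.window, self.count = window, 0  # new window: reset the counter
        if self.count < self.limit:
            self.count += 1
            return True
        return False

t = [0.0]  # fake clock
limiter = FixedWindowLimiter(limit=2, window_secs=60, clock=lambda: t[0])
print([limiter.allow() for _ in range(3)])  # [True, True, False]
t[0] = 60.0  # window boundary: the counter resets
print(limiter.allow())  # True
```

The reset at `t = 60.0` is exactly the stampede risk mentioned above: a full burst of `limit` requests can land immediately after the boundary, back to back with the previous window's burst.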
In the sliding log technique, each request is tracked with a time-stamped log entry. Entries older than the current window are deleted, and when a new request arrives, the remaining entries are summed to estimate the request rate. If the number of requests exceeds the limit, new requests are simply queued.
Sliding log methods avoid the fixed window's stampede problem. However, keeping a log entry for every request can become quite costly, and computing the count across multiple servers is expensive. For these reasons, sliding log techniques aren't ideal for building scalable APIs, avoiding overload, or mitigating DoS attacks.
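A minimal sliding log can be kept as a deque of timestamps (the class name and parameters below are illustrative); old timestamps are evicted on every check, and the remaining count is compared against the limit:

```python
import time
from collections import deque

class SlidingLogLimiter:
    """Illustrative sliding log: keep one timestamp per accepted request
    and evict timestamps older than the window on each check."""
    def __init__(self, limit, window_secs, clock=time.monotonic):
        self.limit = limit
        self.window_secs = window_secs
        self.clock = clock
        self.log = deque()

    def allow(self):
        now = self.clock()
        # Evict timestamps that have aged out of the window.
        while self.log and now - self.log[0] >= self.window_secs:
            self.log.popleft()
        if len(self.log) < self.limit:
            self.log.append(now)
            return True
        return False

t = [0.0]  # fake clock
limiter = SlidingLogLimiter(limit=2, window_secs=10, clock=lambda: t[0])
print([limiter.allow() for _ in range(3)])  # [True, True, False]
t[0] = 10.0  # both entries have aged out
print(limiter.allow())  # True
```

The per-request timestamp storage visible here is precisely the memory cost the text warns about: the log grows with the request rate, not with the window count.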
Put simply, the fixed window counter algorithm divides the timeline into fixed-size windows, each with its own counter, and assigns each request to a window based on its arrival time. Once a window's counter reaches the limit, requests falling within that window are refused.
Throttling here means capping how many visits can be made in a certain amount of time, so the simplest implementation keeps a counter for the current time window, tallies the visits, and rejects any request that arrives after the counter reaches the limit.
Sliding window algorithms combine the fixed window and sliding log techniques. As in the fixed window approach, a cumulative counter is kept for the current period, but the previous window's count is also weighted in to smooth out traffic spikes.
Because only a few data points are needed to assess each request, the sliding window technique is light and fast to run, yet still suitable for processing massive volumes of requests.
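The "previous window is also evaluated" idea is usually implemented by weighting the previous window's count by how much of it still overlaps the sliding window. This is a sketch under that assumption (class name and parameters are illustrative):

```python
import time

class SlidingWindowLimiter:
    """Illustrative sliding window counter: estimate the rate as the
    current window's count plus the previous window's count weighted
    by the fraction of it still inside the sliding window."""
    def __init__(self, limit, window_secs, clock=time.monotonic):
        self.limit = limit
        self.window_secs = window_secs
        self.clock = clock
        self.curr_window = None
        self.curr_count = 0
        self.prev_count = 0

    def allow(self):
        now = self.clock()
        window = int(now // self.window_secs)
        if self.curr_window is None:
            self.curr_window = window
        elif window != self.curr_window:
            # Roll over: the old current count becomes the previous count
            # (zero if one or more whole windows were skipped).
            self.prev_count = self.curr_count if window == self.curr_window + 1 else 0
            self.curr_window, self.curr_count = window, 0
        elapsed = (now % self.window_secs) / self.window_secs
        estimated = self.prev_count * (1 - elapsed) + self.curr_count
        if estimated < self.limit:
            self.curr_count += 1
            return True
        return False

t = [0.0]  # fake clock
limiter = SlidingWindowLimiter(limit=2, window_secs=10, clock=lambda: t[0])
print([limiter.allow() for _ in range(3)])  # [True, True, False]
t[0] = 12.0  # 20% into the next window: previous count weighs 0.8
print(limiter.allow())  # True  (estimate 2 * 0.8 = 1.6 < 2)
```

Only two counters per client are stored, which is why this approach keeps the sliding log's smoothness without its memory cost.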
Enjoy learning, Enjoy system design!