Throttling and Rate Limiting

What is a Rate Limiter?

At its most basic level, a rate limiter restricts the number of events a given entity (a person, device, IP address, etc.) can perform in a given time range. In other words, it caps how many requests a sender can issue in a specific time window and blocks further requests once the cap is reached.

Why is rate-limiting used?

Rate limiting is typically implemented as a protective mechanism for services. To preserve availability, shared services must be protected against excessive use, whether intended or unintended. Even highly scalable systems need consumption limits at some point, and clients must be built with those limits in mind for the system to work well and avoid cascading failures. To increase throughput and minimize end-to-end latency across large distributed systems, rate limiting on both the client and the server side is critical.

Benefits of using an API rate limiter

  • Avoid resource depletion as a result of a Denial of Service (DoS) attack. Almost every API offered by a big software company has some rate limit. Twitter, for example, restricts users to 300 tweets every three hours, and the Google Docs APIs have a default limit of 300 read requests per user per 60 seconds. By limiting excess calls, a rate limiter prevents DoS attacks, whether deliberate or unintentional.
  • Reduce your expenses. Excess requests are limited, which means fewer servers are needed, and more resources are allocated to high-priority APIs. For firms that use paid third-party APIs, rate limiting is critical. For example, the following external APIs are charged on a per-call basis: check credit, make a payment, access health records, and so on. To save money, you must limit the number of calls you make.
  • Ensure that servers are not overburdened. To reduce server load, a rate limiter filters out the excess requests produced by bots or misbehaving users.

How To Limit API Requests And The Importance Of Rate Limiting

Let’s start with a definition of rate limiting, then look at how it works and why it matters.

What Is API Rate Limiting?

If you grant unlimited access to your API, you’re essentially handing over the keys to the kingdom. Anyone, at any time, can use your API as much as they want.

While it’s great that people want to use our API and find it useful, open access can lower its value and limit our company’s growth. How an API service is rate-limited also determines how well it scales.

The most common unit of measurement for API owners is Transactions Per Second (TPS). Some systems are also physically constrained in how much data they can transfer. Both of these fall under backend rate limiting.

API owners frequently restrict the number of requests, or the amount of data, that clients can consume to prevent an API from becoming overburdened. This is called application rate limiting.

Instead of disconnecting a client instantly when it submits too many queries, API rate limiting can throttle its connections. Thanks to throttling, clients can keep using your services while your API stays protected.

However, keep in mind that API queries can time out at any moment, and open connections increase the risk of DoS attacks.

Three Methods Of Implementing API Rate-Limiting

There are numerous ways to rate-limit an API. Here are three of the most popular.

  1. Request Queues

There are numerous request queue libraries available, with commands specific to each programming language or development environment, which means much of the work has already been done for us. There are even directories of queue libraries that make finding pre-written code simple.

For example, one request-rate-limiter library allows two requests per second and places the remainder in a queue. Ready-to-use request queue libraries like this are about as close to plug-and-play as API development gets; the sketch below shows the underlying idea.
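Here is a minimal sketch of the pattern in Python. The function names and the two-per-second pace are assumptions for illustration, not the API of any particular library: a worker thread drains a FIFO queue at a fixed rate, so excess calls wait instead of failing.

```python
# Illustrative request-queue limiter: names and the 2-requests/second
# pace are assumptions, not from any specific library.
import queue
import threading
import time

REQUESTS_PER_SECOND = 2
request_queue = queue.Queue()

def worker():
    while True:
        handler, args = request_queue.get()    # blocks until a request arrives
        handler(*args)                         # process the request
        request_queue.task_done()
        time.sleep(1.0 / REQUESTS_PER_SECOND)  # pace processing at the cap

threading.Thread(target=worker, daemon=True).start()

def submit(handler, *args):
    """Queue a request instead of rejecting it outright."""
    request_queue.put((handler, args))

# Excess calls simply wait in the queue rather than failing.
for i in range(5):
    submit(print, f"processing request {i}")
request_queue.join()  # wait until the queue is drained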

  2. Throttling

Another frequent method of implementing rate limiting in practice is throttling. It lets API developers keep control over how their API is used by establishing a temporary state in which each request is evaluated. When the throttle is triggered, a user may be disconnected or simply have their bandwidth reduced.
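As a rough sketch of that idea (the limit, period, and penalty values below are assumptions for illustration), a per-client counter can trigger a throttle state that delays requests instead of dropping the connection:

```python
# Illustrative throttle: per-client request counts are evaluated each
# period; past the limit, requests are delayed rather than dropped.
# LIMIT, PERIOD, and PENALTY are assumed values.
import time
from collections import defaultdict

LIMIT = 5       # requests allowed per period before the throttle engages
PERIOD = 1.0    # evaluation period in seconds
PENALTY = 0.5   # delay added per request while throttled

state = defaultdict(lambda: {"count": 0, "start": 0.0})

def throttled_call(client_id, handler, *args):
    s = state[client_id]
    now = time.monotonic()
    if now - s["start"] >= PERIOD:     # start a fresh evaluation period
        s["count"], s["start"] = 0, now
    s["count"] += 1
    if s["count"] > LIMIT:             # throttle triggered: slow down, don't disconnect
        time.sleep(PENALTY)
    return handler(*args)

# Usage: the sixth call inside one second is delayed by PENALTY seconds.
for i in range(6):
    throttled_call("client-1", print, f"request {i}")
```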

Throttling can be done at the application, API, or user level, which makes it a popular way to rate-limit APIs. As a result, there are several ready-to-use commercial products on the market for developers. For example, the Hybrid Data Pipeline from Progress provides throttled API access to:

  • IBM DB2
  • Oracle
  • SQL Server
  • MySQL
  • PostgreSQL
  • SAP Sybase
  • Hadoop Hive
  • Salesforce
  • Google Analytics

$count, $top, and $skip are among the built-in query options used to filter the results returned to the client.

For proprietary APIs, they also provide the OpenAccess SDK. It exposes a conventional SQL interface such as ODBC, JDBC, ADO.NET, or OLE DB, and it integrates readily with most security and authorization systems, making it a handy firewall between APIs and back-end systems.

  3. Rate-limiting Algorithms

Another way to build scalable rate-limited APIs is to implement a rate-limiting algorithm directly. Just like request queue libraries and throttling services, many rate-limiting algorithms are already available.

Leaky Bucket

The leaky bucket technique is a simple rate-limiting approach that is straightforward to implement. It places incoming requests in a First In, First Out (FIFO) queue and processes items from the queue at a constant rate.

The leaky bucket smooths out traffic bursts and is simple to set up on a single server or load balancer. Because of the bounded queue size, it is also small and memory-efficient.
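A minimal leaky-bucket sketch in Python, with a capacity and leak rate chosen purely for illustration:

```python
# Illustrative leaky bucket: a bounded FIFO queue that "leaks" (processes)
# requests at a constant rate; capacity and rate are assumed values.
import time
from collections import deque

class LeakyBucket:
    def __init__(self, capacity, leak_rate_per_sec):
        self.capacity = capacity
        self.interval = 1.0 / leak_rate_per_sec
        self.bucket = deque()                  # FIFO queue of pending requests

    def offer(self, request):
        """Accept a request if the bucket has room, otherwise reject it."""
        if len(self.bucket) >= self.capacity:
            return False                       # bucket full: request is dropped
        self.bucket.append(request)
        return True

    def run(self):
        """Drain the bucket at the constant leak rate."""
        while self.bucket:
            self.bucket.popleft()()            # process the oldest request first
            time.sleep(self.interval)

bucket = LeakyBucket(capacity=4, leak_rate_per_sec=2)
for i in range(6):                             # two of these six are rejected
    accepted = bucket.offer(lambda i=i: print(f"handled request {i}"))
    print(f"request {i} accepted: {accepted}")
bucket.run()
```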

Fixed Window

Fixed window techniques use a simple incrementing counter to track requests within a fixed window. The window is set to a specific number of seconds, such as 3600 for one hour. If the counter exceeds the limit for that period, additional requests are dropped.

The fixed window technique is a straightforward way to keep an API from becoming clogged with outdated requests. However, it can still be overburdened: if a flood of requests arrives just as the window refreshes, the API can still be stampeded.
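A minimal fixed-window counter sketch (the one-hour window matches the example above; the cap of 100 is an assumed value):

```python
# Illustrative fixed-window counter: one counter per (client, window);
# the window id is the current time divided by the window size.
# WINDOW_SECONDS and LIMIT are assumed values.
import time

WINDOW_SECONDS = 3600   # e.g. one-hour windows, as in the text
LIMIT = 100

counters = {}           # (client_id, window id) -> request count

def allow(client_id):
    window = int(time.time()) // WINDOW_SECONDS
    key = (client_id, window)
    counters[key] = counters.get(key, 0) + 1
    return counters[key] <= LIMIT   # requests past the cap are dropped
```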

Sliding Log

In the sliding log technique, each request is tracked with a time-stamped entry. Entries with timestamps older than the current window are discarded. When a new request comes in, the request rate is computed by summing the remaining entries; if the count exceeds the limit, the request is simply queued.

Fixed window stampeding is not a problem with sliding log methods. However, keeping an unbounded log of every request can become rather costly, and computing the request count across multiple servers is expensive as well. As a result, sliding log techniques aren’t ideal for building scalable APIs, preventing overload, or mitigating DoS attacks.
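A minimal sliding-log sketch (window size and cap are assumed values), which also shows why the per-client log can grow expensive:

```python
# Illustrative sliding log: a timestamped entry per request; stale entries
# are evicted and the rest are counted. Window and cap are assumed values.
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 60
LIMIT = 100

logs = defaultdict(deque)   # client_id -> timestamps of recent requests

def allow(client_id):
    now = time.monotonic()
    log = logs[client_id]
    while log and now - log[0] > WINDOW_SECONDS:   # evict stale entries
        log.popleft()
    if len(log) >= LIMIT:
        return False          # over the limit: queue or reject the request
    log.append(now)
    return True
```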

Simple Window

The fixed window counter algorithm divides the timeline into fixed-size windows, each with its own counter. Each request is assigned to a window based on its arrival time. Once a window’s counter reaches the limit, any further requests falling within that window are refused.

Throttling sets a limit on how many visits can be made in a certain amount of time. The simplest solution, then, is to keep a counter for a specific time window to tally the visits and apply the following rules (a sketch follows the list):

  • A visit is allowed if the number of visits is below the threshold; the counter then increases by one.
  • A visit is denied if the number of visits has reached the threshold; the counter stays unchanged.
  • When the time window ends, the counter is reset, and the time of the first successful visit after the reset becomes the new window’s starting time. In this way, the counter always counts the visits in the most recent window.
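A sketch of these three rules (the threshold and window size are assumed values):

```python
# Illustrative version of the three rules above; THRESHOLD and
# WINDOW_SECONDS are assumed values.
import time

THRESHOLD = 100
WINDOW_SECONDS = 60

count = 0
window_start = None   # set by the first successful visit after a reset

def allow_visit():
    global count, window_start
    now = time.monotonic()
    if window_start is not None and now - window_start >= WINDOW_SECONDS:
        count, window_start = 0, None   # rule 3: window ended, reset
    if count < THRESHOLD:               # rule 1: under the threshold, allow
        if window_start is None:
            window_start = now          # first visit starts the new window
        count += 1
        return True
    return False                        # rule 2: at the threshold, deny
```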

Sliding Window

Sliding window algorithms combine the fixed window and sliding log techniques. As in the fixed window approach, a cumulative counter is kept for the current period, but the previous window’s counter is also taken into account to smooth out traffic spikes.

The sliding window technique is suitable for processing massive quantities of requests while remaining light and fast to run, since only a small number of data points are needed to assess each request.
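A minimal sliding-window sketch (window size and cap are assumed values), weighting the previous window’s count by how much of it still overlaps the sliding window:

```python
# Illustrative sliding window: the previous fixed window's count is
# weighted by its remaining overlap with the sliding window ending now.
# WINDOW_SECONDS and LIMIT are assumed values.
import time

WINDOW_SECONDS = 60
LIMIT = 100

windows = {}   # window id -> request count

def allow():
    now = time.time()
    current = int(now) // WINDOW_SECONDS
    elapsed = (now % WINDOW_SECONDS) / WINDOW_SECONDS   # fraction of current window used
    # Estimate requests in the last WINDOW_SECONDS: all of the current
    # window's count plus the overlapping share of the previous one.
    estimated = windows.get(current - 1, 0) * (1.0 - elapsed) + windows.get(current, 0)
    if estimated >= LIMIT:
        return False
    windows[current] = windows.get(current, 0) + 1
    return True
```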

Enjoy learning, Enjoy system design!
