What is Throttling and Rate Limiting in System Design?

What is a Rate Limiter?

A rate limiter restricts the number of events a given object (a person, device, IP address, etc.) can perform in a given time range. In other words, it caps how many requests a sender can issue in a specific time window and blocks further requests once the cap is reached.

Why is rate-limiting used?

To preserve service availability, shared services must be protected against excessive use, whether intended or unintended. Rate limiting is therefore implemented as a protective mechanism that restricts excessive use of a service.

Highly scalable systems must impose consumption limits at some point, and clients must be built with those limits in mind for the system to work well and avoid cascading failure. To increase throughput and minimise end-to-end latency across large distributed systems, rate limiting is critical on both the client and server sides.

Benefits of using an API rate limiter

  • Avoid resource depletion due to a Denial of Service (DoS) attack: Almost every API offered by a big software company has some rate limit. Twitter, for example, restricts users to 300 tweets every three hours, and the Google Docs APIs have a default limit of 300 read requests per user every 60 seconds. By limiting excess calls, a rate limiter prevents DoS attacks, whether deliberate or accidental.
  • Reduce your expenses: If we limit excess requests, fewer servers are needed and more resources can be allocated to high-priority APIs. Rate limiting is also critical for firms that use paid third-party APIs charged on a per-call basis — checking credit, making a payment, accessing health records, and so on. To save money, you must limit the number of calls you make.
  • Ensure that servers are not overburdened: A rate limiter filters out excess requests produced by bots or user misbehaviour, reducing server load.

Next, let’s define API rate limiting, look at how it works, and discuss why it is so important.

What Is API Rate Limiting?

If we grant limitless access to our API, we are handing out the keys to the kingdom: anyone, at any time, can use our API as much as they want.

It’s great that people want to use our API and find it helpful, but open access can lower its value and limit our company’s growth. How an API service is rate-limited determines how well it scales. The most common unit of measurement among API owners is Transactions Per Second (TPS).

API owners frequently restrict the number of requests, or the amount of data, clients can consume to keep an API from becoming overburdened. If a user submits too many queries, API rate limiting can throttle client connections instead of disconnecting them instantly. Thanks to throttling, clients can still use your services while your API stays protected. Keep in mind, however, that API queries take time, and open connections increase the risk of DoS attacks.

Methods of implementing API Rate-Limiting

There are numerous ways to rate-limit an API. Here are three of the most popular approaches.

Request Queues

There are numerous request queue libraries available, with implementations specific to each programming language or development environment. This means that a lot of the work has already been done for us.

There are even curated collections of queue libraries that make finding pre-written code simple, and a few request-rate-limiter libraries already exist. One such library limits requests to two per second and places the remainder in a queue. Libraries like these are about as close to plug-and-play as API development gets.
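The idea behind such a library can be sketched in a few lines of Python. This is not any particular library's API — the class name, rate parameter, and worker-thread design are illustrative assumptions: incoming calls join a FIFO queue, and a background worker drains it at a fixed rate.

```python
import queue
import threading
import time

class RequestQueue:
    """Toy request-queue limiter: callers enqueue work, and a worker
    thread processes at most `rate` jobs per second, in FIFO order."""

    def __init__(self, rate: float = 2.0):
        self.interval = 1.0 / rate      # seconds between processed jobs
        self.jobs = queue.Queue()
        self.results = []
        threading.Thread(target=self._drain, daemon=True).start()

    def submit(self, fn, *args):
        # Excess requests are not rejected; they simply wait in the queue.
        self.jobs.put((fn, args))

    def _drain(self):
        while True:
            fn, args = self.jobs.get()
            self.results.append(fn(*args))
            self.jobs.task_done()
            time.sleep(self.interval)   # pace processing to the target rate
```

With the default `rate=2.0`, a burst of ten submitted calls is served over roughly five seconds instead of all at once.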


Throttling

Another frequent method for implementing rate limiting in practice is throttling. It lets API developers keep control over how their API is used by establishing a temporary state in which each request is evaluated. If the throttle is triggered, a user may be disconnected or have their bandwidth reduced.
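A minimal sketch of this idea is a throttling decorator that, instead of rejecting a caller who exceeds the rate, delays the call until the next available slot. The decorator name and rate parameter are hypothetical, not taken from any specific product:

```python
import time
from functools import wraps

def throttle(max_per_second: float):
    """Hypothetical throttle: callers exceeding the rate are slowed
    down (delayed) rather than disconnected or rejected outright."""
    min_interval = 1.0 / max_per_second

    def decorator(fn):
        last_call = [0.0]   # mutable cell holding the last call time

        @wraps(fn)
        def wrapper(*args, **kwargs):
            wait = last_call[0] + min_interval - time.monotonic()
            if wait > 0:
                time.sleep(wait)   # throttle: pause instead of dropping
            last_call[0] = time.monotonic()
            return fn(*args, **kwargs)
        return wrapper
    return decorator
```

A production throttle would track state per user or per API key rather than globally, but the shape is the same: every request passes through a gate that can slow it down.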

Throttling, which can be done at the application, API, or user level, is a popular way to rate-limit APIs, and several ready-to-use commercial products exist for developers. For example, the Hybrid Data Pipeline from Progress provides throttled API access to:

  • IBM DB2
  • Oracle
  • SQL Server
  • MySQL
  • PostgreSQL
  • SAP Sybase
  • Hadoop Hive
  • Salesforce
  • Google Analytics

Built-in query options such as $count, $top, and $skip filter the results delivered to the client.

For proprietary APIs, Progress also provides the OpenAccess SDK, which exposes a conventional SQL interface such as ODBC, JDBC, ADO.NET, or OLE DB. Most security and authorization systems integrate readily with the OpenAccess SDK, making it a handy firewall between APIs and back-end systems.

Rate-limiting Algorithms

Another technique for building scalable rate-limited APIs is to use algorithms. As with request queue libraries and throttling services, many rate-limiting algorithms are already well established.

Leaky Bucket

The leaky bucket algorithm is a simple rate-limiting approach that is straightforward to implement. It places incoming requests in a First In, First Out (FIFO) queue and processes items from the queue at a constant rate.

The leaky bucket smooths traffic bursts and is simple to set up on a single server or load balancer. Because of the bounded queue size, it is also small and memory-efficient.
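A rough sketch of the leaky bucket in Python might look like the following; the capacity and leak rate are illustrative parameters, not values from the article. Requests join the bucket (a FIFO queue) and drain at a constant rate; when the bucket is full, new requests overflow and are rejected.

```python
import time
from collections import deque

class LeakyBucket:
    """Leaky bucket limiter: requests queue in the bucket and drain
    at a constant rate; a full bucket rejects (overflows) new ones."""

    def __init__(self, capacity: int, leak_rate: float):
        self.capacity = capacity          # max requests held in the bucket
        self.leak_rate = leak_rate        # requests drained per second
        self.queue = deque()
        self.last_leak = time.monotonic()

    def _leak(self) -> None:
        # Drain however many requests would have been processed
        # since the last check, at the constant leak rate.
        now = time.monotonic()
        drained = int((now - self.last_leak) * self.leak_rate)
        if drained > 0:
            for _ in range(min(drained, len(self.queue))):
                self.queue.popleft()
            self.last_leak = now

    def allow(self, request_id: str) -> bool:
        self._leak()
        if len(self.queue) < self.capacity:
            self.queue.append(request_id)
            return True
        return False                      # bucket full: overflow
```

Because the drain rate is fixed, a burst of traffic fills the bucket and is then served steadily, which is exactly the smoothing behaviour described above.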

Fixed Window

Fixed window techniques employ a simple incremental counter to track requests within a window of fixed length. The window is set to a specific number of seconds, such as 3600 for one hour. Once the counter exceeds the limit, additional requests are dropped for the remainder of that window.

The fixed window technique is a straightforward way to keep our API from becoming clogged with stale requests. However, it can still overload your API: because the counter resets to zero at the window boundary, a flood of requests arriving just as the window refreshes can still stampede the server.
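The counter logic above can be sketched in a few lines; the limit and window length here are illustrative assumptions:

```python
import time

class FixedWindowLimiter:
    """Fixed window counter: one counter per window, reset whenever
    a new window begins; requests over the limit are dropped."""

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window = window_seconds
        self.count = 0
        self.window_start = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        if now - self.window_start >= self.window:
            # The window has expired: start a new one from zero.
            self.window_start = now
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False   # over the limit for this window: drop
```

Note the weakness discussed above: `limit` requests at the very end of one window plus `limit` more right after the reset means up to twice the limit can land in a short span around the boundary.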

Sliding Log

In the sliding log technique, each request is tracked with a time-stamped log entry. Entries older than the window are discarded. When a new request arrives, the current request rate is computed by counting the remaining entries; if that count exceeds the limit, the request is rejected (or held in a queue).

Sliding log methods do not suffer from the fixed window's stampede problem. However, storing a log entry for every request can become rather costly, and computing the request count across multiple servers is expensive as well. Pure sliding log techniques therefore aren't ideal for building scalable APIs, preventing overload, or mitigating DoS attacks.
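A minimal sketch of the sliding log, with illustrative limit and window parameters, stores one timestamp per accepted request and evicts entries that fall out of the window:

```python
import time
from collections import deque

class SlidingLogLimiter:
    """Sliding log: keep a timestamp per accepted request, evict
    entries older than the window, and compare the log size to the
    limit when deciding on a new request."""

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window = window_seconds
        self.log = deque()   # timestamps of accepted requests

    def allow(self) -> bool:
        now = time.monotonic()
        # Evict timestamps that have aged out of the window.
        while self.log and now - self.log[0] > self.window:
            self.log.popleft()
        if len(self.log) < self.limit:
            self.log.append(now)
            return True
        return False
```

The per-request timestamp is what makes the algorithm both precise (no boundary stampede) and expensive at scale — memory grows with the limit, and a distributed deployment must merge logs across servers.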

Simple Window

The fixed window counter algorithm divides the timeline into fixed-size windows, each with its own counter, and assigns each request to a window based on its arrival time. Once the counter for the current window reaches the limit, further requests falling in that window are refused.

Throttling caps how many visits can be made in a given amount of time. The simplest solution, then, is to keep a counter for a specific time window, tally the visits, and apply the following rules:

  • If the number of visits is below the threshold, the visit is allowed and the counter increases by one.
  • If the number of visits has reached the threshold, the visit is denied and the counter stays unchanged.
  • When the time window ends, the counter is reset, and the time of the first successful visit after the reset becomes the new window's starting time. In this way the counter always counts the visits in the most recent window.
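The three rules above can be sketched directly; the threshold and window length are illustrative assumptions:

```python
import time

class SimpleWindowCounter:
    """Counter implementing the three rules above: allow and increment
    while under the threshold, deny at the threshold, and on expiry
    reset so the next successful visit starts a fresh window."""

    def __init__(self, threshold: int, window_seconds: float):
        self.threshold = threshold
        self.window = window_seconds
        self.count = 0
        self.window_start = None   # set by the first successful visit

    def visit(self) -> bool:
        now = time.monotonic()
        # Rule 3: window ended -> reset; the next allowed visit
        # will set the new window's starting time.
        if self.window_start is not None and now - self.window_start >= self.window:
            self.count = 0
            self.window_start = None
        # Rule 1: under the threshold -> allow and increment.
        if self.count < self.threshold:
            if self.window_start is None:
                self.window_start = now
            self.count += 1
            return True
        # Rule 2: at the threshold -> deny, counter unchanged.
        return False
```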

Sliding Window

Sliding window algorithms combine the fixed window and sliding log techniques. As in the fixed window approach, a cumulative counter is kept for the current period; the previous window's counter is also weighted in, which smooths out traffic spikes at window boundaries.

Because only a few data points are needed to assess each request, the sliding window technique is light and fast to run, making it well suited to processing massive volumes of requests.
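The usual way to combine the two counters — and the approach sketched below, with illustrative parameters — is to weight the previous window's count by how much of it still overlaps the sliding window, then add the current window's count:

```python
import time

class SlidingWindowLimiter:
    """Sliding window counter: estimated rate =
    prev_count * (overlap fraction of previous window) + curr_count."""

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window = window_seconds
        self.curr_start = time.monotonic()
        self.curr_count = 0
        self.prev_count = 0

    def allow(self) -> bool:
        now = time.monotonic()
        elapsed = now - self.curr_start
        if elapsed >= self.window:
            # Roll forward; a gap longer than a full window means the
            # previous window saw no traffic at all.
            self.prev_count = self.curr_count if elapsed < 2 * self.window else 0
            self.curr_start += (elapsed // self.window) * self.window
            self.curr_count = 0
            elapsed = now - self.curr_start
        # Weight the previous window by its remaining overlap.
        weight = (self.window - elapsed) / self.window
        estimated = self.prev_count * weight + self.curr_count
        if estimated < self.limit:
            self.curr_count += 1
            return True
        return False
```

Only two counters and a timestamp are stored per client, so memory stays constant, while the weighted estimate avoids the fixed window's boundary stampede.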

Thanks to Navtosh Kumar for his contribution to creating the first version of this content. Enjoy learning, enjoy algorithms!


© 2022 Code Algorithms Pvt. Ltd.

All rights reserved.