Rate limits with Python and Redis



In this article, we will look at some Python and Redis based rate limit algorithms, from the simplest implementation to the advanced Generic Cell Rate Algorithm (GCRA ). We'll use redis-py



to interact with Redis ( pip install redis) . I suggest cloning my repository to experiment with query constraints.



Time limit



The first approach to limiting the number of requests per period is to use a time-limited algorithm: for each limiting key (rate-key, something unique, such as a nickname or IP address), a counter (initially setting the limit value) and an expiration date are stored separately (period), which decrease as requests are received.



With Python and Redis, you can implement this algorithm as follows:



  1. Checking the existence of the constraint key.
  2. If it exists, then initialize it with a limit value ( Redis SETNX ) and an expiration date ( Redis EXPIRE ).
  3. We decrease this value with each subsequent request ( Redis DECRBY ).
  4. Queries are stopped only when the value falls below zero.
  5. After a specified period of time, the key is automatically deleted.


from datetime import timedelta
from redis import Redis

def request_is_limited(r: Redis, key: str, limit: int, period: timedelta):
    if r.setnx(key, limit):
        r.expire(key, int(period.total_seconds()))
    bucket_val = r.get(key)
    if bucket_val and int(bucket_val) > 0:
        r.decrby(key, 1)
        return False
    return True


You can see how this code works when emulating the limit of 20 requests in 30 seconds (to make it clearer, I put the function in a module).



import redis
from datetime import timedelta
from ratelimit import time_bucketed

r = redis.Redis(host='localhost', port=6379, db=0)
requests = 25

for i in range(requests):
    if time_bucketed.request_is_limited(r, 'admin', 20, timedelta(seconds=30)):
        print ('Request is limited')
    else:
        print ('Request is allowed')


Only the first 20 requests will not be limited, after them you will have to wait 30 seconds so that new requests can be sent again.



The disadvantage of this approach is that it does not limit the frequency, but the number of requests made by the user over a certain period. If the entire limit is immediately exhausted, you will have to wait for the end of the period.



Leaky bucket algorithm



There is an approach where we can very carefully limit the frequency: this is the current bucket algorithm . The principle of operation is the same as for a real leaking bucket: we pour water (requests) into the bucket to the very edges, while some part of the water constantly flows out through the hole in the bottom (usually implemented using a background process). If the amount of water poured into the bucket is greater than the amount of water pouring out, then the bucket is filled, and when it is full, new requests are no longer accepted.



How the Algorithm Works



You can skip this approach for a more elegant solution that doesn't require a separate process to emulate the leak: the Generic Cell Rate Algorithm.



Generalized Cell Rate Control Algorithm



GCRA was created in the telecommunications industry, where it is called Asynchronous Transfer Mode (ATM). It was used by ATM network managers to delay or discard cells — small data packets of a fixed size — that arrived at a rate higher than a specified limit.



GCRA tracks the remaining limit using the so-called Theoretical Arrival Time (TAT) of each request:



tat = current_time + period


and limits the next request if the arrival time is less than the current TAT. This works well if the frequency is 1 request / period, when requests are divided into periods. But in reality, frequencies are usually calculated as a limit / period. For example, if the frequency is 10 requests / 60 seconds, then the user can make 10 requests in the first 6 seconds. And with a frequency of 1 request / 6 seconds, he will have to wait 6 seconds between requests.



To be able to send a group of requests within a short period and maintain the limitation of their number for a period with a limit> 1, each request must be divided by the period / limit ratio, and then the next theoretical arrival time ( new_tat) will be calculated differently. Let's denote the arrival time of the request as t:



  • new_tat = tat + period / limitif requests are grouped ( t <= tat)
  • new_tat = t + period / limitif requests are not grouped ( t > tat)


Hence:



new_tat = max(tat, t) + period / limit


The request will be limited, if new_tatgreater than the sum of the current time and the period: new_tat > t + period. When new_tat = tat + period / limitwe get tat + period / limit > t + period. Therefore, you need to restrict queries only when tat - t > period - period / limit.



      period — period / limit
      <----------------------->
--|----------------------------|--->
  t                           TAT


In this case, we do not need to update the TAT, because limited requests have no theoretical arrival time.



Now let's put together the final version of the code!



from datetime import timedelta
from redis import Redis

def request_is_limited(r: Redis, key: str, limit: int, period: timedelta):
    period_in_seconds = int(period.total_seconds())
    t = r.time()[0]
    separation = round(period_in_seconds / limit)
    r.setnx(key, 0)
    tat = max(int(r.get(key)), t)
    if tat - t <= period_in_seconds - separation:
        new_tat = max(tat, t) + separation
        r.set(key, new_tat)
        return False
    return True


We used Redis TIME because GCRA is time dependent, and we need to make sure the current time is consistent across multiple deployments (clock discrepancies between different machines can lead to false positives).



This code demonstrates GCRA performance at 10 requests / 60 sec.



import redis
from datetime import timedelta
from ratelimit import gcra

r = redis.Redis(host='localhost', port=6379, db=0)
requests = 10

for i in range(requests):
    if gcra.request_is_limited(r, 'admin', 10, timedelta(minutes=1)):
        print ('Request is limited')
    else:
        print ('Request is allowed')


The algorithm does not limit the first 10 requests, but you will have to wait at least 6 seconds to make the next request. Try to run the script after a while and change the value of the limit and period (for example, limit = 1and period=timedelta(seconds=6)).



To keep the implementation clean, I did not add a lock between receiving and sending calls. But it is needed for multiple requests with the same key and at the same time. With Lua, you can add locking as a context manager.



def request_is_limited(r: Redis, key: str, limit: int, period: timedelta):
    period_in_seconds = int(period.total_seconds())
    t = r.time()[0]
    separation = round(period_in_seconds / limit)
    r.setnx(key, 0)
    try:
        with r.lock('lock:' + key, blocking_timeout=5) as lock:
            tat = max(int(r.get(key)), t)
            if tat - t <= period_in_seconds - separation:
                new_tat = max(tat, t) + separation
                r.set(key, new_tat)
                return False
            return True
    except LockError:
        return True


The complete code is on GitHub .



All Articles