Token Bucket¶
The Token Bucket algorithm assigns a bucket per user that holds a maximum number of "tokens." Each request consumes a token. Tokens are added back to the bucket at a constant rate over time. If a user has no tokens left, their request is dropped.
How It Works¶
- A token bucket has a defined maximum capacity (e.g.,
10 tokens). - Tokens are added to the bucket at a constant fill rate (e.g.,
1 token per second). The total tokens never exceed the maximum capacity. - When a request comes in:
- If there are enough tokens (usually 1), a token is removed, and the request is successfully processed.
- If the bucket is empty (0 tokens), the request is dropped with a
429 Too Many Requestsstatus code.
Diagram¶
graph TD
Refill((Token Refill<br/>1 token/sec)) -->|Adds Tokens| Bucket[(Bucket<br/>Max: 10 Tokens)]
DropRefill[If Bucket Full,<br/>Drop extra tokens] -.-> Bucket
Req(Incoming Request) -->|Tries to take 1 Token| Check{Bucket Empty?}
Check -->|No| Take[Take 1 Token]
Take --> Bucket
Take --> Processed[Process Request - 200 OK]
Check -->|Yes| Drop[Drop Request - 429 Too Many Requests]
Pros and Cons¶
- Pros:
- Allows for Bursts: Unlike fixed windows, if you haven't used your API for a while, your bucket fills up to the maximum capacity. You can then spend all 10 tokens at once (burst).
- Memory Efficient: Requires storing only the timestamp of the last refill and the current token count.
- Cons:
- Requires background calculation or 'lazy' filling: Continuously filling the bucket with a background process is slow and resource-heavy. Instead, we typically fill the bucket lazily at the moment of the request by comparing the current timestamp with the last updated timestamp.
Code Example¶
A common Redis implementation using Lua scripting or pipelining (Lazy Fill):
import time
import valkey
def is_allowed(user_id: str, capacity: int, fill_rate_per_sec: float, client: valkey.Valkey) -> bool:
redis_key = f"rate_limit:tb:{user_id}"
now = time.time()
# 1. Fetch current tokens and last timestamp
results = client.hmget(redis_key, ["tokens", "last_updated"])
if results[0] is None:
# First time user: assign max tokens and process
current_tokens = capacity - 1
client.hmset(redis_key, {"tokens": current_tokens, "last_updated": now})
client.expire(redis_key, int(capacity / fill_rate_per_sec) + 10)
return True
current_tokens = float(results[0])
last_updated = float(results[1])
# 2. Lazily calculate how many tokens have been added since last request
time_passed = now - last_updated
tokens_to_add = time_passed * fill_rate_per_sec
# 3. Add to bucket, up to the maximum capacity
current_tokens = min(capacity, current_tokens + tokens_to_add)
# 4. Check if we have at least 1 token to spend
if current_tokens >= 1:
current_tokens -= 1
# Atomic set is technically needed in production (via Lua script) to avoid race conditions
client.hmset(redis_key, {"tokens": current_tokens, "last_updated": now})
return True
return False