Rate Limiting

Why Rate Limiting?

Protects services from abuse, DoS attacks, and runaway clients. Required in every API gateway design question.

A bucket holds N tokens. Each request consumes one token. Tokens replenish at a fixed rate. Allows bursting up to bucket capacity.

Requests enter a queue and are processed at a fixed rate. Smooths out bursts but introduces latency.

Count requests per time window (e.g., 100 req/min). Simple but has a boundary spike problem: 200 requests can slip through at the window boundary.

Store a timestamp for each request. Count requests in the last N seconds. Accurate but memory-intensive at scale.

Hybrid: fixed window + weighted previous window. Best accuracy/memory trade-off for production systems.

For multi-node APIs, use Redis INCR with TTL for atomic counters. Lua scripts for atomic check-and-increment to prevent race conditions.