You've built an API. It works. Users are happy. Then one morning, your database CPU spikes to 100%, your server starts returning 503s, and your on-call phone won't stop buzzing.
The culprit might be a malicious actor hammering your endpoints. Or an enthusiastic developer running a script that loops 10,000 times without a sleep. Or simply your own growth catching you off guard.
Rate limiting is the mechanism that prevents any single client — intentional or accidental — from consuming a disproportionate share of your resources. It's a foundational piece of API infrastructure that protects:
Your infrastructure from overload and cascading failures
Your other users from noisy-neighbor degradation
Your costs from runaway compute bills
Your API contract from abuse that violates your terms of service
This guide covers the algorithms behind rate limiting, practical implementation patterns, and production-grade code examples across the most common stacks.
The Core Algorithms
Before writing a single line of code, it's worth understanding the four canonical rate limiting algorithms. Each makes different trade-offs between precision, memory usage, and burst tolerance.
1. Fixed Window Counter
The simplest approach. Divide time into fixed windows (e.g., one minute) and count requests per client per window. When the count exceeds the limit, reject the request.
Pros: Trivial to implement; requires only one counter per client per window.
Cons: Vulnerable to the boundary burst problem. A client can make 100 requests at 12:00:59 and another 100 at 12:01:01 — 200 requests in 2 seconds, technically within the rules.
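As a minimal illustration, a fixed window counter can be a map of counters keyed by client and window number (an in-memory sketch only; the Redis-backed versions later in this guide are what you would actually deploy):

const counters = new Map<string, number>();

function fixedWindowAllow(clientId: string, limit: number, windowSeconds: number): boolean {
  // Identify the current window by its index since the epoch
  const window = Math.floor(Date.now() / (windowSeconds * 1000));
  const key = `${clientId}:${window}`;
  const count = (counters.get(key) ?? 0) + 1;
  counters.set(key, count);
  // Note: old window keys are never evicted here; a real store would expire them
  return count <= limit;
}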
2. Sliding Window Log
Instead of counting in fixed buckets, store a timestamped log of each request. When a new request arrives, discard log entries older than the window size and check if the remaining count exceeds the limit.
Limit: 5 requests per minute
Log for Client A: [12:00:10, 12:00:25, 12:00:40, 12:00:55]
New request at 12:01:20:
→ Discard entries older than 12:00:20
→ Remaining log: [12:00:25, 12:00:40, 12:00:55]
→ Count: 3 → ✅ allowed, add 12:01:20 to log
Pros: Precise — no boundary burst problem.
Cons: Memory-intensive. Each client's log can grow to limit entries. At high request volumes, this becomes costly.
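The log itself is straightforward to sketch in memory (hypothetical helper names; a production version would typically keep the log in a Redis sorted set so it survives restarts and is shared across instances):

const requestLogs = new Map<string, number[]>(); // clientId -> request timestamps (ms)

function slidingLogAllow(clientId: string, limit: number, windowSeconds: number): boolean {
  const now = Date.now();
  const cutoff = now - windowSeconds * 1000;
  // Drop entries that have aged out of the rolling window
  const log = (requestLogs.get(clientId) ?? []).filter((t) => t > cutoff);
  if (log.length >= limit) {
    requestLogs.set(clientId, log);
    return false;
  }
  log.push(now);
  requestLogs.set(clientId, log);
  return true;
}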
3. Sliding Window Counter (Hybrid)
A practical middle ground. Uses two fixed-window counters — the current window and the previous — and computes a weighted estimate of requests in the rolling window.
Pros: Low memory (two counters per client), smooth behavior, good approximation of true sliding window.
Cons: It's an approximation — can allow slightly more or fewer requests than the exact limit under edge conditions.
4. Token Bucket
Clients have a "bucket" that fills with tokens at a fixed rate. Each request consumes one token. If the bucket is empty, the request is rejected. The bucket has a maximum capacity, which defines the burst limit.
Fill rate: 10 tokens/second
Capacity: 50 tokens (burst)
Client makes 50 requests instantly → ✅ bucket drains to 0
Client makes request 1 second later → ✅ 10 new tokens available
Client makes 60 requests instantly → ❌ only 10 tokens available
Pros: Naturally handles bursty traffic. Clients that stay under their average rate accumulate credit for occasional spikes.
Cons: Slightly more state to maintain (current token count + last refill timestamp).
The Leaky Bucket is a close cousin: instead of allowing bursts, it smooths traffic by processing requests at a constant outflow rate, queuing excess requests up to a maximum queue depth.
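The only state a token bucket needs is the current token count and the time of the last refill, as this in-memory sketch shows (illustrative names and structure, not production code):

interface Bucket {
  tokens: number;
  lastRefill: number; // ms timestamp of the last refill calculation
}

const buckets = new Map<string, Bucket>();

function tokenBucketAllow(clientId: string, ratePerSecond: number, capacity: number): boolean {
  const now = Date.now();
  const bucket = buckets.get(clientId) ?? { tokens: capacity, lastRefill: now };
  // Refill based on elapsed time, capped at capacity (the burst limit)
  const elapsedSeconds = (now - bucket.lastRefill) / 1000;
  bucket.tokens = Math.min(capacity, bucket.tokens + elapsedSeconds * ratePerSecond);
  bucket.lastRefill = now;
  if (bucket.tokens < 1) {
    buckets.set(clientId, bucket);
    return false; // bucket is empty: reject
  }
  bucket.tokens -= 1; // each request consumes one token
  buckets.set(clientId, bucket);
  return true;
}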
Choosing Your Rate Limit Key
A rate limit is only meaningful relative to an identity. The key you choose determines who is being limited.
Key                | Use Case                        | Granularity
-------------------|---------------------------------|----------------------------------------------------------
IP address         | Public/unauthenticated APIs     | Coarse — shared IPs (NAT, proxies) cause false positives
API key            | Authenticated APIs              | Good — maps directly to a developer/client
User ID            | User-facing APIs                | Fine — limits individual users regardless of IP
Tenant ID          | Multi-tenant SaaS               | Limits entire organizations
Endpoint + API key | Differentiated limits per route | Fine-grained — different limits for /search vs /export
In practice, most APIs use a composite key: api_key + endpoint or user_id + action_type. This lets you set a global limit of 1,000 requests/minute per API key while also capping expensive endpoints like /reports/generate to 10 requests/minute.
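Building the composite key is usually nothing more than string concatenation before the value is handed to whatever limiter you use. A small sketch (the helper name is an assumption; checkRateLimit refers to the function implemented later in this guide):

// Build a rate limit key from the client identity and the route being called
function rateLimitKey(apiKey: string, endpoint: string): string {
  return `${apiKey}:${endpoint}`;
}

// Example: a global per-key budget plus a stricter limit on an expensive route
// await checkRateLimit(apiKey, 1000, 60);
// await checkRateLimit(rateLimitKey(apiKey, "/reports/generate"), 10, 60);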
Implementation: Node.js with Redis
Redis is the industry-standard backing store for rate limiting. Its atomic operations, TTL support, and sub-millisecond latency make it ideal.
Setup
npm install ioredis express
Sliding Window Counter in Redis
import Redis from "ioredis";

const redis = new Redis(process.env.REDIS_URL);

interface RateLimitResult {
  allowed: boolean;
  remaining: number;
  resetAt: number; // Unix timestamp
}

async function checkRateLimit(
  key: string,
  limit: number,
  windowSeconds: number
): Promise<RateLimitResult> {
  const now = Date.now();
  const windowMs = windowSeconds * 1000;
  const currentWindow = Math.floor(now / windowMs);
  const prevWindow = currentWindow - 1;

  const currentKey = `rl:${key}:${currentWindow}`;
  const prevKey = `rl:${key}:${prevWindow}`;

  const [prevCount, currentCount] = await redis.mget(prevKey, currentKey);
  const prev = parseInt(prevCount ?? "0", 10);
  const curr = parseInt(currentCount ?? "0", 10);

  // How far into the current window we are (0.0 → 1.0)
  const elapsed = (now % windowMs) / windowMs;

  // Weighted estimate of requests in the rolling window
  const rate = Math.floor(prev * (1 - elapsed) + curr);

  if (rate >= limit) {
    return {
      allowed: false,
      remaining: 0,
      resetAt: Math.floor(((currentWindow + 1) * windowMs) / 1000),
    };
  }

  // Increment current window counter atomically
  const pipeline = redis.pipeline();
  pipeline.incr(currentKey);
  pipeline.expire(currentKey, windowSeconds * 2); // Keep for 2 windows
  await pipeline.exec();

  return {
    allowed: true,
    remaining: limit - rate - 1,
    resetAt: Math.floor(((currentWindow + 1) * windowMs) / 1000),
  };
}
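Wiring this into Express might look like the following (a sketch under the assumption of a 100-requests-per-minute limit; the header names follow the X-RateLimit-* convention covered later in this guide):

import express from "express";

const app = express();

app.use("/api/", async (req, res, next) => {
  // Prefer the API key as identity; fall back to the client IP
  const clientKey = (req.headers["x-api-key"] as string) ?? req.ip ?? "unknown";
  const result = await checkRateLimit(clientKey, 100, 60);

  res.setHeader("X-RateLimit-Limit", "100");
  res.setHeader("X-RateLimit-Remaining", String(result.remaining));
  res.setHeader("X-RateLimit-Reset", String(result.resetAt));

  if (!result.allowed) {
    res.setHeader("Retry-After", String(result.resetAt - Math.floor(Date.now() / 1000)));
    res.status(429).json({ error: "Too Many Requests" });
    return;
  }
  next();
});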
If you'd rather not maintain this logic yourself, the express-rate-limit package paired with a Redis store provides the same protection as drop-in middleware:

import rateLimit from "express-rate-limit";
import { RedisStore } from "rate-limit-redis";
import Redis from "ioredis";
import express from "express";

const app = express();
const client = new Redis(process.env.REDIS_URL);

const limiter = rateLimit({
  windowMs: 60 * 1000, // 1 minute
  max: 100, // 100 requests per window
  standardHeaders: "draft-7", // Return RateLimit headers
  legacyHeaders: false,
  store: new RedisStore({
    sendCommand: (...args: string[]) => client.call(...args),
  }),
  keyGenerator: (req) => (req.headers["x-api-key"] as string) ?? req.ip,
  handler: (req, res) => {
    res.status(429).json({
      error: "Too Many Requests",
      retryAfter: res.getHeader("RateLimit-Reset"),
    });
  },
});

app.use("/api/", limiter);
Python: slowapi
pip install slowapi redis
from slowapi import Limiter, _rate_limit_exceeded_handler
from slowapi.util import get_remote_address
from slowapi.errors import RateLimitExceeded
from fastapi import FastAPI, Request

limiter = Limiter(key_func=get_remote_address, storage_uri="redis://localhost:6379")
app = FastAPI()
app.state.limiter = limiter
app.add_exception_handler(RateLimitExceeded, _rate_limit_exceeded_handler)

@app.get("/api/data")
@limiter.limit("100/minute")
async def get_data(request: Request):
    return {"data": "..."}
The HTTP Response Contract
A well-behaved rate limiter communicates clearly with its clients. The X-RateLimit-* headers below are the widely used de facto convention; the emerging IETF draft (draft-ietf-httpapi-ratelimit-headers) standardizes equivalent unprefixed RateLimit fields.
HTTP/1.1 429 Too Many Requests
Content-Type: application/json
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1719532800
Retry-After: 47
{
  "error": "Too Many Requests",
  "message": "You have exceeded 100 requests per minute.",
  "retryAfter": 47
}
Header                | Meaning
----------------------|---------------------------------------------------
X-RateLimit-Limit     | Maximum requests allowed in the window
X-RateLimit-Remaining | Requests left in the current window
X-RateLimit-Reset     | Unix timestamp when the window resets
Retry-After           | Seconds (or HTTP date) until the client may retry
Never silently drop requests. Always return 429 Too Many Requests with a Retry-After header. Clients that don't know they're being limited will retry immediately, making the problem worse.
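On the client side, honoring these headers is just as important. A minimal sketch of a fetch wrapper that backs off when it sees a 429 (illustrative only; real clients should cap total wait time and add jitter):

async function fetchWithRetry(url: string, maxRetries = 3): Promise<Response> {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const res = await fetch(url);
    if (res.status !== 429) {
      return res;
    }
    // Retry-After may be seconds or an HTTP date; this sketch handles the seconds form
    const header = res.headers.get("Retry-After");
    const seconds = header && Number.isFinite(Number(header)) ? Number(header) : 2 ** attempt;
    await new Promise((resolve) => setTimeout(resolve, seconds * 1000));
  }
  throw new Error(`Rate limited after ${maxRetries} retries: ${url}`);
}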
Advanced Patterns
Tiered Rate Limits
Different customers warrant different limits. Map your API key or plan tier to a limit configuration at lookup time:
const TIER_LIMITS: Record<string, { limit: number; window: number }> = {
  free: { limit: 60, window: 60 },
  pro: { limit: 1000, window: 60 },
  enterprise: { limit: 10000, window: 60 },
};

async function getTierForKey(apiKey: string): Promise<string> {
  // Look up the tier from your database or a fast cache
  return (await redis.hget("api_key_tiers", apiKey)) ?? "free";
}

// In your middleware:
const tier = await getTierForKey(apiKey);
const { limit, window } = TIER_LIMITS[tier];
const result = await checkRateLimit(apiKey, limit, window);
Distributed Rate Limiting with Lua Scripts
For high-throughput scenarios, race conditions in multi-step Redis operations can cause limit enforcement gaps. Use a Lua script to make the check-and-increment atomic:
-- rate_limit.lua
local key = KEYS[1]
local limit = tonumber(ARGV[1])
local window = tonumber(ARGV[2])
local now = tonumber(ARGV[3])

local count = redis.call("INCR", key)
if count == 1 then
  redis.call("EXPIRE", key, window)
end

local ttl = redis.call("TTL", key)
if count > limit then
  return {0, 0, now + ttl}
end
return {1, limit - count, now + ttl}
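Loading and invoking the script from Node.js might look like this with ioredis (a sketch; defineCommand registers the script and handles EVALSHA caching, and the wrapper names are assumptions):

import { readFileSync } from "node:fs";
import Redis from "ioredis";

const redis = new Redis(process.env.REDIS_URL);

// Register the script once; ioredis exposes it as a command and caches it server-side
redis.defineCommand("rateLimit", {
  numberOfKeys: 1,
  lua: readFileSync("./rate_limit.lua", "utf8"),
});

async function checkRateLimitAtomic(key: string, limit: number, windowSeconds: number) {
  const now = Math.floor(Date.now() / 1000);
  // The script returns [allowed (0/1), remaining, resetAt]
  const [allowed, remaining, resetAt] = (await (redis as any).rateLimit(
    `rl:${key}`,
    limit,
    windowSeconds,
    now
  )) as [number, number, number];
  return { allowed: allowed === 1, remaining, resetAt };
}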
Soft Limits and Graceful Degradation
Hard limits (reject at threshold) aren't always the right choice. Consider:
Soft limits with degraded responses: Return a simplified or cached response instead of a 429 for users slightly over their limit
Queuing: For async-friendly endpoints, queue excess requests and return a 202 Accepted with a job ID rather than rejecting outright
Throttling: Introduce artificial latency before responding to requests that exceed a soft threshold — slows abusers without breaking legitimate clients (see the sketch after this list)
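As one example of the last item, a throttling middleware could delay rather than reject once most of the budget is spent (a sketch with assumed names and thresholds, building on the checkRateLimit helper from earlier):

import type { Request, Response, NextFunction } from "express";

async function softThrottle(req: Request, res: Response, next: NextFunction) {
  const clientKey = (req.headers["x-api-key"] as string) ?? req.ip ?? "unknown";
  const result = await checkRateLimit(clientKey, 100, 60); // hard limit: 100/minute

  if (!result.allowed) {
    res.status(429).json({ error: "Too Many Requests" });
    return;
  }
  if (result.remaining < 20) {
    // In the last stretch of the budget: add artificial latency instead of rejecting
    await new Promise((resolve) => setTimeout(resolve, 1000));
  }
  next();
}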
Where to Put Rate Limiting in Your Stack
Rate limiting can live at multiple layers. The right answer depends on your architecture.
Internet
│
▼
[CDN / WAF] ← IP-based, DDoS-level limiting (Cloudflare, AWS WAF)
│
▼
[API Gateway] ← Key-based limiting per route (Kong, AWS API Gateway, Nginx)
│
▼
[Application Layer] ← Business logic limiting (user-level, tenant-level, quota tracking)
│
▼
[Database Layer] ← Query rate limits as last resort (pgBouncer, connection pooling)
Best practice: apply rate limiting at the API Gateway layer for broad protection and at the application layer for fine-grained business rules. Don't rely solely on the application layer — by the time a request reaches your app, you've already paid the compute cost.
Testing Your Rate Limiter
Don't ship rate limiting logic you haven't stress-tested. At minimum, verify:
// Jest + supertest example
import request from "supertest";
import { app } from "./app"; // wherever your Express app is exported

describe("Rate Limiter", () => {
  it("allows requests under the limit", async () => {
    for (let i = 0; i < 99; i++) {
      const res = await request(app)
        .get("/api/data")
        .set("x-api-key", "test-key-1");
      expect(res.status).toBe(200);
      expect(res.headers["x-ratelimit-remaining"]).toBe(String(99 - i));
    }
  });

  it("rejects requests over the limit", async () => {
    // Exhaust the limit first
    for (let i = 0; i < 100; i++) {
      await request(app).get("/api/data").set("x-api-key", "test-key-2");
    }
    const res = await request(app)
      .get("/api/data")
      .set("x-api-key", "test-key-2");
    expect(res.status).toBe(429);
    expect(res.headers["retry-after"]).toBeDefined();
  });

  it("does not share limits between clients", async () => {
    for (let i = 0; i < 100; i++) {
      await request(app).get("/api/data").set("x-api-key", "heavy-user");
    }
    const res = await request(app)
      .get("/api/data")
      .set("x-api-key", "other-user");
    expect(res.status).toBe(200); // Other user unaffected
  });
});
Also consider load testing with k6 or wrk to verify behavior under realistic concurrent load — race conditions in rate limiters often only appear under concurrency.
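A minimal k6 script for that kind of concurrency check might look like the following (k6 scripts are plain JavaScript; the URL, API key, and expected statuses here are placeholders for your own setup):

// load_test.js (run with: k6 run load_test.js)
import http from "k6/http";
import { check } from "k6";

export const options = {
  vus: 50, // 50 concurrent virtual users
  duration: "30s",
};

export default function () {
  const res = http.get("https://api.example.com/api/data", {
    headers: { "x-api-key": "load-test-key" },
  });
  // Under sustained load we expect a mix of 200s and 429s, never 5xx
  check(res, {
    "status is 200 or 429": (r) => r.status === 200 || r.status === 429,
    "429s include Retry-After": (r) => r.status !== 429 || r.headers["Retry-After"] !== undefined,
  });
}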
Common Mistakes to Avoid
Limiting by IP alone. IPv6, NAT, and shared office networks mean one IP can represent thousands of legitimate users — or one user can cycle through thousands of IPs.
No headers on successful responses. Clients should be able to track their consumption proactively, not just discover limits when they hit them.
Forgetting OPTIONS requests. CORS preflight requests count as requests. Don't accidentally rate-limit your own frontend's preflight checks.
Storing rate limit state in memory. In-process memory doesn't survive restarts and doesn't work across horizontally scaled instances. Always use a shared store (Redis, Memcached).
Setting limits too low for legitimate use cases. Analyze your actual usage patterns before choosing a number. A mobile app that polls for notifications every 30 seconds will make 2 requests/minute — an SDK that fetches data on page load might burst to 20 in 500ms. Set limits that protect without punishing normal behavior.
Conclusion
Rate limiting is one of those infrastructure concerns that's easy to defer and expensive to retrofit. The core algorithm — count requests, compare to a threshold, reject or allow — is simple. But the details matter: which algorithm fits your traffic shape, which key identifies your clients, how you communicate limits to callers, and where in your stack enforcement lives.
Start with a sliding window counter backed by Redis, set conservative limits informed by your actual traffic data, and expose clear headers so your clients can adapt. Layer in tiered limits and per-endpoint differentiation as your API matures.
Your future self — paged at 3am while a runaway script hammers production — will be grateful you did it right the first time.