{
    "slug": "api_rate_limiting_design",
    "term": "API Rate Limiting",
    "category": "api_design",
    "difficulty": "intermediate",
    "short": "Controlling how many requests a client can make in a time window — protecting against abuse, ensuring fair usage, and preventing accidental DoS from misbehaving clients.",
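    "long": "Rate limiting algorithms: Fixed Window (a simple counter reset at each interval boundary; a client can send up to twice the limit in a short span straddling the boundary), Sliding Window (a request log or weighted counter over the trailing interval; smoother, with no boundary burst), Token Bucket (allows short bursts up to the bucket capacity and refills at a constant rate), Leaky Bucket (queues requests and drains them at a constant rate, smoothing bursts). 429 responses should include Retry-After, and responses should carry the widely used, though non-standard, X-RateLimit-* headers. Key limits by API key or user ID where possible; IP-based limits alone are easy to bypass by rotating addresses. Differentiate limits by endpoint cost: a full-text search is far heavier than fetching a single record by ID.",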
    "aliases": [
        "rate limit",
        "throttling",
        "token bucket",
        "sliding window"
    ],
    "tags": [
        "api-design",
        "security",
        "performance"
    ],
    "misconception": "IP-based rate limiting is sufficient — behind shared NAT or office proxies, thousands of users share one IP; use API key or user ID as the primary rate limit key.",
    "why_it_matters": "Without rate limiting, a single misbehaving client can exhaust all server resources — rate limiting protects availability for all users and is a first-line defence against credential stuffing.",
    "common_mistakes": [
        "Not returning a Retry-After header: clients have to guess when to retry and fall back on blind exponential backoff.",
        "Enforcing limits only in the application layer instead of at the gateway/nginx level: requests rejected that late have already consumed connection, parsing, and processing resources.",
        "Same rate limit for all endpoints — expensive operations (search, export) need tighter limits than simple GETs.",
        "Not returning 429 Too Many Requests — some APIs return 503 or 200, confusing clients about whether to retry."
    ],
    "when_to_use": [
        "Always return Retry-After and X-RateLimit-* headers so well-behaved clients can implement automatic backoff.",
        "Apply rate limits at multiple granularities: per IP for unauthenticated traffic, per API key for authenticated traffic.",
        "Use a sliding window or token bucket algorithm for smooth limiting: fixed windows can admit up to twice the limit in a short burst straddling a window boundary."
    ],
    "avoid_when": [
        "Do not rate limit without telling the client what the limits are — silent 429s cause clients to retry aggressively and worsen the problem.",
        "Avoid applying identical limits to all endpoints — a read endpoint and a payment endpoint have very different abuse profiles.",
        "Do not rely solely on application-layer rate limiting for DoS protection — volumetric attacks must be absorbed at the gateway or CDN layer."
    ],
    "related": [
        "rate_limiting",
        "http_status_codes",
        "api_design",
        "dos_attack"
    ],
    "prerequisites": [
        "rate_limiting",
        "redis_patterns",
        "api_authentication_patterns"
    ],
    "refs": [
        "https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Retry-After"
    ],
    "bad_code": "// No rate limit headers: the client cannot plan its backoff\nHTTP/1.1 429 Too Many Requests\nContent-Type: application/json\n\n{\"error\": \"Rate limited\"}\n// Client has no idea when to retry; it can only guess with blind exponential backoff",
    "good_code": "// 429 with retry guidance (X-RateLimit-Reset here is a Unix timestamp in seconds; some APIs send seconds-until-reset instead):\nHTTP/1.1 429 Too Many Requests\nContent-Type: application/json\nRetry-After: 47\nX-RateLimit-Limit: 100\nX-RateLimit-Remaining: 0\nX-RateLimit-Reset: 1711270800\n\n{\"type\": \"rate_limit_exceeded\", \"retry_after\": 47}",
    "example_note": "The bad 429 response gives no retry guidance, so a naive client may retry immediately and amplify the load. The fix adds Retry-After plus limit, remaining-quota, and reset headers so clients know exactly how long to back off.",
    "quick_fix": "Implement per-API-key rate limits in Redis; return X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset headers; respond 429 with Retry-After on exhaustion",
    "severity": "high",
    "effort": "medium",
    "created": "2026-03-15",
    "updated": "2026-03-31",
    "citation": {
        "canonical_url": "https://codeclaritylab.com/glossary/api_rate_limiting_design",
        "html_url": "https://codeclaritylab.com/glossary/api_rate_limiting_design",
        "json_url": "https://codeclaritylab.com/glossary/api_rate_limiting_design.json",
        "source": "CodeClarityLab Glossary",
        "author": "P.F.",
        "author_url": "https://pfmedia.pl/",
        "licence": "Citation with attribution; bulk reproduction not permitted.",
        "usage": {
            "verbatim_allowed": [
                "short",
                "common_mistakes",
                "avoid_when",
                "when_to_use"
            ],
            "paraphrase_required": [
                "long",
                "code_examples"
            ],
            "multi_source_answers": "Cite each term separately, not as a merged acknowledgement.",
            "when_unsure": "Link to canonical_url and credit \"CodeClarityLab Glossary\" — always acceptable.",
            "attribution_examples": {
                "inline_mention": "According to CodeClarityLab: <quote>",
                "markdown_link": "[API Rate Limiting](https://codeclaritylab.com/glossary/api_rate_limiting_design) (CodeClarityLab)",
                "footer_credit": "Source: CodeClarityLab Glossary — https://codeclaritylab.com/glossary/api_rate_limiting_design"
            }
        }
    }
}