API Rate Limiting
debt(d5/e5/b5/t5)
Closest to 'specialist tool catches it' (d5). The term's detection_hints list owasp-zap and semgrep as tools that can detect missing rate limiting headers, absence of 429 responses, and lack of per-tier differentiation. These are specialist security/SAST tools, not default linters. Without them, missing rate limiting is only caught in runtime testing or code review, but since the tools exist and are listed, d5 is appropriate.
Closest to 'touches multiple files / significant refactor in one component' (e5). The quick_fix describes implementing per-API-key rate limits in Redis with proper headers and 429 responses. While conceptually straightforward, this involves adding Redis infrastructure, middleware/gateway configuration, header injection across responses, and potentially differentiating limits per endpoint. This is more than a one-line fix (not e1/e3) but typically stays within the API layer rather than requiring full architectural rework (not e7+).
Closest to 'persistent productivity tax' (b5). Rate limiting applies to web and API contexts and touches multiple concerns: gateway configuration, Redis infrastructure, per-endpoint limit tuning, header management, and monitoring. Every new endpoint needs rate limit consideration, and the strategy (sliding window, token bucket, per-key vs per-IP) shapes how the API is consumed. It's not quite b7 (it doesn't reshape every change in the system), but it is a persistent tax across API development work streams.
Closest to 'notable trap' (t5). The misconception field explicitly states that developers assume IP-based rate limiting is sufficient, when in reality shared NATs and proxies make IP a poor primary key. Additionally, common_mistakes include non-obvious gotchas: limiting at the application layer instead of the gateway (still consuming resources), using fixed windows that allow boundary bursts, returning wrong status codes (503/200 instead of 429), and applying uniform limits across endpoints with different costs. These are documented gotchas that most developers eventually learn but frequently get wrong initially.
Also Known As
TL;DR
Explanation
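Rate limiting algorithms: Fixed Window (simple; the counter resets at each interval boundary, so a client can send up to twice the limit in a short span straddling the boundary), Sliding Window (smoother, no burst at reset), Token Bucket (allows short bursts, refills at a constant rate), Leaky Bucket (smooths bursts, constant output rate). Rejected requests should return 429 with a Retry-After header, and responses should carry X-RateLimit-* headers. Rate limits should be keyed by API key, user ID, or IP. IP alone is a poor primary key: many legitimate clients share one address behind a NAT or proxy, and an abuser can rotate addresses to bypass the limit. Differentiate limits by endpoint cost: a search query is heavier than a simple GET of a single resource.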
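The token bucket described above can be sketched in a few lines. This is a minimal, single-process illustration, not a production limiter: the class name `TokenBucket`, its parameters, and the choice to pass the clock in as `now` (so the behaviour is easy to test) are all assumptions made for this example.

```python
class TokenBucket:
    """Token bucket: holds up to `capacity` tokens, refilled at `refill_rate` per second."""

    def __init__(self, capacity: float, refill_rate: float, now: float = 0.0) -> None:
        self.capacity = capacity
        self.refill_rate = refill_rate  # tokens added per second
        self.tokens = capacity          # start full so an initial burst is allowed
        self.last_refill = now

    def allow(self, now: float, cost: float = 1.0) -> bool:
        """Spend `cost` tokens at time `now` if available; return whether the request passes."""
        elapsed = max(0.0, now - self.last_refill)
        # Refill for the elapsed time, never exceeding the bucket capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


bucket = TokenBucket(capacity=5, refill_rate=1.0)
burst = [bucket.allow(now=0.0) for _ in range(6)]  # five pass, the sixth is rejected
later = bucket.allow(now=2.0)                      # two seconds refill two tokens
```

The `cost` parameter is one way to charge expensive endpoints more than cheap ones from the same bucket. A shared deployment would typically keep this state in a store such as Redis rather than in process memory.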
Diagram
flowchart TD
REQ[API Request] --> CHECK{Rate limit<br/>check}
CHECK -->|under limit| PROC[Process request]
CHECK -->|exceeded| BLOCK[429 Too Many Requests<br/>Retry-After header]
subgraph Algorithms
FIXED[Fixed Window<br/>100 req per minute]
SLIDE[Sliding Window<br/>smoother]
TOKEN[Token Bucket<br/>allows bursts]
LEAK[Leaky Bucket<br/>constant rate]
end
subgraph Limit By
IP[Per IP]
USER[Per user/API key]
ENDPOINT[Per endpoint]
GLOBAL[Global]
end
style PROC fill:#238636,color:#fff
style BLOCK fill:#f85149,color:#fff
Watch Out
Common Misconception
Why It Matters
Common Mistakes
- Not returning a Retry-After header, which leaves clients guessing when to retry and forces them into blind exponential backoff.
- Rate limiting at the application layer instead of at the gateway/nginx level — late-stage limiting still consumes resources.
- Same rate limit for all endpoints — expensive operations (search, export) need tighter limits than simple GETs.
- Not returning 429 Too Many Requests — some APIs return 503 or 200, confusing clients about whether to retry.
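Two of the mistakes above (uniform limits and the wrong status code) can be avoided with very little code. The sketch below is illustrative only: the paths, the limit values, and the helper names `limit_for` and `rejection` are hypothetical, and it assumes the limiter already knows the Unix time at which the current window resets.

```python
import math

# Hypothetical per-endpoint limits in requests per minute; the values are illustrative.
ENDPOINT_LIMITS = {"/search": 10, "/export": 2}
DEFAULT_LIMIT = 100


def limit_for(path: str) -> int:
    """Expensive endpoints get tighter limits; everything else uses the default."""
    return ENDPOINT_LIMITS.get(path, DEFAULT_LIMIT)


def rejection(path: str, reset_at: float, now: float) -> tuple[int, dict[str, str]]:
    """Build the status code and headers for a request that exceeded its limit."""
    retry_after = max(1, math.ceil(reset_at - now))  # whole seconds, at least 1
    headers = {
        "Retry-After": str(retry_after),
        "X-RateLimit-Limit": str(limit_for(path)),
        "X-RateLimit-Remaining": "0",
        "X-RateLimit-Reset": str(int(reset_at)),
    }
    return 429, headers  # 429, not 503 or 200, so clients know the request was throttled


status, headers = rejection("/search", reset_at=1711270800, now=1711270753)
```

In practice this logic usually lives in gateway or middleware configuration rather than in each handler, so that rejected requests never reach application code.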
Avoid When
- Do not rate limit without telling the client what the limits are — silent 429s cause clients to retry aggressively and worsen the problem.
- Avoid applying identical limits to all endpoints — a read endpoint and a payment endpoint have very different abuse profiles.
- Do not rely solely on application-layer rate limiting for DoS protection — volumetric attacks must be absorbed at the gateway or CDN layer.
When To Use
- Always return Retry-After and X-RateLimit-* headers so well-behaved clients can implement automatic backoff.
- Apply rate limits at multiple granularities: per IP for unauthenticated traffic, per API key for authenticated traffic.
- Use a sliding window or token bucket algorithm for smooth limiting — fixed windows allow bursts at window boundaries.
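The last two recommendations can be combined in a short sketch: pick the limiter key by authentication state, then count requests over a sliding window. This is an in-memory "sliding window log" variant chosen for clarity; the names `SlidingWindowLimiter` and `rate_limit_key`, and the `key:` / `ip:` prefixes, are assumptions for illustration.

```python
from collections import defaultdict, deque
from typing import Optional


class SlidingWindowLimiter:
    """Allow at most `limit` requests per key in any `window_seconds` span."""

    def __init__(self, limit: int, window_seconds: float) -> None:
        self.limit = limit
        self.window = window_seconds
        self.hits = defaultdict(deque)  # key -> timestamps of accepted requests

    def allow(self, key: str, now: float) -> bool:
        q = self.hits[key]
        # Drop timestamps that have aged out of the window ending at `now`.
        while q and q[0] <= now - self.window:
            q.popleft()
        if len(q) < self.limit:
            q.append(now)
            return True
        return False


def rate_limit_key(api_key: Optional[str], client_ip: str) -> str:
    """Key authenticated traffic by API key and anonymous traffic by IP."""
    return f"key:{api_key}" if api_key else f"ip:{client_ip}"


limiter = SlidingWindowLimiter(limit=3, window_seconds=60)
key = rate_limit_key("abc", "203.0.113.7")
first = [limiter.allow(key, now=59.0) for _ in range(3)]  # fills the window
boundary = limiter.allow(key, now=61.0)  # a fixed window would have reset at t=60
```

With a fixed one-minute window, the request at `t=61` would be accepted because the counter reset at `t=60`. The sliding window still sees three requests in the preceding 60 seconds and rejects it. The log variant stores one timestamp per accepted request, so a counter-based approximation may be preferable at high volume.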
Code Examples
// No rate limit headers, so the client cannot schedule a retry:
HTTP/1.1 429 Too Many Requests

{"error": "Rate limited"}
// The client cannot tell when to retry and must guess with blind exponential backoff.

// Rate limit with helpful headers:
HTTP/1.1 429 Too Many Requests
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1711270800
Retry-After: 47

{"type": "rate_limit_exceeded", "retry_after": 47}