Cache Stampede / Thundering Herd
debt(d9/e5/b5/t7)
Closest to 'silent in production until users hit it' (d9). The detection_hints note automated=no, and tools listed (Blackfire, Datadog) only reveal the symptom (backend spike, slow response) after the stampede occurs under real traffic. There is no static analysis or linter rule that can flag a missing mutex or probabilistic early expiry at code review time — the problem is invisible until concurrent production load exposes it.
Closest to 'touches multiple files / significant refactor in one component' (e5). The quick_fix describes two non-trivial strategies: probabilistic early expiry or a Redis SET NX mutex lock. Implementing either requires identifying every high-traffic cache key with expensive rebuild logic, wrapping regeneration logic with locking or early-recompute math, and potentially introducing a shared locking abstraction. This goes beyond a single-line swap (e3) but is typically scoped within the caching layer rather than a full cross-cutting architectural rework (e7).
Closest to 'persistent productivity tax' (b5). The applies_to covers web and queue-worker contexts, meaning stampede protection must be considered for any cache key across multiple work streams. Every new expensive cached value requires developers to consciously apply jitter, locking, or early recompute — it's an ongoing mental tax on cache design decisions codebase-wide, but not quite a gravitational force shaping the entire architecture (b7).
Closest to 'serious trap — contradicts how a similar concept works elsewhere' (t7). The misconception field explicitly states that developers believe stampedes only affect very high-traffic sites, when in fact even moderate concurrency with expiring cache keys triggers the problem. The 'obvious' pattern of cache-miss → regenerate → store is universally taught but is precisely the anti-pattern here. This contradicts the standard caching mental model most developers hold, making it a serious cognitive trap.
Also Known As
TL;DR
Explanation
A cache stampede occurs when a popular cached item expires and many concurrent requests all see the cache miss at the same moment. Each one runs the same expensive query, flooding the database. Solutions include: mutex/lock-based regeneration (only one process rebuilds, the others wait), probabilistic early expiry (each request rebuilds before the TTL expires with a small probability that grows as expiry approaches), stale-while-revalidate (serve stale data while regenerating in the background), and cache warming on deploy. In PHP with Redis, implement the lock with SET key value NX EX seconds: the first process to set the key acquires the lock and rebuilds, the others serve stale data or wait, and the EX expiry releases the lock automatically if the holder crashes.
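The SET NX EX lock described above can be sketched roughly as follows. This is an illustration under stated assumptions, not a production implementation: FakeRedis is a hypothetical in-memory stand-in so the example runs without a server (its set() mirrors the options-array form accepted by the phpredis extension, `['nx', 'ex' => 10]`), and rebuildWithLock() and the `lock:` key prefix are made-up names.

```php
<?php
declare(strict_types=1);

/** Hypothetical in-memory stand-in for a Redis client (assumption, for illustration only). */
final class FakeRedis
{
    /** @var array<string, array{value: string, expiresAt: float}> */
    private array $data = [];

    /** Mimics SET key value [NX] [EX seconds]; returns false when NX blocks the write. */
    public function set(string $key, string $value, array $options = []): bool
    {
        $this->evict($key);
        if (in_array('nx', $options, true) && isset($this->data[$key])) {
            return false;
        }
        $ttl = $options['ex'] ?? null;
        $this->data[$key] = [
            'value' => $value,
            'expiresAt' => $ttl === null ? INF : microtime(true) + $ttl,
        ];
        return true;
    }

    /** Returns the stored string, or false when the key is missing or expired. */
    public function get(string $key): string|false
    {
        $this->evict($key);
        return $this->data[$key]['value'] ?? false;
    }

    public function del(string $key): int
    {
        $existed = isset($this->data[$key]);
        unset($this->data[$key]);
        return $existed ? 1 : 0;
    }

    private function evict(string $key): void
    {
        if (isset($this->data[$key]) && $this->data[$key]['expiresAt'] <= microtime(true)) {
            unset($this->data[$key]);
        }
    }
}

/**
 * Returns the cached value, rebuilding it only if this caller wins the lock.
 * Returns null when another caller holds the lock, so the caller can serve stale data or wait.
 */
function rebuildWithLock(FakeRedis $redis, string $key, callable $rebuild, int $ttl = 300): ?string
{
    $cached = $redis->get($key);
    if ($cached !== false) {
        return $cached;
    }

    $lockKey = "lock:$key";
    $token = bin2hex(random_bytes(16)); // unique per holder, so we only release our own lock

    // SET lockKey token NX EX 10: succeeds for exactly one caller until the lock expires.
    if (!$redis->set($lockKey, $token, ['nx', 'ex' => 10])) {
        return null;
    }

    try {
        $value = $rebuild();
        $redis->set($key, $value, ['ex' => $ttl]);
        return $value;
    } finally {
        // Release only if the lock is still ours (it may have expired and been re-acquired).
        if ($redis->get($lockKey) === $token) {
            $redis->del($lockKey);
        }
    }
}
```

Against a real Redis server the check-then-delete in the finally block is not atomic; a common approach is to do the comparison and the delete in a single server-side script. The 10-second lock expiry is an assumed value and should exceed the worst-case rebuild time.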
Diagram
sequenceDiagram
participant W1 as Worker 1
participant W2 as Worker 2
participant W3 as Worker 3
participant C as Cache
participant DB as Database
W1->>C: get popular_data
W2->>C: get popular_data
W3->>C: get popular_data
C-->>W1: Cache MISS - expired
C-->>W2: Cache MISS - expired
C-->>W3: Cache MISS - expired
W1->>DB: expensive query
W2->>DB: expensive query duplicate
W3->>DB: expensive query duplicate
Note over DB: DB overwhelmed - stampede!
Note over C: Fix: mutex lock or probabilistic early expiry
Common Misconception
Why It Matters
Common Mistakes
- No mutex or locking around cache miss regeneration — all concurrent requests hit the backend.
- Setting all cache entries to expire at the same time — use TTL jitter to spread expiry.
- Not using probabilistic early expiry — start regenerating before expiry to avoid a cliff-edge miss.
- Short TTLs on expensive-to-compute values — frequent expiry increases stampede probability.
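The TTL jitter mentioned in the list above might look like the following minimal sketch. jitteredTtl() is a hypothetical helper name, and the default spread of ±10% is an assumed value, not a recommendation from any particular library.

```php
<?php
declare(strict_types=1);

/**
 * Returns the base TTL shifted by a random offset of up to +/- $spread (a fraction of the base),
 * so entries written at the same moment do not all expire together.
 * Hypothetical helper, not part of any cache library.
 */
function jitteredTtl(int $baseTtl, float $spread = 0.1): int
{
    $maxOffset = (int) abs(round($baseTtl * $spread));

    // Never return a TTL below one second.
    return max(1, $baseTtl + random_int(-$maxOffset, $maxOffset));
}

// Usage (assuming a cache object with set($key, $value, $ttlSeconds)):
// $cache->set('popular_data', $value, jitteredTtl(300));
```

With a base of 300 seconds and the default spread, entries expire somewhere between 270 and 330 seconds after being written.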
Code Examples
// On cache miss, 100 simultaneous requests all hit the DB
$value = $cache->get('popular_data');
if ($value === null) {
    $value = $this->db->expensiveQuery(); // thundering herd
    $cache->set('popular_data', $value, 300);
}
// Option 1 — Mutex/lock (only one request recomputes)
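$value = $cache->get('popular_data');
if ($value === null) {
    $lock = $this->locks->acquire('popular_data', ttl: 10);
    if ($lock) {
        try {
            $value = $this->db->expensiveQuery();
            $cache->set('popular_data', $value, 300);
        } finally {
            $lock->release();
        }
    } else {
        // Other requests poll the cache briefly instead of querying the DB
        for ($i = 0; $i < 10 && $value === null; $i++) {
            usleep(100_000); // 100 ms
            $value = $cache->get('popular_data');
        }
        // If $value is still null here, serve stale data or fail fast
    }
}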
// Option 2 — Probabilistic early recomputation (no lock needed)
// Recompute before expiry with increasing probability as TTL decreases
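Option 2 can be sketched as below. This follows the approach often referred to as XFetch: each request adds a random head start, scaled by how long the last rebuild took, to the current time and rebuilds if the result passes the expiry. shouldRecomputeEarly() and its parameter names are hypothetical, and the sketch assumes the expiry timestamp and the last rebuild duration are stored alongside the cached value.

```php
<?php
declare(strict_types=1);

/**
 * Decides whether this request should rebuild the value before it expires.
 * Hypothetical helper; not taken from any specific library.
 *
 * @param float      $expiresAt Unix timestamp at which the cached value expires
 * @param float      $delta     Seconds the last rebuild took
 * @param float      $beta      Tuning knob; values above 1.0 rebuild earlier
 * @param float|null $now       Injected clock for testing; defaults to microtime(true)
 * @param float|null $rand      Injected random draw in (0, 1]; defaults to a fresh draw
 */
function shouldRecomputeEarly(
    float $expiresAt,
    float $delta,
    float $beta = 1.0,
    ?float $now = null,
    ?float $rand = null
): bool {
    $now ??= microtime(true);

    // Draw from (0, 1] so log() never receives 0.
    $rand ??= random_int(1, PHP_INT_MAX) / PHP_INT_MAX;

    // log($rand) <= 0, so this adds a random non-negative head start to $now.
    // The closer $now is to $expiresAt, the more likely the head start crosses it.
    return $now - $delta * $beta * log($rand) >= $expiresAt;
}

// Usage (assuming $entry holds 'value', 'expiresAt' and 'delta' read from the cache):
// if ($entry === null || shouldRecomputeEarly($entry['expiresAt'], $entry['delta'])) {
//     // time the rebuild, then store value, new expiresAt and measured delta together
// }
```

Because each request decides independently, no lock is needed, though a small number of requests may still rebuild concurrently near expiry.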