Thundering Herd Problem

Concurrency Intermediate
debt(d9/e5/b5/t7)
d9 Detectability Operational debt — how invisible misuse is to your safety net

Closest to 'silent in production until users hit it' (d9). The detection_hints field states automated detection is 'no', and the code pattern (cache->get.*null.*db->) requires manual review to identify. There is no linter or static tool listed that catches this; it typically surfaces only when a cache expires in production and the database is hammered, often causing an outage.

e5 Effort Remediation debt — work required to fix once spotted

Closest to 'touches multiple files / significant refactor in one component' (e5). The quick_fix lists multiple strategies: distributed mutex locking, stale-while-revalidate, TTL jitter, and proactive cache warming. These are not single-line patches — they require coordinated changes to caching logic, potentially across multiple cache call sites, worker startup configurations, and infrastructure settings. Not quite a cross-cutting architectural rework (e7), but more than a simple parameter swap.

b5 Burden Structural debt — long-term weight of choosing wrong

Closest to 'persistent productivity tax' (b5). The problem applies across web, cli, and queue-worker contexts, meaning any caching or resource-contention pattern in the codebase must account for it. Once mitigation strategies (jitter, mutexes, stale serving) are in place they impose an ongoing cognitive and maintenance tax on future developers working with caches or shared resources, but they don't fully define the system's architecture.

t7 Trap Cognitive debt — how counter-intuitive correct behaviour is

Closest to 'serious trap — contradicts how a similar concept works elsewhere' (t7). The misconception field directly states that developers assume thundering herd only affects caches, when it actually affects any situation where many processes simultaneously target the same resource (queue workers, connection pools, server restarts). This narrow mental model is a serious trap: fixes applied only to cache layers leave other stampede vectors untreated, and the 'obvious' mitigations like long TTLs actually just delay the problem rather than solve it.

TL;DR

Thundering herd: many processes wake up at the same moment to handle a single event. All of them compete, one wins, and the rest waste their work. Common after a cache expiry or a server restart.

Explanation

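Cache stampede (a thundering herd variant): a cache entry expires → hundreds of concurrent requests all miss → all hit the DB simultaneously → the DB is overloaded → queries slow down or time out → requests fail.

Solutions:

  • Cache locking: take a mutex on cache miss so that only one request regenerates the value.
  • Stale-while-revalidate: keep serving the stale value while a single request regenerates it.
  • Probabilistic early expiry: each request has a small, growing chance of regenerating the value shortly before it expires, so regeneration rarely coincides with a mass miss.
  • Cache warming: refresh the entry proactively before it expires.

Another variant: all workers wake on the same queue event. Only one gets the job, but every worker burns CPU trying to claim it. Fix: claim jobs with SELECT ... FOR UPDATE SKIP LOCKED (for database-backed queues), or randomise polling intervals.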
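
Probabilistic early expiry is the least obvious of the solutions above, so a minimal sketch may help. It follows the commonly described "XFetch" formulation, which is an assumption here rather than something this entry prescribes. The function names, the `$delta` and `$beta` parameters, and the idea of storing the last regeneration time alongside the value are all hypothetical.

```php
<?php
// Sketch of probabilistic early expiry (XFetch-style). All names are hypothetical.

/**
 * Decide whether this request should regenerate the entry ahead of its expiry.
 *
 * @param float $now       Current time in seconds.
 * @param float $expiresAt Time at which the cache entry expires.
 * @param float $delta     How long the last regeneration took, in seconds.
 * @param float $beta      Tuning knob: above 1 refreshes earlier, below 1 later.
 * @param float $rand      Uniform random number in (0, 1]; passed in so it can be tested.
 */
function shouldRefreshEarly(float $now, float $expiresAt, float $delta, float $beta, float $rand): bool
{
    // log($rand) is <= 0, so subtracting it shifts "now" forward by a random amount
    // that scales with how expensive the value is to rebuild.
    return $now - $delta * $beta * log($rand) >= $expiresAt;
}

/** Uniform random float in (0, 1], so that log() never receives 0. */
function randomUnit(): float
{
    return (mt_rand() + 1) / (mt_getrandmax() + 1);
}
```

A caller would run `shouldRefreshEarly(microtime(true), $expiresAt, $delta, 1.0, randomUnit())` on every cache hit and regenerate when it returns true. Because each request draws its own random number, requests far from expiry almost never refresh, and the chance rises as expiry approaches, so one request usually refreshes the entry before the rest can miss.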

Common Misconception

"Thundering herd only affects caches." In fact, it affects any situation where many processes simultaneously target the same resource: queue workers, connection pools, server restarts.

Why It Matters

A cache stampede can take down a production database in seconds: a service that is perfectly healthy under normal load collapses the moment a hot cache entry expires.

Common Mistakes

  • Long TTL to avoid expiry — just delays the stampede.
  • Not locking cache regeneration — every request races to regenerate.
  • Simultaneous queue worker start — all workers poll at the same time.
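
For the last mistake in the list, one possible mitigation is to randomise both the first poll and every later polling interval. The helpers below are a hypothetical sketch: the function names and the default jitter fraction are assumptions, not part of any queue library.

```php
<?php
// Sketch: jittered polling for queue workers. Names and defaults are hypothetical.

/** Random delay before a worker's first poll, so workers started together spread out. */
function startupDelayMicros(int $maxMicros): int
{
    return random_int(0, $maxMicros);
}

/**
 * Sleep duration in microseconds: the base interval plus or minus a random
 * fraction of it, so workers drift apart instead of polling in lockstep.
 */
function jitteredDelayMicros(int $baseMicros, float $jitterFraction = 0.5): int
{
    $spread = (int) round($baseMicros * $jitterFraction);
    return $baseMicros + random_int(-$spread, $spread);
}

// In a worker loop, these would replace fixed sleeps:
//     usleep(startupDelayMicros(1000000));       // once, before the first poll
//     usleep(jitteredDelayMicros(1000000));      // between polls: 0.5 s to 1.5 s
```

The jitter only spreads the load out; it does not stop two workers from occasionally polling together, so job claiming still has to be safe under contention.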

Code Examples

✗ Vulnerable
// Cache stampede:
$value = $cache->get('expensive_query');
if ($value === null) {
    $value = $db->runExpensiveQuery(); // All 500 concurrent requests hit this
    $cache->set('expensive_query', $value, 3600);
}
✓ Fixed
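// Mutex on cache miss (assumes the phpredis extension; $fallbackValue is a caller-supplied default):
$value = $cache->get('expensive_query');
if ($value === null) {
    // NX: set only if absent. EX: auto-expire so a crashed holder cannot block forever.
    $token = bin2hex(random_bytes(16));
    if ($redis->set('lock:expensive_query', $token, ['NX', 'EX' => 10])) {
        try {
            $value = $db->runExpensiveQuery();
            $cache->set('expensive_query', $value, 3600);
        } finally {
            // Release only our own lock. get + del is not atomic; a Lua script would close that gap.
            if ($redis->get('lock:expensive_query') === $token) {
                $redis->del('lock:expensive_query');
            }
        }
    } else {
        // Another request is regenerating: poll the cache briefly instead of hitting the DB.
        for ($i = 0; $i < 5 && $value === null; $i++) {
            usleep(100000); // 100 ms
            $value = $cache->get('expensive_query');
        }
        $value ??= $fallbackValue;
    }
}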
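
The mutex example makes requests that lose the lock wait. With stale-while-revalidate they are served the previous value immediately instead. The sketch below uses a plain PHP array as a stand-in for the cache, and its `refreshing` flag plays the role of the Redis lock. The entry layout and the function name are hypothetical, and a cold miss (no stale value at all) would still need the mutex.

```php
<?php
// Sketch of stale-while-revalidate with an in-memory array standing in for the cache.
// The entry layout and function name are hypothetical.

/**
 * @param array    $store      Cache storage: key => ['value', 'freshUntil', 'refreshing'].
 * @param float    $now        Current time in seconds.
 * @param int      $ttl        Seconds a regenerated value stays fresh.
 * @param callable $regenerate Produces a fresh value (the expensive query).
 */
function getStaleWhileRevalidate(array &$store, string $key, float $now, int $ttl, callable $regenerate)
{
    $entry = $store[$key] ?? null;

    if ($entry !== null && $now < $entry['freshUntil']) {
        return $entry['value']; // fresh hit
    }
    if ($entry !== null && $entry['refreshing']) {
        return $entry['value']; // someone else is regenerating: serve the stale value
    }

    // Cold miss, or stale with nobody refreshing: this caller regenerates.
    if ($entry !== null) {
        $store[$key]['refreshing'] = true; // with Redis this would be a SET NX lock
    }
    try {
        $value = $regenerate();
    } catch (Throwable $e) {
        if ($entry !== null) {
            $store[$key]['refreshing'] = false; // let a later request retry
        }
        throw $e;
    }

    $store[$key] = ['value' => $value, 'freshUntil' => $now + $ttl, 'refreshing' => false];
    return $value;
}
```

The design choice is to keep expired entries in the cache rather than let them be evicted at expiry, which trades some staleness for never blocking readers while a refresh is in flight.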

Added 23 Mar 2026
DEV INTEL Tools & Severity
🟠 High ⚙ Fix effort: Medium
⚡ Quick Fix
Lock cache regeneration with a distributed mutex. Use stale-while-revalidate. Stagger cache TTLs with random jitter. Warm caches proactively before expiry.
📦 Applies To
web cli queue-worker
🔗 Prerequisites
🔍 Detection Hints
cache->get.*null.*db->
Auto-detectable: ✗ No
⚠ Related Problems
🤖 AI Agent
Confidence: Low
False Positives: High
✗ Manual fix
Fix: High
Context: File
Tests: Update
CWE-400 CWE-362

