
Thundering Herd Problem

Concurrency · Intermediate

debt(d9/e5/b5/t7)

d9 Detectability Operational debt — how invisible misuse is to your safety net

Closest to 'silent in production until users hit it' (d9). The detection_hints field states automated detection is 'no', and the code pattern (cache->get.*null.*db->) requires manual review to identify. There is no linter or static tool listed that catches this; it typically surfaces only when a cache expires in production and the database is hammered, often causing an outage.

e5 Effort Remediation debt — work required to fix once spotted

Closest to 'touches multiple files / significant refactor in one component' (e5). The quick_fix lists multiple strategies: distributed mutex locking, stale-while-revalidate, TTL jitter, and proactive cache warming. These are not single-line patches — they require coordinated changes to caching logic, potentially across multiple cache call sites, worker startup configurations, and infrastructure settings. Not quite a cross-cutting architectural rework (e7), but more than a simple parameter swap.

b5 Burden Structural debt — long-term weight of choosing wrong

Closest to 'persistent productivity tax' (b5). The problem applies across web, cli, and queue-worker contexts, meaning any caching or resource-contention pattern in the codebase must account for it. Once mitigation strategies (jitter, mutexes, stale serving) are in place they impose an ongoing cognitive and maintenance tax on future developers working with caches or shared resources, but they don't fully define the system's architecture.

t7 Trap Cognitive debt — how counter-intuitive correct behaviour is

Closest to 'serious trap — contradicts how a similar concept works elsewhere' (t7). The misconception field directly states that developers assume thundering herd only affects caches, when it actually affects any situation where many processes simultaneously target the same resource (queue workers, connection pools, server restarts). This narrow mental model is a serious trap: fixes applied only to cache layers leave other stampede vectors untreated, and the 'obvious' mitigations like long TTLs actually just delay the problem rather than solve it.

DEBT scoring: scored by claude-sonnet-4-6 · 2026-05-11 · reviewed by human

TL;DR

Thundering herd: many processes wake simultaneously to handle a single event; all compete, one wins, and the rest waste their work. Common after cache expiry or a server restart.

Explanation

A cache stampede is a thundering herd variant: a cache entry expires, hundreds of concurrent requests all miss, and all of them hit the DB simultaneously. The DB is overloaded, every query slows down or times out, and the requests fail. Solutions:
(1) Cache locking: a mutex on cache miss, so only one request regenerates the value.
(2) Stale-while-revalidate: serve the stale value while it is being regenerated.
(3) Probabilistic early expiry: randomly regenerate shortly before expiry.
(4) Cache warming: refresh the entry before it expires.
Another variant: all workers wake on a queue event; only one gets the job, but all of them spin the CPU. Fix: claim jobs with SELECT ... FOR UPDATE SKIP LOCKED, or use randomised polling intervals.
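Solution (3), probabilistic early expiry, can be sketched as a small decision function. This is a minimal illustration of one commonly cited formulation, not a definitive implementation: the function names, the parameter names, and the idea of passing the random number in (to keep the function deterministic) are all assumptions made for this example.

```php
<?php
// Sketch only: function and parameter names are hypothetical.

/**
 * Decide whether this request should regenerate the entry before it expires.
 *
 * $now    current time in seconds
 * $expiry time at which the cached entry expires
 * $delta  how long the last regeneration took, in seconds
 * $beta   values above 1 refresh earlier, values below 1 refresh later
 * $rand   uniform random number in (0, 1]
 */
function shouldRefreshEarly(float $now, float $expiry, float $delta, float $beta, float $rand): bool
{
    // log($rand) <= 0, so subtracting it moves the clock forward by a
    // random amount: the closer to expiry, the likelier a refresh.
    return $now - $delta * $beta * log($rand) >= $expiry;
}

/** Uniform random float in (0, 1], which avoids log(0). */
function randomUnit(): float
{
    return mt_rand(1, mt_getrandmax()) / mt_getrandmax();
}

// Usage: an entry that expires in 30 s and took 0.5 s to build last time.
$expiresAt = microtime(true) + 30.0;
$refreshNow = shouldRefreshEarly(microtime(true), $expiresAt, 0.5, 1.0, randomUnit());
```

Because each request draws its own random number, only a few requests decide to regenerate early, and they do so at different moments rather than all at the instant of expiry.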

Common Misconception

✗ "Thundering herd only affects caches." In fact it affects any situation where many processes simultaneously target the same resource: queue workers, connection pools, server restarts.

Why It Matters

A cache stampede can take down a production database in seconds: a service that is perfectly healthy under normal load collapses the moment a hot cache entry expires.

Common Mistakes

Long TTL to avoid expiry: this only delays the stampede.
Not locking cache regeneration: every request races to regenerate.
Simultaneous queue worker start: all workers poll at the same time.
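For the last mistake, one possible mitigation is to randomise both the first poll and every later polling interval. The sketch below assumes a simple polling worker; the function names and the microsecond values are hypothetical.

```php
<?php
// Sketch only: names and values are hypothetical.

/**
 * Delay before a worker's next poll: the base interval plus a random
 * extra of up to $maxJitterMicros, so workers drift apart over time.
 */
function nextPollDelayMicros(int $baseMicros, int $maxJitterMicros): int
{
    return $baseMicros + random_int(0, $maxJitterMicros);
}

/** Random initial delay so workers started together do not poll together. */
function startupDelayMicros(int $baseMicros): int
{
    return random_int(0, $baseMicros);
}

// Usage in a worker loop, with a hypothetical $pollQueue callable:
// usleep(startupDelayMicros(1000000));
// while (true) {
//     $pollQueue();
//     usleep(nextPollDelayMicros(1000000, 250000));
// }
```

The startup delay spreads the first poll across one full interval, and the per-iteration jitter stops the workers from gradually falling back into step.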

Code Examples

✗ Vulnerable

// Cache stampede:
$value = $cache->get('expensive_query');
if ($value === null) {
    $value = $db->runExpensiveQuery(); // All 500 concurrent requests hit this
    $cache->set('expensive_query', $value, 3600);
}

✓ Fixed

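// Mutex on cache miss: only the lock holder regenerates the value.
$value = $cache->get('expensive_query');
if ($value === null) {
    // Atomic SET NX with a 10 s expiry, so a crashed holder cannot block regeneration forever.
    // If the query can outlast that expiry, a second request may also regenerate.
    $lock = $redis->set('lock:expensive_query', 1, ['NX', 'EX' => 10]);
    if ($lock) {
        try {
            $value = $db->runExpensiveQuery();
            $cache->set('expensive_query', $value, 3600);
        } finally {
            // Release the lock even if the query throws.
            $redis->del('lock:expensive_query');
        }
    } else {
        // Another request is regenerating: wait 100 ms, then re-check the cache.
        usleep(100000);
        // $fallbackValue is assumed to be defined by the caller (a default or a stale copy).
        $value = $cache->get('expensive_query') ?? $fallbackValue;
    }
}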
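The fixed example falls back to $fallbackValue when another request holds the lock. Stale-while-revalidate gives that fallback a concrete source: the previous value is kept for a grace window after it stops being fresh. The sketch below is a single-process, in-memory illustration only; the StaleCache class, its method names, and the explicit $now parameter are hypothetical and stand in for a shared cache.

```php
<?php
// Sketch only: StaleCache is a hypothetical in-memory stand-in for a shared cache.

final class StaleCache
{
    /** @var array<string, array{value: mixed, freshUntil: int, staleUntil: int}> */
    private array $entries = [];

    /** @var array<string, bool> keys for which a refresh has already been handed out */
    private array $refreshing = [];

    public function put(string $key, $value, int $now, int $ttl, int $staleWindow): void
    {
        $this->entries[$key] = [
            'value' => $value,
            'freshUntil' => $now + $ttl,
            'staleUntil' => $now + $ttl + $staleWindow,
        ];
        unset($this->refreshing[$key]);
    }

    /** Returns [value or null, whether this caller should regenerate]. */
    public function lookup(string $key, int $now): array
    {
        $entry = $this->entries[$key] ?? null;
        if ($entry === null || $now >= $entry['staleUntil']) {
            // Hard miss: nothing usable, so the mutex approach is still needed here.
            return [null, true];
        }
        if ($now < $entry['freshUntil']) {
            return [$entry['value'], false];
        }
        // Stale but usable: only the first caller is asked to regenerate.
        $first = !isset($this->refreshing[$key]);
        $this->refreshing[$key] = true;
        return [$entry['value'], $first];
    }
}

// Usage: fresh for 60 s, then served stale for a further 30 s.
$demo = new StaleCache();
$demo->put('expensive_query', 'result', 0, 60, 30);
[$value, $shouldRefresh] = $demo->lookup('expensive_query', 70);
```

In a real deployment the "refreshing" flag would have to live in the shared cache (for example as the lock key used in the fixed example) so that all processes see it.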

Added 23 Mar 2026

⚡ DEV INTEL Tools & Severity

Severity: High · Fix effort: Medium

⚡ Quick Fix

Lock cache regeneration with a distributed mutex. Use stale-while-revalidate. Stagger cache TTLs with random jitter. Warm caches proactively before expiry.
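TTL jitter is the smallest of these changes. A minimal sketch, assuming a helper named jitteredTtl and a default spread of 10% (both are assumptions for illustration):

```php
<?php
// Sketch only: the function name and the 10% default spread are assumptions.

/** Return $baseTtl shifted by a random amount within +/- $spread of it. */
function jitteredTtl(int $baseTtl, float $spread = 0.1): int
{
    $delta = (int) round($baseTtl * $spread);
    return $baseTtl + random_int(-$delta, $delta);
}

// Usage with the cache API from the examples on this page:
// $cache->set('expensive_query', $value, jitteredTtl(3600));
```

Jitter helps when many keys are written at the same moment (for example after a deploy or a cache flush), because their expiries no longer coincide. It does not protect a single hot key, which still needs locking or stale serving.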

📦 Applies To

web, cli, queue-worker

🔗 Prerequisites

Race Condition, Mutex & Locking

🔍 Detection Hints

cache->get.*null.*db->

Auto-detectable: ✗ No

⚠ Related Problems

Race Condition, Starvation & Livelock

🤖 AI Agent

Confidence: Low · False Positives: High · ✗ Manual fix · Fix: High · Context: File · Tests: Update

CWE-400, CWE-362

References

https://en.wikipedia.org/wiki/Thundering_herd_problem