Health Check Patterns
debt(d7/e3/b5/t7)
Closest to 'only careful code review or runtime testing' (d7). The detection_hints indicate no automated detection ('automated: no'), only a code_pattern regex for /health|healthcheck. The actual misconfiguration — using a deep check for liveness instead of readiness — is invisible until a dependency goes down in production, causing cascading restarts. No static tool catches this; it requires either careful code review or observing the thundering herd failure in a staging/production environment.
Closest to 'simple parameterised fix (replace pattern with safer alternative)' (e3). The quick_fix is clear: separate /live (shallow) from /ready (deep), add startup probes, and add timeouts. This is more than a one-line patch — it involves creating or splitting endpoints and updating Kubernetes probe configuration — but it's contained within one component or service's configuration and code, not a cross-cutting refactor.
Closest to 'persistent productivity tax' (b5). Health check patterns apply to both web and CLI contexts per applies_to, and must be correctly configured for every deployed service. Wrong patterns cause operational burden across deployment pipelines, incident response, and on-call rotations. It's not architectural (b7/b9) but it does slow down multiple work streams — every new service deployment must revisit probe configuration, and a mistake persists until an incident surfaces it.
Closest to 'serious trap (contradicts how a similar concept works elsewhere)' (t7). The misconception is precise and well-documented: developers intuitively believe a health check should be comprehensive — verifying all dependencies — so they use deep checks for liveness. This contradicts the Kubernetes model where liveness probes must be shallow/always-fast. The 'obvious' implementation (deep liveness check) directly causes cascading failures via thundering herd, making it a serious and widely encountered trap.
TL;DR
Explanation
Kubernetes defines three probes: liveness (the container is restarted when it fails), readiness (the pod is removed from load balancing when it fails), and startup (liveness checks are held off until it succeeds, which protects slow-starting applications). Check patterns: shallow (is the process alive? just return 200), deep (check the DB, cache, and queue), and readiness (are all the dependencies this instance needs ready?). Use a shallow check for liveness to avoid cascading failures: if liveness depends on the DB and the DB goes down, every instance fails liveness and restarts at once, which makes the DB's situation worse. Use a deep check for readiness, so an instance with a broken dependency is taken out of rotation without being restarted. Typical deep check components are a DB query (SELECT 1), a cache ping, queue connectivity, a disk space check, and a memory check. Put a timeout on every check (1-2 s at most).
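The probe split described above might be wired up roughly as follows. This is a sketch of the probe section of a container spec, not a recommended configuration: the port, periods, and thresholds are illustrative assumptions, and the paths are the /health/live and /health/ready endpoints used in the code examples of this entry.

```yaml
# Probe section of a container spec (values are assumptions, tune per service).
livenessProbe:
  httpGet:
    path: /health/live     # shallow: no dependency checks
    port: 8080
  periodSeconds: 10
  timeoutSeconds: 1
  failureThreshold: 3      # restart only after 3 consecutive failures
readinessProbe:
  httpGet:
    path: /health/ready    # deep: DB, cache, queue
    port: 8080
  periodSeconds: 5
  timeoutSeconds: 2
  failureThreshold: 2      # pod is taken out of rotation, not restarted
startupProbe:
  httpGet:
    path: /health/live
    port: 8080
  periodSeconds: 5
  failureThreshold: 30     # up to 30 x 5 s = 150 s to start before liveness takes over
```

With this shape, a DB outage fails only the readiness probe, so pods stop receiving traffic but keep running and recover on their own once the DB is back.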
Common Misconception
Why It Matters
Common Mistakes
- Deep health check for liveness: DB down → all instances restart → thundering herd.
- No readiness probe: instances receive traffic before they have warmed up.
- Health check without its own timeout: a slow dependency makes the check hang until the probe's timeout expires, so the probe fails and a healthy instance is restarted (liveness) or pulled from rotation (readiness).
Code Examples
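// Laravel facades assumed by the snippets below.
use Illuminate\Support\Facades\Cache;
use Illuminate\Support\Facades\DB;
use Illuminate\Support\Facades\Route;

// BAD: liveness checks the DB, so a DB outage cascades into restarts.
Route::get('/health/live', function () {
    DB::select('SELECT 1'); // DB down → liveness fails → instance restarts → more DB pressure
    return ['status' => 'ok'];
});

// GOOD: shallow liveness. Is the process running and able to answer?
Route::get('/health/live', fn () => ['status' => 'alive']);

// GOOD: deep readiness. Can this instance serve traffic?
Route::get('/health/ready', function () {
    $checks = [
        'db'    => fn () => DB::select('SELECT 1'),
        'cache' => function () {
            // A failed write returns false instead of throwing, so turn it into an exception.
            if (!Cache::has('health-check') && !Cache::put('health-check', 1, 10)) {
                throw new RuntimeException('cache write failed');
            }
        },
    ];
    foreach ($checks as $name => $check) {
        try {
            $check();
        } catch (Throwable $e) {
            // 503 takes the instance out of rotation; it does not trigger a restart.
            return response(['status' => 'not ready', 'failed' => $name], 503);
        }
    }
    return ['status' => 'ready'];
});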
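
The readiness route above stops at the first failing check and does not enforce the 1-2 s limit from the explanation. The following framework-free sketch shows one way to report every failing check and flag checks that exceed a time budget. The function name runReadinessChecks, the result shape, and the default budget are hypothetical, not part of any library. PHP cannot interrupt a blocking call, so the budget is only evaluated after a check returns; real timeouts still have to be set on the DB, cache, and queue clients themselves.

```php
<?php
declare(strict_types=1);

/**
 * Run named dependency checks and report which ones failed or ran too long.
 * A check signals failure by throwing.
 *
 * @param array<string, callable> $checks        check name => check
 * @param float                   $budgetSeconds per-check time budget (assumed default)
 * @return array{status: string, http_code: int, failed: list<string>}
 */
function runReadinessChecks(array $checks, float $budgetSeconds = 1.0): array
{
    $failed = [];
    foreach ($checks as $name => $check) {
        $start = hrtime(true); // monotonic clock, in nanoseconds
        try {
            $check();
        } catch (Throwable $e) {
            $failed[] = $name;
            continue;
        }
        // Over-budget checks are detected only after they return;
        // this does not cut a hanging call short.
        if ((hrtime(true) - $start) / 1e9 > $budgetSeconds) {
            $failed[] = $name;
        }
    }

    return [
        'status'    => $failed === [] ? 'ready' : 'not ready',
        'http_code' => $failed === [] ? 200 : 503,
        'failed'    => $failed,
    ];
}

// Usage with stand-in checks; real ones would query the DB, ping the cache, etc.
$result = runReadinessChecks([
    'db'    => fn () => true,
    'cache' => function () { throw new RuntimeException('connection refused'); },
    'queue' => function () { usleep(50_000); }, // 50 ms, over the 10 ms budget below
], 0.01);

echo json_encode($result), PHP_EOL;
```

In the usage above, the cache check throws and the queue check overruns its budget, so the result reports both as failed with a 503 code, while the db check passes. Returning the full list rather than the first failure makes the response more useful during an incident, at the cost of running every check on each probe.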