
Health Check Patterns

Observability Beginner

debt(d7/e3/b5/t7)

d7 Detectability Operational debt — how invisible misuse is to your safety net

Closest to 'only careful code review or runtime testing' (d7). The detection_hints indicate no automated detection ('automated: no'), only a code_pattern regex for /health|healthcheck. The actual misconfiguration — using a deep check for liveness instead of readiness — is invisible until a dependency goes down in production, causing cascading restarts. No static tool catches this; it requires either careful code review or observing the thundering herd failure in a staging/production environment.

e3 Effort Remediation debt — work required to fix once spotted

Closest to 'simple parameterised fix (replace pattern with safer alternative)' (e3). The quick_fix is clear: separate /live (shallow) from /ready (deep), add startup probes, and add timeouts. This is more than a one-line patch — it involves creating or splitting endpoints and updating Kubernetes probe configuration — but it's contained within one component or service's configuration and code, not a cross-cutting refactor.

b5 Burden Structural debt — long-term weight of choosing wrong

Closest to 'persistent productivity tax' (b5). Health check patterns apply to both web and CLI contexts per applies_to, and must be correctly configured for every deployed service. Wrong patterns cause operational burden across deployment pipelines, incident response, and on-call rotations. It's not architectural (b7/b9) but it does slow down multiple work streams — every new service deployment must revisit probe configuration, and a mistake persists until an incident surfaces it.

t7 Trap Cognitive debt — how counter-intuitive correct behaviour is

Closest to 'serious trap (contradicts how a similar concept works elsewhere)' (t7). The misconception is precise and well-documented: developers intuitively believe a health check should be comprehensive — verifying all dependencies — so they use deep checks for liveness. This contradicts the Kubernetes model where liveness probes must be shallow/always-fast. The 'obvious' implementation (deep liveness check) directly causes cascading failures via thundering herd, making it a serious and widely encountered trap.

Scored by claude-sonnet-4-6 · 2026-05-08 · reviewed by human

TL;DR

Health checks report service status to load balancers and orchestrators: /health/live (is the process running?), /health/ready (can it serve traffic?), and deep checks of dependencies, which belong behind readiness rather than liveness.

Explanation

Kubernetes probes: liveness (restart the container if failing), readiness (remove the pod from load balancing, i.e. from its Services' endpoints, if failing), and startup (hold off liveness and readiness checks until a slow-starting app is up).

Patterns: shallow (is the process alive? just return 200), deep (check DB, cache, queue), and readiness (are all dependencies ready?).

Use shallow checks for liveness to avoid cascading failures: if the DB is down and liveness depends on it, every instance fails liveness and restarts, which makes the DB situation worse. Use deep checks for readiness, so an instance whose dependencies are unavailable is taken out of rotation without being restarted.

Deep health check components: DB query (SELECT 1), cache ping, queue connectivity, disk space check, memory check. Put a timeout on every check (1-2 s max).
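A framework-free PHP sketch of the liveness/readiness split, using only the standard library: `liveness()` touches no dependency, while `readiness()` runs a list of deep checks and maps any failure to a 503. The helper names and the disk/memory thresholds are hypothetical; real DB, cache, and queue checks would be added as further closures.

```php
<?php
declare(strict_types=1);

// Hypothetical helpers for illustration: liveness(), readiness(),
// diskSpaceCheck(), memoryCheck(). Each returns [httpStatus, body].

// Shallow liveness: never touches a dependency, so a DB outage cannot fail it.
function liveness(): array
{
    return [200, ['status' => 'alive']];
}

// Deep check component: fail if free disk space at $path is below $minBytes.
function diskSpaceCheck(string $path, float $minBytes): callable
{
    return function () use ($path, $minBytes): void {
        $free = disk_free_space($path); // float, or false on error
        if ($free === false || $free < $minBytes) {
            throw new RuntimeException("low disk space on $path");
        }
    };
}

// Deep check component: fail if this process has allocated more than $maxBytes.
function memoryCheck(int $maxBytes): callable
{
    return function () use ($maxBytes): void {
        if (memory_get_usage(true) > $maxBytes) {
            throw new RuntimeException('memory usage above threshold');
        }
    };
}

// Deep readiness: run every check and collect failures.
// Any failure yields 503 (out of rotation), never a restart.
function readiness(array $checks): array
{
    $failed = [];
    foreach ($checks as $name => $check) {
        try {
            $check();
        } catch (Throwable $e) {
            $failed[$name] = $e->getMessage();
        }
    }
    return $failed === []
        ? [200, ['status' => 'ready']]
        : [503, ['status' => 'not ready', 'failed' => $failed]];
}
```

A route handler would send `$code` as the HTTP status and `$body` as JSON, e.g. `[$code, $body] = readiness(['disk' => diskSpaceCheck('/', 100 * 1024 * 1024), 'memory' => memoryCheck(256 * 1024 * 1024)]);`. Collecting every failure, rather than stopping at the first, makes the 503 body more useful during an incident.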

Common Misconception

✗ A health check should verify that all dependencies are working. In practice, use deep checks for readiness and shallow checks for liveness: a deep liveness check causes cascading failures when a dependency is down.

Why It Matters

Correctly implemented health checks are the foundation of zero-downtime deployments and automatic failure recovery — wrong probes cause unnecessary restarts or leave broken instances in rotation.

Common Mistakes

Deep health check for liveness — DB down → all instances restart → thundering herd.
No readiness probe — instances receive traffic before warming up.
Health check without timeout — slow dependency causes health check to time out and restart healthy instance.
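To give a dependency check its own time budget in plain PHP, one option is to bound both the TCP connect and the read. This sketch assumes the cache is Redis reachable over TCP (Redis answers an inline `PING` with `+PONG`); `redisPing()` is a hypothetical helper, and the 0.5 s default is an assumed budget, not a recommended value.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: returns true only if a Redis server at $host:$port
// replies +PONG within the budget. Connect and read are each bounded by
// $timeoutSeconds, so the worst case is roughly twice that.
function redisPing(string $host, int $port, float $timeoutSeconds = 0.5): bool
{
    // The last fsockopen argument bounds the TCP connect time.
    $conn = @fsockopen($host, $port, $errno, $errstr, $timeoutSeconds);
    if ($conn === false) {
        return false; // refused, unreachable, or connect timed out
    }

    // Bound reads too: stream_set_timeout takes whole seconds plus microseconds.
    $sec = (int) $timeoutSeconds;
    stream_set_timeout($conn, $sec, (int) (($timeoutSeconds - $sec) * 1_000_000));

    fwrite($conn, "PING\r\n");          // Redis inline command
    $reply = fgets($conn);              // false if the read times out
    $meta = stream_get_meta_data($conn);
    fclose($conn);

    return !$meta['timed_out'] && $reply !== false && str_starts_with($reply, '+PONG');
}
```

With this in a readiness check, a hung or unreachable Redis makes the check fail within about a second, so the endpoint can answer 503 instead of overrunning the probe. The probe's `timeoutSeconds` (1 second by default in Kubernetes) should be set above the check's worst case.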

Code Examples

✗ Vulnerable

// Laravel routes file. Liveness checks the DB, so a DB outage cascades:
use Illuminate\Support\Facades\DB;
use Illuminate\Support\Facades\Route;

Route::get('/health/live', function () {
    DB::select('SELECT 1'); // DB down → liveness fails → instance restarts → more DB pressure
    return ['status' => 'ok'];
});

✓ Fixed

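// Laravel routes file. Assumes DB and cache client timeouts are set well below the probe timeout.
use Illuminate\Support\Facades\Cache;
use Illuminate\Support\Facades\DB;
use Illuminate\Support\Facades\Route;

// Shallow liveness: is the process running? Touches no dependencies.
Route::get('/health/live', fn () => ['status' => 'alive']);

// Deep readiness: can this instance serve traffic right now?
Route::get('/health/ready', function () {
    $checks = [
        'db' => fn () => DB::select('SELECT 1'),
        // Write, then read back, a short-lived key; catches stores that fail silently.
        'cache' => function () {
            Cache::put('health-check', 'ok', 10);
            if (Cache::get('health-check') !== 'ok') {
                throw new \RuntimeException('cache read-back failed');
            }
        },
    ];
    foreach ($checks as $name => $check) {
        try {
            $check();
        } catch (\Throwable $e) {
            // 503 takes the instance out of rotation without restarting it.
            return response(['status' => 'not ready', 'failed' => $name], 503);
        }
    }
    return ['status' => 'ready'];
});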
