Tail Latency (p95, p99)
debt(d7/e3/b5/t9)
Closest to 'only careful code review or runtime testing' (d7). The detection_hints list tools like Datadog, Prometheus, Grafana, and OpenTelemetry, but these only catch the problem if you've already instrumented percentile metrics — the code_pattern notes 'Only average response time tracked; no percentile metrics' as the failure mode. If your monitoring is set up with averages only, tail latency issues are silent until users complain or you deliberately look at histograms. The gap is invisible in code review; it requires deliberate operational setup to surface.
Closest to 'simple parameterised fix' (e3). The quick_fix is essentially a monitoring/alerting configuration change: add p95/p99 metrics alongside averages, set SLO at p99, alert at p95. This is not a one-line code patch (e1) because it involves updating dashboards, alert rules, and possibly instrumentation across services, but it doesn't require touching multiple application files or significant refactoring — it's a bounded observability configuration task.
Closest to 'persistent productivity tax' (b5). The applies_to covers web and API contexts broadly. Once a system is designed around average-latency SLOs and monitoring, migrating to percentile-based SLOs affects dashboards, alerting thresholds, on-call runbooks, and SLA contracts. It doesn't reshape every line of code (b7), but it imposes an ongoing tax on monitoring, capacity planning, and incident response across multiple work streams.
Closest to 'catastrophic trap' (t9). The misconception is explicit and severe: 'Average response time is the key metric for measuring application performance.' This is the canonical, widely-held wrong belief. The common_mistakes confirm that monitoring only averages is the most frequent error. The trap is catastrophic because averages can look healthy while 1% of users experience 5-second waits, and at microservice scale (why_it_matters: 10 services × 1% = ~10% of users affected) the 'obvious' metric (average) is always the wrong one to rely on.
Also Known As
TL;DR
Explanation
Average latency hides the experience of slow requests. Percentile metrics reveal tail behaviour: p50 is the median, p95 is the value that 95% of requests beat, and p99 and p99.9 describe the slowest 1% and 0.1%. At scale, the tail is user-impacting: if p99 is 2 seconds and you serve 1,000 requests per second, about 10 requests every second take 2 seconds or longer. Common causes of tail latency include garbage collection pauses, lock contention, slow database queries, a cold OPcache, and resource starvation. Monitor percentiles in production (Prometheus, Datadog, New Relic), set SLOs against p99, and alert when percentiles breach their thresholds rather than when averages rise.
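The arithmetic behind these claims can be sketched in a few lines of PHP. The function names are illustrative rather than taken from any library, and the fan-out figure assumes each service's slow responses are independent of the others, which real systems only approximate.

```php
<?php
declare(strict_types=1);

/**
 * Requests per second that are at least as slow as the chosen percentile.
 * For p99 the tail fraction is 0.01.
 */
function slowRequestsPerSecond(float $requestsPerSecond, float $tailFraction): float
{
    return $requestsPerSecond * $tailFraction;
}

/**
 * Probability that a user request touching $serviceCount services lands in
 * the tail of at least one of them (assumes the services are independent).
 */
function fanOutTailProbability(int $serviceCount, float $tailFraction): float
{
    return 1.0 - (1.0 - $tailFraction) ** $serviceCount;
}

printf("%.0f slow requests/s\n", slowRequestsPerSecond(1000.0, 0.01));
// prints "10 slow requests/s"
printf("%.1f%% of fan-out requests hit a tail\n", 100 * fanOutTailProbability(10, 0.01));
// prints "9.6% of fan-out requests hit a tail"
```

The second figure shows why a 1% tail per service becomes roughly a 10% problem once a single page load depends on ten services.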
Common Misconception
Why It Matters
Common Mistakes
- Monitoring only average latency: a healthy-looking average can coexist with a very slow tail.
- Not setting per-request timeouts: one slow downstream call raises p99 for every request waiting on it.
- Unbounded retries that amplify tail latency: each retry adds another full wait, so three attempts that all hit the slow path take roughly three times as long.
- Not hedging requests: sending a duplicate of an idempotent request to a second server after a threshold delay can remove much of the tail, at the cost of some extra load.
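One way to avoid the timeout and retry mistakes above is to give every call an attempt limit and an overall time budget. The following is a minimal sketch, not a library API: `callWithDeadline`, its parameters, and the convention that the operation receives its remaining budget in seconds are all assumptions made for illustration.

```php
<?php
declare(strict_types=1);

/**
 * Calls $operation until it succeeds, attempts run out, or the overall
 * deadline passes. $operation receives the remaining budget in seconds and
 * is expected to use it as its own per-request timeout.
 */
function callWithDeadline(
    callable $operation,
    int $maxAttempts = 3,
    float $deadlineSeconds = 1.0,
    int $baseBackoffMs = 20
): mixed {
    $start = hrtime(true); // monotonic clock, in nanoseconds
    $remaining = fn(): float => $deadlineSeconds - (hrtime(true) - $start) / 1e9;
    $lastError = null;

    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        $budget = $remaining();
        if ($budget <= 0.0) {
            break; // budget spent: fail fast instead of stretching the tail
        }
        try {
            return $operation($budget);
        } catch (Throwable $e) {
            $lastError = $e;
        }
        if ($attempt < $maxAttempts) {
            // Full jitter: sleep a random time up to an exponentially growing
            // cap, but never past the overall deadline.
            $capMs = $baseBackoffMs * (2 ** ($attempt - 1));
            $sleepMs = max(0, min(random_int(0, $capMs), (int) ($remaining() * 1000)));
            usleep($sleepMs * 1000);
        }
    }

    throw new RuntimeException('Deadline or attempt budget exhausted', 0, $lastError);
}
```

With a bounded attempt count and a shared deadline, the worst case stays close to the deadline itself instead of growing with every retry.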
Code Examples
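// Monitoring the average hides p99 problems:
$avg = array_sum($latencies) / count($latencies); // a 50ms average looks fine
// But p99 might be 2000ms: 1% of requests take 2s or longer
// Measure percentiles (nearest-rank method; $latencies must be non-empty):
sort($latencies);
$rank = (int) ceil(count($latencies) * 0.99); // 1-based rank of the p99 sample
$p99 = $latencies[$rank - 1]; // with 100 samples this is the 99th value, not the maximum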
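// Hedged requests: if p99 latency is high, send a second request
// after a short delay and use whichever responds first.
// asyncFetch(), delay() and race() stand in for your async library's
// promise helpers; they are not PHP built-ins. Hedge only idempotent requests.
function hedgedFetch(string $url, int $hedgeAfterMs = 100): mixed {
    $primary = asyncFetch($url);
    $secondary = delay($hedgeAfterMs)->then(fn() => asyncFetch($url));
    return race([$primary, $secondary]); // first to resolve wins; cancel the loser if your library allows it
}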
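// Record every observation so percentiles can be derived; averages hide tail latency
// p50=10ms, p99=800ms means 1% of requests take at least 80x longer than the median
// Use Prometheus histograms or StatsD timers to expose percentiles
// ($histogram is assumed to be a histogram metric from your metrics client)
$histogram->observe($responseTimeMs); // not just the average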
// Common tail latency causes: GC pauses, lock contention, network jitter
// Mitigation: timeouts + retries with jitter, connection pooling, async I/O
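To show what a histogram's `observe()` call might do internally, here is a minimal fixed-bucket sketch. `LatencyHistogram` is a hypothetical class written for this example, not the API of any Prometheus or StatsD client, and it reports only the upper bound of the bucket holding the requested percentile. Real clients and query engines differ in detail (Prometheus, for instance, interpolates within buckets at query time).

```php
<?php
declare(strict_types=1);

/** Counts observations into ascending latency buckets plus one overflow slot. */
final class LatencyHistogram
{
    /** @var array<int, int> observation count per bucket; last slot is overflow */
    private array $counts;
    private int $total = 0;

    /** @param list<float> $upperBoundsMs ascending bucket upper bounds in milliseconds */
    public function __construct(private array $upperBoundsMs)
    {
        $this->counts = array_fill(0, count($upperBoundsMs) + 1, 0);
    }

    public function observe(float $latencyMs): void
    {
        $index = count($this->upperBoundsMs); // overflow bucket unless a bound fits
        foreach ($this->upperBoundsMs as $i => $bound) {
            if ($latencyMs <= $bound) {
                $index = $i;
                break;
            }
        }
        $this->counts[$index]++;
        $this->total++;
    }

    /** Upper bound (ms) of the bucket containing the given percentile (1..100). */
    public function percentileUpperBound(int $percentile): float
    {
        if ($this->total === 0) {
            throw new LogicException('No observations recorded');
        }
        $rank = intdiv($this->total * $percentile + 99, 100); // ceil(total * p / 100)
        $seen = 0;
        foreach ($this->upperBoundsMs as $i => $bound) {
            $seen += $this->counts[$i];
            if ($seen >= $rank) {
                return $bound;
            }
        }
        return INF; // the percentile falls in the overflow bucket
    }
}

// 98 fast requests and 2 slow ones: the average is about 24ms, yet p99 is in the 1000ms bucket.
$histogram = new LatencyHistogram([10.0, 50.0, 100.0, 500.0, 1000.0]);
foreach (array_merge(array_fill(0, 98, 8.0), [800.0, 800.0]) as $ms) {
    $histogram->observe($ms);
}
printf(
    "p50 <= %.0fms, p99 <= %.0fms\n",
    $histogram->percentileUpperBound(50),
    $histogram->percentileUpperBound(99)
);
// prints "p50 <= 10ms, p99 <= 1000ms"
```

Bucket boundaries limit the resolution of the answer, so place them around the latencies your SLO cares about.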