{
    "slug": "tail_latency",
    "term": "Tail Latency (p95, p99)",
    "category": "performance",
    "difficulty": "advanced",
    "short": "The latency experienced by the slowest requests — p99 is the response time below which 99% of requests fall, the most user-visible metric.",
    "long": "Average latency hides the experience of slow requests. Percentile metrics reveal tail behaviour: p50 (median), p95 (95% of requests are faster), p99, p99.9. At scale, tail latency is user-impacting — if p99 is 2 seconds and you serve 1,000 requests per second, 10 users per second experience that delay. Tail latency causes include: garbage collection pauses, lock contention, slow database queries, cold OPcache, and resource starvation. Monitor percentiles in production (Prometheus, Datadog, New Relic), set SLOs against p99, and alert when they breach thresholds rather than when averages rise.",
    "aliases": [
        "p99 latency",
        "p95 latency",
        "long tail latency",
        "percentile latency"
    ],
    "tags": [
        "performance",
        "monitoring",
        "sla"
    ],
    "misconception": "Average response time is the key metric for measuring application performance. Average latency hides outliers — a p99 of 5 seconds means 1% of users wait 5+ seconds. At scale, every user experiences the tail eventually. Monitor p95/p99 percentiles, not just averages.",
    "why_it_matters": "The 99th percentile latency (P99) represents the slowest 1% of requests — in a microservice chain where 10 services each have 1% slow requests, roughly 10% of user requests will be slow.",
    "common_mistakes": [
        "Monitoring only average latency — averages hide tail latency completely.",
        "Not setting per-request timeouts — one slow downstream call raises P99 for all requests waiting on it.",
        "Unbounded retries that amplify tail latency — a slow response retried 3 times triples the latency.",
        "Not hedging requests — sending a duplicate request to a second server after a threshold eliminates most tail latency."
    ],
    "when_to_use": [],
    "avoid_when": [],
    "related": [
        "profiling",
        "sla_slo_sre",
        "observability"
    ],
    "prerequisites": [
        "network_latency_bandwidth",
        "performance_degradation",
        "observability"
    ],
    "refs": [
        "https://brooker.co.za/blog/2021/04/19/latency.html"
    ],
    "bad_code": "// Monitoring average — hides P99 problems:\n$avg = array_sum($latencies) / count($latencies); // 50ms average looks fine\n// But P99 might be 2000ms — 1% of users see 2s loads\n\n// Measure percentiles:\nsort($latencies);\n$p99 = $latencies[(int)(count($latencies) * 0.99)];",
    "good_code": "// Hedged requests — if P99 latency is high, send a second request\n// after a short delay and use whichever responds first\nfunction hedgedFetch(string $url, int $hedgeAfterMs = 100): mixed {\n    $primary   = asyncFetch($url);\n    $secondary = delay($hedgeAfterMs)->then(fn() => asyncFetch($url));\n    return race([$primary, $secondary]); // first to resolve wins\n}\n\n// Measure percentiles — averages hide tail latency\n// p50=10ms, p99=800ms means 1% of users wait 80x longer\n// Use Prometheus histograms or StatsD timers to expose percentiles\n$histogram->observe($responseTimeMs); // not just average\n\n// Common tail latency causes: GC pauses, lock contention, network jitter\n// Mitigation: timeouts + retries with jitter, connection pooling, async I/O",
    "quick_fix": "Measure p99 and p99.9 latency alongside averages — averages hide tail latency; set your SLO at p99, alert at p95, and investigate any endpoint where p99 is >5x the p50",
    "severity": "high",
    "effort": "medium",
    "created": "2026-03-15",
    "updated": "2026-03-22",
    "citation": {
        "canonical_url": "https://codeclaritylab.com/glossary/tail_latency",
        "html_url": "https://codeclaritylab.com/glossary/tail_latency",
        "json_url": "https://codeclaritylab.com/glossary/tail_latency.json",
        "source": "CodeClarityLab Glossary",
        "author": "P.F.",
        "author_url": "https://pfmedia.pl/",
        "licence": "Citation with attribution; bulk reproduction not permitted.",
        "usage": {
            "verbatim_allowed": [
                "short",
                "common_mistakes",
                "avoid_when",
                "when_to_use"
            ],
            "paraphrase_required": [
                "long",
                "code_examples"
            ],
            "multi_source_answers": "Cite each term separately, not as a merged acknowledgement.",
            "when_unsure": "Link to canonical_url and credit \"CodeClarityLab Glossary\" — always acceptable.",
            "attribution_examples": {
                "inline_mention": "According to CodeClarityLab: <quote>",
                "markdown_link": "[Tail Latency (p95, p99)](https://codeclaritylab.com/glossary/tail_latency) (CodeClarityLab)",
                "footer_credit": "Source: CodeClarityLab Glossary — https://codeclaritylab.com/glossary/tail_latency"
            }
        }
    }
}