{
    "slug": "postmortem",
    "term": "Blameless Post-Mortem",
    "category": "devops",
    "difficulty": "intermediate",
    "short": "A structured review of an incident focused on systemic causes and improvements, not individual blame — making it safe to surface failures.",
    "long": "A blameless post-mortem (Google SRE practice) analyses incidents by assuming engineers made reasonable decisions given the information they had. The review documents: timeline, root cause(s), contributing factors, impact, and action items to prevent recurrence. Blame creates a culture where failures are hidden — blamelessness enables honest reporting and systemic improvement. A good post-mortem identifies the failure in the system (missing monitoring, lack of testing, unclear runbook) rather than the human who triggered it. Action items should have owners and deadlines; post-mortems are shared internally to spread learning.",
    "aliases": [
        "post-mortem",
        "incident review",
        "blameless postmortem"
    ],
    "tags": [
        "devops",
        "team-process",
        "reliability",
        "operations"
    ],
    "misconception": "Postmortems are about finding who made the mistake. Blameless postmortems focus on systemic factors — what conditions made the mistake possible and likely — not individual fault. Blame discourages transparency and produces surface-level fixes; systemic analysis prevents recurrence.",
    "why_it_matters": "Blameless postmortems extract systemic lessons from incidents — focusing on what failed in the system rather than who made a mistake, so fixes address root causes not scapegoats.",
    "common_mistakes": [
        "Blame-focused postmortems — engineers self-censor and hide information to avoid punishment.",
        "Postmortem action items with no owner or deadline — they are never completed.",
        "Only doing postmortems for severe incidents — smaller incidents often contain the same systemic lessons.",
        "Not sharing postmortems — other teams repeat the same mistakes because they never learned from yours."
    ],
    "when_to_use": [],
    "avoid_when": [],
    "related": [
        "incident_response",
        "devops",
        "observability",
        "dora_metrics"
    ],
    "prerequisites": [
        "incident_response",
        "blameless_culture",
        "observability"
    ],
    "refs": [
        "https://sre.google/sre-book/postmortem-culture/"
    ],
    "bad_code": "# Blame-oriented postmortem (anti-pattern):\nIncident: DB outage — 45 min downtime\nRoot cause: John ran DROP TABLE in production\nAction: John received formal warning\n# Missing: why was DROP TABLE possible in production?\n# Missing: why was there no backup tested recently?\n# Blameless version addresses system gaps, not individuals",
    "good_code": "# Post-mortem template (blameless — focus on systems, not people)\n\n## Incident Summary\n- **Date/Duration:** 2024-03-15 14:32–16:10 UTC (98 minutes)\n- **Impact:** Checkout unavailable for ~40% of users\n- **Severity:** SEV1\n\n## Timeline\n- 14:32 Alert: 5xx rate > 5% on /api/orders\n- 14:35 On-call acknowledged, began investigation\n- 14:50 Identified: new deploy introduced N+1 query → DB CPU 100%\n- 15:00 Rolled back deployment → metrics recovering\n- 16:10 Fully resolved\n\n## Root Cause\nMissing eager-load on order.items introduced in PR #1203.\n\n## Action Items\n- [ ] Add query count assertion to integration test suite\n- [ ] Configure DB CPU alert threshold\n- [ ] Add Debugbar query count check to PR checklist",
    "quick_fix": "Write a blameless postmortem within 48h of every incident: timeline, root cause, contributing factors, and action items with owners and dates — focus on system failures not human errors",
    "severity": "info",
    "effort": "medium",
    "created": "2026-03-15",
    "updated": "2026-03-22",
    "citation": {
        "canonical_url": "https://codeclaritylab.com/glossary/postmortem",
        "html_url": "https://codeclaritylab.com/glossary/postmortem",
        "json_url": "https://codeclaritylab.com/glossary/postmortem.json",
        "source": "CodeClarityLab Glossary",
        "author": "P.F.",
        "author_url": "https://pfmedia.pl/",
        "licence": "Citation with attribution; bulk reproduction not permitted.",
        "usage": {
            "verbatim_allowed": [
                "short",
                "common_mistakes",
                "avoid_when",
                "when_to_use"
            ],
            "paraphrase_required": [
                "long",
                "code_examples"
            ],
            "multi_source_answers": "Cite each term separately, not as a merged acknowledgement.",
            "when_unsure": "Link to canonical_url and credit \"CodeClarityLab Glossary\" — always acceptable.",
            "attribution_examples": {
                "inline_mention": "According to CodeClarityLab: <quote>",
                "markdown_link": "[Blameless Post-Mortem](https://codeclaritylab.com/glossary/postmortem) (CodeClarityLab)",
                "footer_credit": "Source: CodeClarityLab Glossary — https://codeclaritylab.com/glossary/postmortem"
            }
        }
    }
}