
Blameless Post-Mortem

DevOps · PHP 5.0+ · Intermediate
debt(d9/e5/b3/t7)
d9 Detectability Operational debt — how invisible misuse is to your safety net

Closest to 'silent in production until users hit it' (d9): no automated tool detects missing or blame-laden postmortems, and the Detection Hints list this as not auto-detectable. It is a process gap that only surfaces when incidents repeat or institutional knowledge is lost.

e5 Effort Remediation debt — work required to fix once spotted

Closest to 'touches multiple files / significant refactor in one component' (e5): establishing a blameless postmortem culture requires templates, scheduled reviews, action-item tracking in tools such as Jira or Confluence, and a cultural shift. It is not a one-line fix, but the work stays within the ops/engineering process layer.

b3 Burden Structural debt — long-term weight of choosing wrong

Closest to 'localised tax' (b3): the practice imposes a recurring ceremony cost on the engineering/ops process but does not shape system architecture. It applies broadly to web and CLI projects, yet the burden lives in team workflow.

t7 Trap Cognitive debt — how counter-intuitive correct behaviour is

Closest to 'serious trap' (t7): as the Common Misconception section describes, the natural reading of 'postmortem' suggests finding who caused the incident, which directly contradicts the blameless intent. Engineers who revert to blame get self-censorship and surface-level fixes; the obvious approach is the wrong one.


Also Known As

post-mortem, incident review, blameless postmortem

TL;DR

A structured review of an incident focused on systemic causes and improvements, not individual blame — making it safe to surface failures.

Explanation

A blameless post-mortem (a practice central to Google's SRE approach) analyses an incident on the assumption that engineers made reasonable decisions given the information they had at the time. The review documents the timeline, root cause(s), contributing factors, impact, and action items to prevent recurrence. Blame creates a culture where failures are hidden; blamelessness enables honest reporting and systemic improvement. A good post-mortem identifies the failure in the system (missing monitoring, lack of testing, an unclear runbook) rather than the human who triggered it. Action items should have owners and deadlines, and post-mortems should be shared internally to spread learning.
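The sections a post-mortem documents map naturally onto a simple record, which makes it easy to reject drafts that leave a section empty. A minimal Python sketch, assuming a hypothetical `Postmortem` dataclass and `missing_sections()` helper that are not part of any particular tool:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Postmortem:
    """Hypothetical record mirroring the sections a blameless post-mortem documents."""
    title: str
    timeline: List[str] = field(default_factory=list)
    root_causes: List[str] = field(default_factory=list)
    contributing_factors: List[str] = field(default_factory=list)
    impact: str = ""
    action_items: List[str] = field(default_factory=list)


# Every section that must be non-empty before the post-mortem is considered complete.
REQUIRED_SECTIONS = ("timeline", "root_causes", "contributing_factors", "impact", "action_items")


def missing_sections(pm: Postmortem) -> List[str]:
    """Return the names of required sections that are still empty (empty list or empty string)."""
    return [name for name in REQUIRED_SECTIONS if not getattr(pm, name)]


if __name__ == "__main__":
    draft = Postmortem(
        title="Checkout 5xx after deploy",
        timeline=["14:32 Alert: 5xx rate > 5% on /api/orders"],
        root_causes=["Missing eager-load on order.items"],
        impact="Checkout unavailable for ~40% of users",
    )
    print(missing_sections(draft))  # ['contributing_factors', 'action_items']
```

In practice this rule would more likely be enforced through a document template or required ticket fields; the code only illustrates the completeness check.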

Common Misconception

The misconception: postmortems are about finding who made the mistake. In reality, blameless postmortems focus on systemic factors (what conditions made the mistake possible and likely), not individual fault. Blame discourages transparency and produces surface-level fixes; systemic analysis prevents recurrence.

Why It Matters

Blameless postmortems extract systemic lessons from incidents by focusing on what failed in the system rather than who made a mistake, so fixes address root causes, not scapegoats.

Common Mistakes

  • Blame-focused postmortems — engineers self-censor and hide information to avoid punishment.
  • Postmortem action items with no owner or deadline — they are never completed.
  • Only doing postmortems for severe incidents — smaller incidents often contain the same systemic lessons.
  • Not sharing postmortems — other teams repeat the same mistakes because they never learned from yours.
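Ownerless or undated action items are one of the few parts of this practice that can be checked mechanically once items live in a tracker. A minimal Python sketch, assuming a hypothetical `ActionItem` record; the owner handles `@orders-team` and `@platform-team` and all dates are invented for illustration:

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class ActionItem:
    """Hypothetical action-item record; in practice this would live in a tracker such as Jira."""
    description: str
    owner: Optional[str] = None
    due: Optional[date] = None
    done: bool = False


def follow_up_problems(items: List[ActionItem], today: date) -> List[str]:
    """List open action items that lack an owner or a due date, or are past their due date."""
    problems = []
    for item in items:
        if item.done:
            continue  # completed items need no follow-up
        if not item.owner:
            problems.append(f"no owner: {item.description}")
        if item.due is None:
            problems.append(f"no due date: {item.description}")
        elif item.due < today:
            problems.append(f"overdue since {item.due.isoformat()}: {item.description}")
    return problems


if __name__ == "__main__":
    # Hypothetical owners and dates, for illustration only.
    items = [
        ActionItem("Add query count assertion to integration test suite", "@orders-team", date(2024, 3, 29)),
        ActionItem("Configure DB CPU alert threshold"),
        ActionItem("Add Debugbar query count check to PR checklist", "@platform-team", date(2024, 3, 22), done=True),
    ]
    for line in follow_up_problems(items, today=date(2024, 4, 1)):
        print(line)
```

With `today=date(2024, 4, 1)` it reports the overdue test-suite item, then the missing owner and missing due date for the alert item; the completed checklist item is skipped.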

Code Examples

✗ Vulnerable
# Blame-oriented postmortem (anti-pattern):
Incident: DB outage — 45 min downtime
Root cause: John ran DROP TABLE in production
Action: John received formal warning
# Missing: why was DROP TABLE possible in production?
# Missing: why was there no recently tested backup?
# A blameless version asks about these system gaps, not about the individual
✓ Fixed
# Post-mortem template (blameless: focus on systems, not people)

## Incident Summary
- **Date/Duration:** 2024-03-15 14:32–16:10 UTC (98 minutes)
- **Impact:** Checkout unavailable for ~40% of users
- **Severity:** SEV1

## Timeline
- 14:32 Alert: 5xx rate > 5% on /api/orders
- 14:35 On-call acknowledged, began investigation
- 14:50 Identified: new deploy introduced N+1 query → DB CPU 100%
- 15:00 Rolled back deployment → metrics recovering
- 16:10 Fully resolved

## Root Cause
Missing eager-load on order.items, introduced in PR #1203.

## Contributing Factors
- No automated check on per-request query count, so the N+1 query passed CI
- No DB CPU alert; detection relied on the downstream 5xx rate
- PR review checklist did not cover query counts

## Action Items (each needs an owner and a due date)
- [ ] Add query count assertion to integration test suite (owner: <name>, due: <date>)
- [ ] Configure DB CPU alert threshold (owner: <name>, due: <date>)
- [ ] Add Debugbar query count check to PR checklist (owner: <name>, due: <date>)
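The timeline also yields the response durations the summary reports. A small Python sketch, assuming same-day UTC timestamps and treating the 15:00 rollback as the mitigation point; the metric names are illustrative, not something the template prescribes:

```python
from datetime import datetime
from typing import Dict


def minutes_between(start: str, end: str) -> int:
    """Whole minutes between two same-day HH:MM timestamps (assumes end is not before start)."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)


def timeline_metrics(timeline: Dict[str, str]) -> Dict[str, int]:
    """Derive response durations, all measured from the first alert."""
    start = timeline["alert"]
    return {
        "time_to_acknowledge": minutes_between(start, timeline["acknowledged"]),
        "time_to_mitigate": minutes_between(start, timeline["mitigated"]),
        "total_duration": minutes_between(start, timeline["resolved"]),
    }


if __name__ == "__main__":
    # Times taken from the example timeline (UTC, same day); "mitigated" is the rollback.
    events = {"alert": "14:32", "acknowledged": "14:35", "mitigated": "15:00", "resolved": "16:10"}
    print(timeline_metrics(events))
    # {'time_to_acknowledge': 3, 'time_to_mitigate': 28, 'total_duration': 98}
```

The 98-minute total matches the duration stated in the Incident Summary; incidents that cross midnight would need full date-times rather than HH:MM strings.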

Added 15 Mar 2026
Edited 22 Mar 2026
DEV INTEL Tools & Severity
🔵 Info ⚙ Fix effort: Medium
⚡ Quick Fix
Write a blameless postmortem within 48h of every incident: timeline, root cause, contributing factors, and action items with owners and dates. Focus on system failures, not human errors.
📦 Applies To
PHP 5.0+ · web · CLI
🔍 Detection Hints
Incidents without documented postmortems; repeated similar incidents suggesting action items not followed up
Auto-detectable: ✗ No · Tools: PagerDuty, Opsgenie, Jira, Confluence
🤖 AI Agent
Confidence: Low · False Positives: High · ✗ Manual fix · Fix: Medium · Context: File
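The detection hints can be approximated by a periodic script over incident records. A minimal Python sketch, assuming a hypothetical `Incident` record with a hand-assigned `category`, exported from a paging or ticketing tool; it does not call any PagerDuty, Opsgenie or Jira API. It flags incidents whose post-mortem is missing or late relative to the 48h window from the Quick Fix, and categories that recur, which may mean earlier action items were not followed up. Matching by category is a crude proxy and will produce false positives, consistent with the low-confidence rating above.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional

POSTMORTEM_DEADLINE = timedelta(hours=48)  # from the Quick Fix guidance


@dataclass
class Incident:
    """Hypothetical incident record, e.g. exported from a paging tool."""
    incident_id: str
    category: str  # assumed to be assigned by a human during triage
    resolved_at: datetime
    postmortem_at: Optional[datetime] = None  # when the post-mortem was published, if ever


def late_or_missing_postmortems(incidents: List[Incident], now: datetime) -> List[str]:
    """IDs of incidents whose post-mortem is overdue or was published after the deadline."""
    flagged = []
    for inc in incidents:
        deadline = inc.resolved_at + POSTMORTEM_DEADLINE
        if inc.postmortem_at is None:
            if now > deadline:  # still missing and the window has closed
                flagged.append(inc.incident_id)
        elif inc.postmortem_at > deadline:  # published, but late
            flagged.append(inc.incident_id)
    return flagged


def repeated_categories(incidents: List[Incident], threshold: int = 2) -> List[str]:
    """Categories seen at least `threshold` times, sorted alphabetically."""
    counts = Counter(inc.category for inc in incidents)
    return sorted(cat for cat, n in counts.items() if n >= threshold)


if __name__ == "__main__":
    # Hypothetical incidents, for illustration only.
    now = datetime(2024, 3, 20, 12, 0)
    incidents = [
        Incident("INC-1", "n+1-query", datetime(2024, 3, 15, 16, 10), datetime(2024, 3, 16, 10, 0)),
        Incident("INC-2", "n+1-query", datetime(2024, 3, 18, 9, 0)),
        Incident("INC-3", "disk-full", datetime(2024, 3, 19, 20, 0)),
    ]
    print(late_or_missing_postmortems(incidents, now))  # ['INC-2']
    print(repeated_categories(incidents))  # ['n+1-query']
```

INC-3 is not flagged because its 48h window is still open at `now`; a real check would also need to link recurring incidents to the specific action items that should have prevented them.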

