DORA Metrics
debt(d9/e5/b5/t5)
Closest to 'silent in production until users hit it' (d9). The detection_hints state automated=no, and the code_pattern describes absence of tracking — teams simply don't know they're missing DORA instrumentation until they try to answer questions about deployment health. Tools like Datadog, Sleuth, Faros, and GitHub Insights can surface this, but only if someone actively sets them up and looks. The absence is invisible by default.
Closest to 'touches multiple files / significant refactor in one component' (e5). The quick_fix describes tracking four distinct metrics across deployment pipelines, incident management, and version control. This isn't a one-line patch: it means wiring together CI/CD, monitoring, and incident response tooling. It touches multiple systems and workflows, though it stops short of an architectural rework.
Closest to 'persistent productivity tax' (b5). Once a team adopts DORA metrics, the ongoing burden is moderate: dashboards must be maintained, definitions kept consistent, and the temptation to game metrics (noted in common_mistakes) resisted. It applies broadly to web and CLI contexts and shapes how engineering processes are evaluated, but it doesn't reshape the structure of the codebase itself.
Closest to 'notable trap (a documented gotcha most devs eventually learn)' (t5). The misconception field flags that teams think DORA only applies to large orgs, and common_mistakes highlight optimising metrics as goals rather than signals — e.g. shipping broken code to boost deployment frequency. These are real, well-documented traps that teams commonly fall into, but they are learnable and not catastrophically counterintuitive.
Also Known As
The four key metrics; the "four keys" (Google's DORA team); Accelerate metrics, after the book Accelerate by Forsgren, Humble, and Kim.
TL;DR
Four research-backed metrics (deployment frequency, lead time for changes, change failure rate, and time to restore service) that together measure how fast and how safely a team ships software.
Explanation
The four DORA metrics:
- Deployment Frequency: how often you deploy to production (elite: multiple per day)
- Lead Time for Changes: time from commit to production (elite: under 1 hour)
- Change Failure Rate: percentage of deployments causing incidents (elite: under 5%)
- Time to Restore Service: how long it takes to recover from an incident (elite: under 1 hour)

Research shows these four metrics correlate strongly with organisational performance: teams that score well on them ship more value with fewer outages.
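To make the definitions concrete, here is a minimal sketch of computing all four metrics from deploy and incident records. The record shapes (commit_at, deployed_at, caused_incident, restored_at) are hypothetical; a real pipeline would pull them from CI/CD, version control, and incident tooling. Medians are used rather than means so a single outlier deploy doesn't distort the picture.

# Minimal sketch: the four DORA metrics from in-memory records.
# Field names are illustrative, not taken from any specific tool.
from dataclasses import dataclass
from datetime import datetime
from statistics import median

@dataclass
class Deploy:
    commit_at: datetime      # when the change was committed
    deployed_at: datetime    # when it reached production
    caused_incident: bool    # did this deploy trigger an incident?

@dataclass
class Incident:
    started_at: datetime
    restored_at: datetime

def dora_metrics(deploys: list[Deploy], incidents: list[Incident], window_days: int = 30) -> dict:
    return {
        # Deployment Frequency: deploys per day over the window
        "deploys_per_day": len(deploys) / window_days,
        # Lead Time for Changes: median commit-to-production time
        "lead_time": median(d.deployed_at - d.commit_at for d in deploys),
        # Change Failure Rate: share of deploys that caused an incident
        "change_failure_rate": sum(d.caused_incident for d in deploys) / len(deploys),
        # Time to Restore Service: median incident duration
        "time_to_restore": median(i.restored_at - i.started_at for i in incidents),
    }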
Diagram
flowchart TD
subgraph Elite Performance
DF[Deployment Frequency<br/>multiple per day]
LT[Lead Time<br/>under 1 hour]
CFR[Change Failure Rate<br/>under 5 percent]
MTTR[MTTR<br/>under 1 hour]
end
subgraph Low Performance
DF2[Deployment Frequency<br/>monthly or less]
LT2[Lead Time<br/>over 6 months]
CFR2[Change Failure Rate<br/>over 46 percent]
MTTR2[MTTR<br/>over 1 week]
end
DF & LT & CFR & MTTR --> OUTCOME[High Org Performance<br/>more features, fewer incidents]
style DF fill:#238636,color:#fff
style LT fill:#238636,color:#fff
style CFR fill:#238636,color:#fff
style MTTR fill:#238636,color:#fff
style DF2 fill:#f85149,color:#fff
style LT2 fill:#f85149,color:#fff
style CFR2 fill:#f85149,color:#fff
style MTTR2 fill:#f85149,color:#fff
Common Misconception
That DORA metrics only apply to large organisations. The four metrics measure delivery health at any team size, and small teams benefit from the same feedback loops.
Why It Matters
DORA's research links these four metrics to organisational performance: teams that score well on them ship more value with fewer outages. Without instrumentation, questions about deployment health get answered with anecdotes instead of data.
Common Mistakes
- Optimising DORA metrics as a goal rather than as a signal — shipping broken code more often improves deployment frequency but harms change failure rate.
- Not measuring change failure rate: teams focus on deployment frequency without tracking whether fast deployments are safe (the sketch after this list pairs the two).
- Confusing lead time with deployment frequency — lead time measures speed from commit to production; frequency measures how often.
- Using DORA metrics for individual performance evaluation — they measure team and system performance, not individual output.
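The first two mistakes share a fix: never report deployment frequency without change failure rate beside it. A minimal sketch, assuming a simple per-deploy success flag (the record shape is hypothetical):

# Report frequency and failure rate together so gaming one shows up in
# the other. deploy_ok holds one bool per deploy in the window; adapt
# the shape to your own deploy log.
def weekly_report(deploy_ok: list[bool], window_days: int = 7) -> str:
    freq = len(deploy_ok) / window_days
    cfr = deploy_ok.count(False) / len(deploy_ok)
    flag = "  <-- above the elite 5% threshold" if cfr > 0.05 else ""
    return f"deploys/day: {freq:.1f}, change failure rate: {cfr:.0%}{flag}"

print(weekly_report([True, True, False, True, True]))
# deploys/day: 0.7, change failure rate: 20%  <-- above the elite 5% threshold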
Code Examples
# Poor DORA metrics — typical 'enterprise' pattern:
# Deployment frequency: every 6 weeks (big, infrequent releases)
# Lead time: 4-8 weeks (approvals, change windows, manual testing)
# Change failure rate: 25% (large releases = large blast radius)
# MTTR: 4 hours (manual rollback procedures, no runbooks)
# Correlation: organisational performance typically poor
# Elite DORA metrics — CI/CD mature team:
# Deployment frequency: multiple times per day
# Lead time: < 1 hour (automated tests, no manual gates)
# Change failure rate: < 5% (small PRs, feature flags, canary)
# MTTR: < 1 hour (automated alerts, runbooks, fast rollback)
# How: small PRs + automated testing + feature flags + observability
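The blocks above describe outcomes, not mechanics. As a starting point, deployment frequency can be approximated straight from version control. A minimal sketch, assuming production deploys are marked with git tags named deploy-* (a hypothetical convention; adapt the pattern to your release process):

# Approximate deployment frequency from git history, assuming each
# production deploy is tagged deploy-* (hypothetical convention).
import subprocess
from datetime import datetime, timezone

def git(*args: str) -> str:
    return subprocess.run(["git", *args], capture_output=True, text=True, check=True).stdout

def deploy_times() -> list[datetime]:
    # %(creatordate:unix) prints each tag's creation time as a unix epoch
    out = git("for-each-ref", "--format=%(creatordate:unix)", "refs/tags/deploy-*")
    return sorted(datetime.fromtimestamp(int(line), tz=timezone.utc)
                  for line in out.splitlines() if line.strip())

times = deploy_times()
if len(times) >= 2:
    window_days = (times[-1] - times[0]).total_seconds() / 86400
    print(f"deploys/day over the window: {len(times) / window_days:.2f}")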