Why DEBT looks this way.
The design choices behind the standard. What was rejected, why the metaphor matters, and the philosophical shift that makes DEBT different from everything else.
The shift: from external threat to internal cost
The best-known published standards (CVSS, OWASP, CWE) all measure one thing: danger from outside. CVSS scores how badly an attacker can hurt you. OWASP ranks the categories of attack you should worry about. CWE classifies the kinds of weakness an attacker can find. They're all looking outward.
That works for security teams. It doesn't work for the developer who's writing the code.
This isn't a critique of CVSS. CVSS is excellent at what it does. DEBT asks a different question. CVSS asks: "how dangerous is this if exploited?" DEBT asks: "how much does this cost the person who has to live with it?"
The shift in subject matters. CVSS says "watch out for hackers." DEBT says "watch out for yourself and your code." Both are real. Neither replaces the other.
Why a profile, not a score
Most existing standards collapse complexity into a single number or label. CVSS gives you 7.4. OWASP gives you one category label, "A03:2021." A linter gives you a count of warnings.
Single numbers are seductive because they sort. You can rank, you can prioritise, you can put them in a dashboard. But sorting hides information. 7.4 doesn't tell you whether the problem is "easy to detect, expensive to fix" or "hard to detect, easy to fix." Those are operationally different problems with different remediation strategies, and a single severity score can't distinguish them.
DEBT refuses the single-number trap on purpose. debt(d7/e4/b2/t9) is information. debt = 22 is information loss. The shape matters more than the magnitude:
- High Detectability, low everything else: invest in tooling. The bug is process.
- High Effort, low Burden: tactical refactor when budget allows. Damage is contained.
- High Burden, anything else: stop. Don't deepen the commitment. Architectural decision to revisit before it ossifies.
- High Trap, low everything else: educate the team. The bug is conceptual.
You get four levers, not one knob. That's the point.
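The shape-versus-magnitude argument above can be sketched in a few lines. This is a minimal illustration, not part of the standard: the `Debt` type, the `parse` and `lever` helpers, and the `HIGH`/`LOW` cut-offs are all assumptions made for the example, and the spec may draw those lines elsewhere.

```python
import re
from typing import NamedTuple


class Debt(NamedTuple):
    d: int  # Detectability
    e: int  # Effort
    b: int  # Burden
    t: int  # Trap


# Matches the notation used in the text, e.g. "debt(d7/e4/b2/t9)".
PROFILE = re.compile(r"debt\(d([0-9])/e([0-9])/b([0-9])/t([0-9])\)")

HIGH = 7  # assumed cut-off for "high"; not taken from the spec
LOW = 3   # assumed cut-off for "low"; not taken from the spec


def parse(text: str) -> Debt:
    """Parse a profile string into its four single-digit scores."""
    match = PROFILE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a DEBT profile: {text!r}")
    return Debt(*(int(group) for group in match.groups()))


def lever(p: Debt) -> str:
    """Map a profile's shape to one of the four responses listed above."""
    if p.b >= HIGH:
        return "stop and revisit the architectural decision"
    if p.d >= HIGH and max(p.e, p.b, p.t) <= LOW:
        return "invest in tooling"
    if p.t >= HIGH and max(p.d, p.e, p.b) <= LOW:
        return "educate the team"
    if p.e >= HIGH and p.b <= LOW:
        return "tactical refactor when budget allows"
    return "no single lever dominates; read the full profile"


# Two made-up profiles with the same total but opposite shapes.
first = parse("debt(d9/e2/b2/t2)")
second = parse("debt(d2/e2/b2/t9)")

print(sum(first) == sum(second))  # the totals cannot tell them apart
print(lever(first))
print(lever(second))
```

The two example profiles add up to the same number, yet they call for different responses. A single aggregate would discard exactly that difference.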
Why these four axes
The four axes weren't chosen arbitrarily. The constraint was: each axis must measure something independent that no other published standard measures numerically.
Half of the candidates failed at least one of the two tests (new, independent):
| Axis candidate | New? | Independent? | Verdict |
|---|---|---|---|
| Detectability (D) | ✓ | ✓ | Kept |
| Effort (E) | ✓ | ✓ | Kept |
| Burden (B) | ✓ | ✓ | Kept |
| Trap (T) | ✓ | ✓ | Kept |
| Severity / Impact | ✗ | ✗ | CVSS owns this. Correlated with Effort. |
| Likelihood | ✗ | ✗ | OWASP Top 10 owns it. Correlated with Trap. |
| Exploitability | ✗ | ✓ | Covered by CVSS's exploitability metrics (attack vector and related). Use CVSS for security. |
| Reach / Blast radius | ✓ | ✗ | Correlated with Burden. Folded into Burden as the "reach" flavour. |
The four that survived all measure something genuinely new AND genuinely independent. Adding a fifth would mean either repeating something CVSS already does, or accepting a correlated axis that wastes a slot.
Why four, not five or six
The literal answer: more axes break the format. debt(d3/e2/b5/t7) reads at a glance. debt(d3/e2/b5/t7/x4/y6) doesn't. Six is the threshold where readers stop parsing and start glazing over. Four is the upper bound of "I can hold this in my head."
The deeper answer: every axis we rejected was either correlated with one we kept, or already measured by another standard. Adding correlated axes feels like more information but isn't. If you can predict axis Y from axes X and Z, then Y is wasted real estate.
Why score 0 exists
Score 0 is the escape valve. It means "this axis genuinely doesn't apply to this concept" — not "the answer is very low," not "I'm not sure," but "the question can't be asked of this kind of thing." It exists for a specific reason: DEBT is meant to be applied across an entire glossary or codebase, and not every entry in such a corpus is a behavioural concept. Some are pure syntax. Some are declarative facts. Some are reference material. Forcing those into the 1–9 severity scale would corrupt the data.
Take http_status_codes as an example. What's its Detectability? There's nothing to misuse and nothing for a compiler to catch. What's its Effort? There's no fix because there's no bug. What's its Burden? There's no architectural commitment to regret. Only Trap mildly applies (developers do confuse 401 with 403). The honest profile is debt(d0/e0/b0/t1): three axes are explicitly N/A, one axis is mildly trappy. Forcing 1s onto the three N/A axes would be lying about the data.
The counter-argument: "isn't 0 just another low score?" No — because 0 doesn't combine with the others the way 1–9 do. A query for "show me all concepts with low Burden" should return the genuinely-low b1 entries but exclude the b0 entries. The b1 concept has low Burden; the b0 concept has no Burden axis at all. Different things. The 0 escape valve preserves the distinction so downstream tooling stays honest.
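One way downstream tooling could keep that distinction is to store 0 as "not applicable" rather than as a small integer. The sketch below is only an assumption about how such tooling might look: `Entry`, `burden_from_score`, `low_burden`, the `ceiling` of 3 and the two `hypothetical_*` entries are invented for illustration. Only the b0 score for http_status_codes comes from the text.

```python
from typing import List, NamedTuple, Optional


class Entry(NamedTuple):
    name: str
    burden: Optional[int]  # None means "axis does not apply" (written b0)


def burden_from_score(raw: int) -> Optional[int]:
    """Turn a raw 0-9 score into None (N/A) or a 1-9 severity."""
    if raw == 0:
        return None
    if 1 <= raw <= 9:
        return raw
    raise ValueError(f"score out of range: {raw}")


def low_burden(entries: List[Entry], ceiling: int = 3) -> List[str]:
    """Names of entries whose Burden applies and is at most `ceiling`."""
    return [
        entry.name
        for entry in entries
        if entry.burden is not None and entry.burden <= ceiling
    ]


# Raw scores; the two hypothetical_* entries are made up for the example.
raw_scores = {
    "http_status_codes": 0,  # b0, as in the text
    "hypothetical_low": 1,
    "hypothetical_high": 8,
}

corpus = [Entry(name, burden_from_score(raw)) for name, raw in raw_scores.items()]

# Treating 0 as "just another low score" wrongly includes the N/A entry.
naive = [name for name, raw in raw_scores.items() if raw <= 3]

print(naive)               # ['http_status_codes', 'hypothetical_low']
print(low_burden(corpus))  # ['hypothetical_low']
```

Mapping 0 to `None` forces every comparison to decide explicitly what to do with N/A entries, which is the behaviour the paragraph above asks for.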
This is also why score 0 lives in a dedicated band on the spec page rather than as a row inside the rubric grid. The grid is the severity scale. The band is the meta-statement. Mixing them would imply 0 is on the same axis as 1–9, when in fact it's a different category.
Why "debt" as the metaphor
Three reasons:
1. Developers already understand it.
"Technical debt" has been in working-developer vocabulary for thirty years. Most engineers intuitively grasp that some choices buy time now and charge interest later. DEBT formalises a frame everyone already half-uses.
2. Each axis literally is a kind of debt.
The four axes weren't selected to fit the metaphor. They were selected for orthogonality, and the metaphor turned out to fit them:
- Detectability = operational debt. You owe your safety net.
- Effort = remediation debt. You owe your time.
- Burden = structural debt. You owe your future self and team.
- Trap = cognitive debt. You owe your mental model.
The acronym spelling "DEBT" is the result of choosing axis names that fit the four kinds. It's not a backronym — the metaphor and the letters arrived at the same place because the structure was already there.
3. Debt reframes accountability.
Most code-quality discourse positions developers as victims: type juggling tricked them, the framework betrayed them, the architecture decayed under them. Passive voice; things happening to the codebase.
Debt flips it. Debt is something you owe. Even when you didn't choose the loan, you're carrying it. That's not shame — it's accounting. A mortgage isn't a moral failing; it's a tool you understand the cost of. DEBT brings the same posture to code.
What was rejected
A weighted aggregation function
"Compute a single DEBT number by weighting the four axes." Rejected because the weights would be arbitrary, the resulting number would conceal the shape, and any team using DEBT would either argue over weights forever or quietly substitute its own, reproducing the CVSS-vector vs CVSS-base-score problem we were trying to avoid.
A fifth axis (Likelihood, Reach, Exploitability)
Each of these is already measured well elsewhere or correlates with an axis we kept. In cross-AI review, Gemini and ChatGPT both flagged Reach / Blast Radius as a candidate; on examination, Reach correlates with Burden so strongly that adding it would have wasted a slot. We folded "reach" into Burden as one of its two flavours instead.
A 1–10 or 1–100 scale
1–9 is a deliberate rejection of phantom precision. Scoring is subjective; pretending otherwise with decimals (CVSS 7.4 vs 7.6) makes the system feel more rigorous than it is. Single-digit scores are honest about the granularity.
A backronym ("Developer Experience & Bug Tracking")
Tempting because the letters spell DEBT. Rejected because the backronym described two adjacent fields (DX, bug tracking) that DEBT isn't actually in. Anyone looking for DX tooling or a Jira alternative would arrive at DEBT and bounce. The metaphor stands without the wrapper.
S/D/F/R (Surprise, Detectability, Fix, Regret)
The original draft. The axis names were fine, but the letters spelled nothing — users would have to look up which axis was which every time. Renaming Surprise → Trap, Fix → Effort, Regret → Burden produced D/E/B/T, which spells the standard's name. Same axes, identical semantics, much better mnemonic.
A category-based fixed catalogue (like OWASP Top 10)
OWASP-style top-N lists are useful for prioritisation but tell you nothing about an individual concept. DEBT scores the concept, not the population. The two approaches answer different questions.
How DEBT compares
| Standard | What it measures | Subject | Single number? |
|---|---|---|---|
| CVSS | Exploit severity for known vulnerabilities | The attacker | Yes (0–10) |
| OWASP Top 10 | Most common attack categories | The attack | No (categorical rank) |
| CWE | Taxonomy of weakness types | The weakness | No (taxonomy) |
| SonarQube rules | Detectable code smells | The codebase | Yes (per-rule) |
| DEBT | Cost a concept imposes on the developer | The developer | No (4-axis profile) |
DEBT doesn't compete with CVSS, OWASP, CWE, or SonarQube. They measure different things, for different audiences. Use them together. A high-CVSS, high-DEBT concept is one your security team and your engineering team should both prioritise. A high-CVSS, low-DEBT concept is a security problem with a known fix. A low-CVSS, high-DEBT concept is the kind of thing that quietly kills a project — and DEBT is the standard designed to make that visible. The other standards weren't built to.
Limits and honest tradeoffs
DEBT isn't a calculator. It's a structured judgment. Two reviewers may disagree on a score by ±1 on each axis — that's fine. Bigger gaps mean the rubric needs sharpening or the reviewers are anchoring on different examples.
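The ±1 tolerance lends itself to a simple check when two reviewers score the same concept. The following is a rough sketch under stated assumptions: `disputed_axes` is a hypothetical helper, both reviewers' scores are made up, and treating "one reviewer scored 0, the other did not" as a dispute is a choice made for this example rather than a rule from the spec.

```python
from typing import Dict, List

AXES = ("d", "e", "b", "t")


def disputed_axes(
    first: Dict[str, int], second: Dict[str, int], tolerance: int = 1
) -> List[str]:
    """Return the axes where two reviewers disagree by more than `tolerance`."""
    disputed = []
    for axis in AXES:
        x, y = first[axis], second[axis]
        if (x == 0) != (y == 0):
            # One reviewer says the axis does not apply, the other scored it.
            disputed.append(axis)
        elif abs(x - y) > tolerance:
            disputed.append(axis)
    return disputed


# Made-up scores from two reviewers for the same concept.
reviewer_a = {"d": 7, "e": 4, "b": 2, "t": 9}
reviewer_b = {"d": 6, "e": 4, "b": 5, "t": 9}

print(disputed_axes(reviewer_a, reviewer_b))  # ['b']
```

Here the one-point gap on Detectability passes, while the three-point gap on Burden is flagged as the place to sharpen the rubric or compare anchoring examples.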
The system is also opinionated about scope. It scores concepts, not codebases. A codebase has thousands of concepts in different combinations and at different scales; DEBT can't tell you the cost of your specific implementation. It tells you the cost profile of the underlying concept your code uses. That's still useful — if your codebase leans heavily on high-Burden concepts, you can predict where it'll hurt later.
And finally: DEBT is editorial. It reflects choices the authors made about which dimensions matter. Other groups will look at this and want different axes for their own contexts (game engines, embedded systems, formal verification). The license is permissive precisely so they can fork it. The standard's value is the frame, not the specific four; if your fork has different axes for different reasons, you're still doing what DEBT is doing.
See also: the formal spec · design & badges