Merkle Tree
debt(d8/e5/b5/t7)
Closest to 'silent in production' (d9), tuned down to d8. detection_hints.automated is 'no' and only a regex code_pattern exists (md5/sha1 concat). A subtly broken Merkle implementation (missing domain separation, broken hash) produces correct-looking roots and stays silent until an attacker forges a proof — but the grep-able use of MD5/SHA-1 gives a thin catchable signal, so slightly below pure d9.
Closest to 'multiple files / significant refactor in one component' (e5). The quick_fix (switch to SHA-256, domain-separate leaf vs internal nodes, define odd-leaf rule) is more than a one-line swap — it changes the hashing scheme and invalidates all previously computed roots, requiring re-derivation across stored tree data within the component.
Closest to 'persistent productivity tax' (b5). applies_to spans web/cli/library; once a Merkle scheme defines how integrity is verified, every consumer of proofs and stored roots depends on its exact hashing convention. Changing it later breaks compatibility, imposing an ongoing tax on systems that verify against it.
Closest to 'serious trap' (t7). The misconception that a Merkle tree encrypts or hides data directly contradicts its actual behaviour — it only fingerprints for integrity, leaving blocks plain. Combined with the second-preimage trap from missing leaf/internal domain separation, the 'obvious' assumptions are commonly and dangerously wrong.
Also Known As
TL;DR
Explanation
Trees with an odd number of leaves need a defined rule for the unpaired node - duplicating the last hash is common but can introduce a real vulnerability if done carelessly, as it did in early Bitcoin (CVE-2012-2459).
Common Misconception
Why It Matters
Common Mistakes
- Using a broken hash function like MD5 or SHA-1, allowing collisions that forge a matching root.
- Not domain-separating leaf and internal node hashing, opening a second-preimage attack.
- Mishandling odd numbers of leaves by blindly duplicating the last hash without a fixed, audited rule.
- Treating the Merkle root as confidential when it only protects integrity, not secrecy.
- Rebuilding the whole tree on every update instead of recomputing only the affected O(log n) path.
Avoid When
- Data is tiny or rarely compared, where a single full-dataset hash is simpler and sufficient.
- You need to hide data contents - Merkle trees provide integrity, not confidentiality.
- The dataset changes randomly across most blocks every update, since the O(log n) proof advantage shrinks toward full rebuilds.
When To Use
- Verifying membership or integrity of one block in a large dataset with an O(log n) proof.
- Reconciling differences between two replicas by comparing roots then descending into mismatched subtrees.
- Building tamper-evident logs, content-addressed storage, or blockchain transaction roots.
Code Examples
<?php
// Insecure Merkle root: no domain separation, weak hash,
// odd-leaf duplication bug.
function merkleRoot(array $blocks): string {
$level = array_map(fn($b) => md5($b), $blocks); // weak hash
while (count($level) > 1) {
$next = [];
for ($i = 0; $i < count($level); $i += 2) {
$left = $level[$i];
// Unpaired node silently reused as-is - tamper risk
$right = $level[$i + 1] ?? $left;
$next[] = md5($left . $right); // leaves and nodes hashed alike
}
$level = $next;
}
return $level[0] ?? '';
}
<?php
// Domain-separated leaf/internal hashing, strong hash,
// explicit odd-leaf handling.
function hashLeaf(string $data): string {
return hash('sha256', "\x00" . $data, true);
}
function hashNode(string $l, string $r): string {
return hash('sha256', "\x01" . $l . $r, true);
}
function merkleRoot(array $blocks): string {
if ($blocks === []) {
return hash('sha256', '', true); // empty-tree convention
}
$level = array_map('hashLeaf', $blocks);
while (count($level) > 1) {
if (count($level) % 2 === 1) {
$level[] = end($level); // documented duplication rule
}
$next = [];
for ($i = 0; $i < count($level); $i += 2) {
$next[] = hashNode($level[$i], $level[$i + 1]);
}
$level = $next;
}
return $level[0];
}