Disjoint Set / Union-Find
debt(d7/e3/b3/t5)
Closest to 'only careful code review or runtime testing' (d7). The term's detection_hints indicate automated detection is 'no', and the code pattern describes recognizing inefficient alternatives (DFS for cycle detection, repeated BFS for components) which requires manual review or performance profiling to identify. No automated tools are specified for detecting missing union-find opportunities.
Closest to 'simple parameterised fix' (e3). The quick_fix indicates swapping to union-find when detecting components/cycles. Implementing path compression and union by rank is a localized change within the algorithm, not a cross-cutting refactor. Common mistakes like missing path compression are fixable by adding a few lines to the find() method.
Closest to 'localised tax' (b3). Union-Find is a self-contained data structure used in specific algorithmic contexts (Kruskal's MST, cycle detection). It applies to web/cli contexts but doesn't impose system-wide architectural weight. The choice is confined to the algorithm implementation and doesn't propagate throughout the codebase.
Closest to 'notable trap' (t5). The misconception explicitly states developers assume O(n) per operation, not realizing path compression and union by rank yield amortised O(α(n)). Common mistakes include not implementing these optimizations and misusing for directed graphs. These are documented gotchas that developers eventually learn but aren't immediately obvious.
Also Known As
TL;DR
Explanation
Disjoint Set (Union-Find) maintains a forest of trees where each tree represents a set. find(x) returns the root of x's tree (the set representative). union(x, y) merges the two sets. Optimisations: path compression (flatten tree on find), union by rank (always attach smaller tree to larger). Combined: near-O(1) amortised — technically O(α(n)) where α is the inverse Ackermann function, practically constant. Applications: Kruskal's Minimum Spanning Tree, network connectivity (are A and B in the same network?), image segmentation, and cycle detection in undirected graphs.
Common Misconception
Why It Matters
Common Mistakes
- Not implementing path compression — find() is O(n) without it.
- Not using union by rank — trees become linked lists without it.
- Using for directed graphs — Union-Find tracks connected components in undirected graphs only.
- Forgetting that find() modifies the structure (path compression) — not purely read-only.
Code Examples
// No optimisations — O(n) find:
class NaiveUnionFind {
private array $parent;
public function find(int $x): int {
while ($this->parent[$x] !== $x) {
$x = $this->parent[$x]; // Walk up — O(n) in worst case
}
return $x;
}
}
// Path compression + union by rank — O(alpha n) amortised:
class UnionFind {
private array $parent;
private array $rank;
public function __construct(int $n) {
$this->parent = range(0, $n - 1);
$this->rank = array_fill(0, $n, 0);
}
public function find(int $x): int {
if ($this->parent[$x] !== $x)
$this->parent[$x] = $this->find($this->parent[$x]); // Path compression
return $this->parent[$x];
}
public function union(int $x, int $y): void {
$rx = $this->find($x); $ry = $this->find($y);
if ($rx === $ry) return;
if ($this->rank[$rx] < $this->rank[$ry]) [$rx, $ry] = [$ry, $rx];
$this->parent[$ry] = $rx;
if ($this->rank[$rx] === $this->rank[$ry]) $this->rank[$rx]++;
}
}