Types of Code Duplication (Clone vs Semantic)
debt(d5/e5/b5/t7)
Closest to 'specialist tool catches it' (d5). The detection_hints list phpcpd, jscpd, and sonarqube — these are specialist static analysis tools that must be explicitly configured and run in CI. They catch syntactic/clone duplication well, but semantic duplication (same logic, different surface form) is largely invisible to automated tools and requires code review, making d5 appropriate rather than d3 (no default linter covers this) or d7 (tools do exist for the clone case).
Closest to 'touches multiple files / significant refactor in one component' (e5). The quick_fix says to extract on third occurrence using shared base classes, traits, or services. This is not a one-line patch; it involves identifying the pattern, creating an abstraction (class, trait, or service), updating all call sites, and verifying correctness. If duplication is spread across multiple files or modules — which is the common case — this is a multi-file refactor, grounding it at e5.
Closest to 'persistent productivity tax' (b5). Code duplication applies broadly across web, cli, and queue-worker contexts. Unaddressed duplication means every bug fix or feature change may need to be applied in multiple places, slowing many work streams. However, it doesn't necessarily define the system's shape (b7+), so b5 reflects a persistent but not architectural burden.
Closest to 'serious trap — contradicts how a similar concept works elsewhere' (t7). The misconception is explicitly stated: developers believe all duplication must be eliminated immediately, but premature abstraction creates coupling that is harder to undo than the original duplication. The common mistakes confirm this — extracting code that looks similar but has different reasons to change produces a wrong abstraction. This contradicts the widely taught DRY principle, making the 'obvious' action (extract immediately) frequently the wrong one, warranting t7.
TL;DR
Explanation
Clone duplication is identical or near-identical code; it is the clear case for extraction. Semantic duplication is two code sections that do similar things with different data or context; whether to merge them requires judgement.
Rule of Three: tolerate the second copy, refactor on the third occurrence.
Clone types:
- Type 1: exact clone
- Type 2: renamed variables
- Type 3: modified statements
- Type 4: same algorithm, different structure
Tools: phpcpd (PHP Copy/Paste Detector), jscpd, SonarQube.
The wrong abstraction is worse than duplication: don't prematurely abstract code that merely looks similar but has diverging requirements.
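The four clone types can be easier to tell apart side by side. The sketch below is illustrative only: every function name is hypothetical, and the Type 1 copy carries a different name purely so that all variants can live in one file.

```php
<?php
declare(strict_types=1);

// Original fragment.
function orderTotal(array $prices): float {
    $total = 0.0;
    foreach ($prices as $price) {
        $total += $price;
    }
    return $total;
}

// Type 1: exact copy of the body.
function invoiceTotal(array $prices): float {
    $total = 0.0;
    foreach ($prices as $price) {
        $total += $price;
    }
    return $total;
}

// Type 2: same structure, identifiers renamed.
function cartSum(array $amounts): float {
    $sum = 0.0;
    foreach ($amounts as $amount) {
        $sum += $amount;
    }
    return $sum;
}

// Type 3: copied, then a statement was added.
function refundTotal(array $prices): float {
    $total = 0.0;
    foreach ($prices as $price) {
        if ($price < 0) {
            continue; // added guard: skip negative entries
        }
        $total += $price;
    }
    return $total;
}

// Type 4: same computation, different structure.
function basketTotal(array $prices): float {
    return (float) array_sum($prices);
}
```

Token-based detectors typically report the first two or three variants; the Type 4 version shares almost no tokens with the original, which is why it tends to surface only in code review.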
Common Misconception
Why It Matters
Common Mistakes
- Extracting two pieces of code that look similar but have different reasons to change — they'll diverge and the shared abstraction becomes a problem.
- Not using phpcpd/jscpd in CI to detect growing duplication.
- Applying Rule of Three too rigidly — sometimes the second occurrence warrants extraction if the pattern is clearly stable.
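The first mistake above can be sketched as follows. The rules and function names are hypothetical, chosen only to show two checks that coincide today but answer to different stakeholders, and what a premature merge might look like once their requirements drift apart.

```php
<?php
declare(strict_types=1);

// Two checks that happen to look identical today.
function validateUsername(string $name): bool {
    // Assumed to be driven by account policy.
    return strlen($name) >= 2 && strlen($name) <= 50;
}

function validateProductTitle(string $title): bool {
    // Assumed to be driven by catalogue layout.
    return strlen($title) >= 2 && strlen($title) <= 50;
}

// Premature merge after the requirements diverged: the shared function
// accumulates flags, and every caller is now coupled to both policies.
function validateLabel(string $value, bool $isProduct = false): bool {
    $max = $isProduct ? 120 : 50; // catalogue asked for longer titles
    if (!$isProduct && strpos($value, ' ') !== false) {
        return false; // account policy later banned spaces
    }
    return strlen($value) >= 2 && strlen($value) <= $max;
}
```

Keeping the two small functions separate costs a few duplicated lines, while each can change independently without a flag parameter.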
Code Examples
// Same validation logic in 3 controllers (copy-paste):
if (strlen($name) < 2 || strlen($name) > 50) {
    throw new ValidationException('Invalid name');
}

// Extracted to a validator on the third occurrence:
class NameValidator {
    public function validate(string $name): void {
        // strlen() counts bytes; mb_strlen() may suit multibyte names better.
        if (strlen($name) < 2 || strlen($name) > 50) {
            throw new ValidationException('Invalid name');
        }
    }
}

// Each controller now delegates to the shared validator:
(new NameValidator())->validate($name);
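To try the extraction end to end, here is a self-contained sketch. The ValidationException class and the two call-site functions are assumptions added for illustration; the original snippet presumes an exception class already exists somewhere in the codebase.

```php
<?php
declare(strict_types=1);

// Hypothetical exception type; stands in for whatever the project uses.
class ValidationException extends InvalidArgumentException {}

class NameValidator {
    public const MIN_LENGTH = 2;
    public const MAX_LENGTH = 50;

    public function validate(string $name): void {
        $length = strlen($name); // byte length, as in the original check
        if ($length < self::MIN_LENGTH || $length > self::MAX_LENGTH) {
            throw new ValidationException('Invalid name');
        }
    }
}

// Hypothetical call sites standing in for the controllers.
function createUser(NameValidator $validator, string $name): string {
    $validator->validate($name);
    return "created $name";
}

function renameUser(NameValidator $validator, string $name): string {
    $validator->validate($name);
    return "renamed to $name";
}
```

With the bounds held in one class, a change to the length rule is made once and reaches every call site.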