HTML Injection
debt(d5/e3/b3/t7)
Closest to 'specialist tool catches it' (d5). The detection_hints list semgrep, psalm, and twig-auto-escape — all specialist/SAST-class tools. The pattern (echo $userInput without htmlspecialchars) is detectable by these tools but not by a default linter or compiler error. Scores d5 exactly.
Closest to 'simple parameterised fix' (e3). The quick_fix is a single consistent call — htmlspecialchars($input, ENT_QUOTES | ENT_HTML5, 'UTF-8') — but common_mistakes show it must be applied across all output points and in context-appropriate ways (HTML vs JSON context), making it a small pattern-replacement refactor rather than a true one-liner. e3 fits: replace pattern with safer alternative across output sites.
Closest to 'localised tax' (b3). The term applies only to web contexts (applies_to contexts: web) and the fix is a consistent escaping discipline within template/output layers. It imposes a persistent but bounded hygiene requirement on view/output code, not a cross-cutting architectural concern. b3 is appropriate — one layer pays, rest of codebase is largely unaffected.
Closest to 'serious trap' (t7). The canonical misconception is explicitly that 'HTML injection without script execution is harmless' — developers coming from an XSS mental model believe only script execution matters, missing that injected forms and defacement cause real harm. This directly contradicts how many developers reason about injection risk and is worse than a simple edge case, placing it at t7.
Also Known As
TL;DR
Explanation
HTML injection differs from XSS in that the injected payload need not execute JavaScript. Instead, it injects HTML elements such as forms, links, or images to manipulate what the page shows. An attacker can overlay a fake login form on a legitimate page to harvest credentials, or inject a meta-refresh tag to redirect users to another site. The fix is the same as for XSS prevention: encode all user-supplied output for the context it lands in, which in HTML body and attribute contexts means htmlspecialchars() with ENT_QUOTES before rendering.
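A minimal sketch of the encoding step described above, using a hypothetical attacker payload that combines a fake login form with a meta-refresh redirect (the attacker.example URLs are placeholders). After htmlspecialchars(), the browser shows the markup as literal text instead of rendering a form or following a redirect.

```php
<?php
// Hypothetical attacker-controlled input: a fake login form plus a meta-refresh redirect.
$payload = '<form action="https://attacker.example/steal">'
    . '<input type="password" name="pass"><button>Log in</button></form>'
    . '<meta http-equiv="refresh" content="0;url=https://attacker.example/">';

// Encode for HTML context: <, >, &, " and ' become entities.
$escaped = htmlspecialchars($payload, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');

// Inside the page this is now displayed as text, not parsed as a form or meta tag.
echo '<div class="comment">' . $escaped . '</div>', PHP_EOL;
```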
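Common Misconception
HTML injection without script execution is harmless. In reality, injected forms, links, and defaced content can harvest credentials or mislead users even when no JavaScript runs.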
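Why It Matters
Injected markup appears on the legitimate site, so users have no reason to distrust it: a fake login form can harvest credentials, a meta-refresh tag can redirect users elsewhere, and defacement damages the site's credibility. HTML injection is also often a stepping stone to XSS.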
Common Mistakes
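- Using strip_tags() as the sole defence: it removes tags, but any tag kept via its allowed_tags argument keeps all of its attributes, including event handlers such as onload.
- Reflecting user input into HTML attribute values without proper escaping: an unescaped quote closes the attribute and lets the attacker add new attributes. Pass ENT_QUOTES so both quote types are encoded (before PHP 8.1 the default flags left single quotes unescaped), and always quote attribute values.
- Confusing HTML injection with XSS: HTML injection that doesn't execute scripts can still be very harmful.
- Not applying contextual escaping: use htmlspecialchars() in HTML body and attribute contexts, but json_encode() (with flags such as JSON_HEX_TAG) when embedding data in a script context.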
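The mistakes listed above can be reproduced in a few lines. This is a sketch with made-up payloads, assuming PHP 8: it shows (1) strip_tags() keeping an event handler on an allowed tag, (2) ENT_QUOTES encoding a single quote that would otherwise close a single-quoted attribute, and (3) json_encode() with the JSON_HEX_* flags for script context.

```php
<?php
// 1. strip_tags() with an allow-list keeps the allowed tag as-is, attributes included.
$bio = '<img src=x onerror="alert(1)"><script>alert(2)</script>';
$stripped = strip_tags($bio, '<img>');
echo $stripped, PHP_EOL; // onerror survives; <script> tags are stripped but their text stays

// 2. Attribute context: ENT_QUOTES encodes single quotes as well as double quotes.
$name = "x' autofocus onfocus='alert(1)";
$attr = htmlspecialchars($name, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
echo "<input value='$attr'>", PHP_EOL; // the quote can no longer end the value

// 3. Script context: hex-encode <, >, &, ' and " so "</script>" cannot close the block.
$data = ['bio' => '</script><h1>HACKED</h1>'];
$json = json_encode($data, JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT);
echo "<script>const user = $json;</script>", PHP_EOL;
```

For rich text that must keep some markup, a whitelist sanitizer such as HTMLPurifier is a safer choice than strip_tags() with an allow-list.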
Code Examples
// Vulnerable: user input echoed without escaping
$username = $_GET['username'] ?? '';
echo "<div>Hello $username</div>";
// username = <h1>HACKED</h1> → injects markup into the page structure

// Safe: always escape output for the HTML context
echo '<div>Hello ' . htmlspecialchars($username, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8') . '</div>';

// For rich text, use HTMLPurifier to allow only a safe subset of HTML
// (assumes the ezyang/htmlpurifier package is installed via Composer)
require 'vendor/autoload.php';
$purifier = new HTMLPurifier(HTMLPurifier_Config::createDefault());
$clean = $purifier->purify($_POST['bio'] ?? '');
echo $clean;

// HTML injection is a stepping stone to XSS:
// even <img src=x onerror=alert(1)> counts as HTML injection
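One way to keep the escaping discipline consistent across every output site is a single short helper that templates call everywhere. This is a sketch: the helper name escape_html() is hypothetical (it is not built into PHP), and the flags follow the ENT_QUOTES | ENT_HTML5 quick fix, with ENT_SUBSTITUTE added so invalid UTF-8 is replaced instead of producing an empty string.

```php
<?php
declare(strict_types=1);

// Hypothetical template helper: escape a value for HTML body or quoted-attribute context.
function escape_html(?string $value): string
{
    // ENT_QUOTES: encode both " and '; ENT_HTML5: ' becomes &apos;
    // ENT_SUBSTITUTE: replace invalid UTF-8 with U+FFFD instead of returning ''.
    return htmlspecialchars($value ?? '', ENT_QUOTES | ENT_SUBSTITUTE | ENT_HTML5, 'UTF-8');
}

// Hypothetical view data containing attacker-controlled strings.
$user = ['name' => '<h1>HACKED</h1>', 'site' => "x' onmouseover='alert(1)"];

// Every dynamic value goes through the helper, in both text and attribute positions.
$html = '<p>Hello ' . escape_html($user['name']) . '</p>'
      . "<a title='" . escape_html($user['site']) . "'>profile</a>";
echo $html, PHP_EOL;
```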