htmlspecialchars()
debt(d5/e1/b3/t5)
Closest to 'specialist tool catches it' (d5). The detection_hints list semgrep, psalm, and phpstan — all specialist SAST/static-analysis tools. The code_pattern explicitly identifies missing ENT_QUOTES and missing charset as detectable patterns, but these are not caught by a default linter or compiler; they require dedicated static analysis configuration.
Closest to 'one-line patch or single-call swap' (e1). The quick_fix is literally a single-call replacement: swap htmlspecialchars($var) with htmlspecialchars($var, ENT_QUOTES | ENT_HTML5, 'UTF-8'). Each misuse site is an independent one-line fix.
Closest to 'localised tax' (b3). The concept applies to web PHP contexts only (applies_to: web) and each call site is independent. While it must be applied consistently across all output points, it does not impose cross-cutting architectural weight — it is a per-output-call discipline rather than a structural commitment that shapes the codebase.
Closest to 'notable trap — a documented gotcha most devs eventually learn' (t5). The misconception field directly states the trap: calling htmlspecialchars() without ENT_QUOTES feels complete but leaves single-quoted attributes exploitable. This is a documented, well-known gotcha that many PHP developers encounter and learn the hard way, but it does not fully contradict behaviour from another ecosystem — it is an under-specification trap rather than a contradiction.
Also Known As
TL;DR
Explanation
htmlspecialchars($string, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8') converts <, >, &, ", and ' to their HTML entity equivalents, so injected text is rendered as literal characters instead of being interpreted as HTML or JavaScript. ENT_QUOTES encodes both single and double quotes. ENT_SUBSTITUTE (available since PHP 5.4, and part of the default flags since PHP 8.1) replaces invalid UTF-8 sequences with the Unicode replacement character U+FFFD instead of returning an empty string. Always specify the charset explicitly. This function is for HTML body and quoted attribute contexts only; JavaScript, CSS, and URL contexts each need different escaping.
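As a minimal sketch of the behaviour described above, the snippet below escapes a sample string and then shows what ENT_SUBSTITUTE changes for invalid UTF-8. The input strings and variable names are illustrative assumptions, not taken from any particular application.

```php
<?php
// Hypothetical input containing all five special characters.
$input = "<a href='x' onclick=\"alert(1)\">Tom & Jerry</a>";

$escaped = htmlspecialchars($input, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
echo $escaped, PHP_EOL;
// &lt;a href=&#039;x&#039; onclick=&quot;alert(1)&quot;&gt;Tom &amp; Jerry&lt;/a&gt;

// The byte "\xFF" is never valid in UTF-8.
$broken = "a\xFFb";

// Without ENT_SUBSTITUTE the whole result is an empty string.
$withoutSubstitute = htmlspecialchars($broken, ENT_QUOTES, 'UTF-8');

// With ENT_SUBSTITUTE the invalid byte becomes U+FFFD and the rest survives.
$withSubstitute = htmlspecialchars($broken, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');

var_dump($withoutSubstitute === '', $withSubstitute === "a\u{FFFD}b");
```

With the default HTML 4.01 document type the single quote is emitted as &#039;. Adding ENT_HTML5 to the flags produces &apos; instead, and both are decoded the same way by browsers.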
Common Misconception
Why It Matters
Common Mistakes
- Forgetting the ENT_QUOTES flag: before PHP 8.1 the default was ENT_COMPAT, which leaves single quotes unescaped and enables injection in single-quoted attributes.
- Not specifying the charset: before PHP 5.4 the default was ISO-8859-1 (Latin-1), and a mismatch with the page's real encoding can be exploited with multi-byte characters.
- Using htmlspecialchars() in non-HTML contexts (JavaScript, CSS, URLs): each context requires different escaping.
- Using strip_tags() instead: it removes tags, but attribute-based XSS (onerror=) survives in any tags you allow.
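Two of the mistakes above can be reproduced in a few lines. This is a sketch with made-up payloads: it passes ENT_COMPAT explicitly to imitate the pre-8.1 default, and it calls strip_tags() with an allowed-tags list to show that attributes are left untouched.

```php
<?php
// Hypothetical payload aimed at a single-quoted attribute.
$payload = "x' onmouseover='alert(1)";

// ENT_COMPAT (the default before PHP 8.1) leaves single quotes alone.
$weak = htmlspecialchars($payload, ENT_COMPAT, 'UTF-8');
// ENT_QUOTES encodes them, so the value cannot close the attribute.
$strong = htmlspecialchars($payload, ENT_QUOTES, 'UTF-8');

echo "<input value='" . $weak . "'>", PHP_EOL;
// <input value='x' onmouseover='alert(1)'>   (attribute injected)
echo "<input value='" . $strong . "'>", PHP_EOL;
// <input value='x&#039; onmouseover=&#039;alert(1)'>

// strip_tags() keeps allowed tags together with all of their attributes.
$html = '<img src=x onerror=alert(1)><b>hi</b>';
$stripped = strip_tags($html, '<img>');
echo $stripped, PHP_EOL;
// <img src=x onerror=alert(1)>hi
```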
Code Examples
echo '<p>' . $userInput . '</p>'; // XSS if input contains <script>
// Always specify ENT_QUOTES and charset
echo '<p>' . htmlspecialchars($userInput, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8') . '</p>';
// Helper function — use everywhere user data touches HTML
function e(string $s): string {
    return htmlspecialchars($s, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}
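echo '<input value="' . e($_GET['q'] ?? '') . '">'; // safe; ?? '' covers a missing parameter
echo '<a href="' . e($url) . '">' . e($label) . '</a>'; // escaped for the attribute, but javascript: URLs survive, so validate the scheme of $url separately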
// htmlspecialchars_decode() reverses it — use only for internal data, never user input
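The href example above escapes the URL for the attribute, but escaping alone does not restrict what the URL does when clicked. The sketch below shows one possible way to handle the non-HTML contexts mentioned earlier. The helper name safeHref() and the http/https allowlist are assumptions made for illustration, not an established API.

```php
<?php
// Hypothetical helper: accept only absolute http(s) URLs, then escape
// the result for use inside a quoted HTML attribute.
function safeHref(string $url): string {
    $scheme = parse_url($url, PHP_URL_SCHEME);
    if (!is_string($scheme) || !in_array(strtolower($scheme), ['http', 'https'], true)) {
        return '#'; // assumption: fall back to a harmless placeholder
    }
    return htmlspecialchars($url, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

echo safeHref('javascript:alert(1)'), PHP_EOL;          // #
echo safeHref('https://example.com/?a=1&b=2'), PHP_EOL; // https://example.com/?a=1&amp;b=2

// JavaScript context: emit a JSON literal with HTML-significant characters hex-escaped.
$jsLiteral = json_encode('</script>', JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT);
echo '<script>var q = ' . $jsLiteral . ';</script>', PHP_EOL;

// URL context: percent-encode a single query-string value.
$queryValue = rawurlencode('a b&c');
echo $queryValue, PHP_EOL; // a%20b%26c
```

Because this allowlist rejects anything without an explicit http or https scheme, relative URLs are also replaced by the placeholder. Applications that need relative links would have to extend the check.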