Double URL Encoding Bypass
debt(d7/e5/b5/t9)
Closest to 'only careful code review or runtime testing' (d7). The detection_hints specify semgrep as the tool but mark automated as 'no', meaning even semgrep patterns for $_GET/$_POST/urldecode require manual rule crafting and review. Double encoding issues do not surface as compiler errors, linter warnings, or straightforward static analysis hits — they require understanding the full data-flow from input receipt through filtering to use, and will only reliably manifest during targeted security testing or when an attacker exploits them in production.
Closest to 'touches multiple files / significant refactor in one component' (e5). The quick_fix requires multiple coordinated changes: replacing raw $_GET filtering with post-decode validation, adding realpath() normalisation before path checks, ensuring parameterised queries are used, and removing any reliance on encoding-based filtering. This is not a single-line swap — it touches input validation logic, path handling, and potentially WAF integration points across the web-facing layer of the application.
Closest to 'persistent productivity tax' (b5). applies_to is web context only, which limits scope somewhat, but any PHP web application that handles user-supplied paths or parameters must consistently apply decode-then-validate patterns everywhere. This imposes an ongoing cognitive load on developers — every new input-handling code path must be reviewed for double-encoding susceptibility, and the pattern must be enforced across the codebase rather than in a single location.
Closest to 'catastrophic trap — the obvious way is always wrong' (t9). The misconception field states explicitly: 'URL-decoding input once before filtering is sufficient — attackers double-encode specifically to survive single-decode filters.' This is a perfect t9 scenario — the intuitive, apparently-correct defensive measure (decode once, then filter) is precisely the behaviour that the attack exploits. A competent developer applying reasonable security hygiene will still get this wrong, because the single-decode approach looks correct and passes naive testing.
TL;DR
Explanation
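URL encoding represents a character as a percent sign followed by its hex code: %27 is '. Double encoding also encodes the percent sign itself (%25), so %2527 decodes to %27 on the first pass and to ' on the second. If a WAF or filter decodes once and checks the result, and the application (or a proxy, framework, or library behind the filter) then decodes again, the second decode reveals the payload. The technique is common in path traversal (%252e%252e%252f decodes to %2e%2e%2f, then to ../), XSS (%253Cscript%253E decodes to %3Cscript%3E, then to <script>), and SQL injection. In PHP, $_GET values have already been decoded once, so calling urldecode() on them is a second decode that can resurrect a payload an earlier filter has passed. Defences: filter only after all decoding is complete, use parameterised queries (bound values are never parsed as SQL, so encoding tricks cannot change the query), validate against an allowlist of permitted characters after normalisation, and use realpath() to resolve paths before checking them.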
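The decode-once-then-filter failure described above can be reproduced in a few lines. The following is a minimal sketch, not a production filter: naiveFilterAllows() and fullyDecode() are hypothetical names introduced here for illustration, and decoding until the value stops changing is only one possible reading of "filter after all decoding is complete". It assumes PHP 8.0 or later for str_contains().

```php
<?php
declare(strict_types=1);

// Hypothetical single-decode filter: decodes once, then looks for "../".
function naiveFilterAllows(string $raw): bool
{
    return !str_contains(rawurldecode($raw), '../');
}

// Hypothetical normaliser: decode until the value stops changing.
// The round limit keeps a hostile input from forcing unbounded work.
function fullyDecode(string $raw, int $maxRounds = 5): ?string
{
    $value = $raw;
    for ($i = 0; $i < $maxRounds; $i++) {
        $decoded = rawurldecode($value);
        if ($decoded === $value) {
            return $value; // stable: nothing left to decode
        }
        $value = $decoded;
    }
    return null; // not confirmed stable within $maxRounds: treat as hostile
}

$payload = '%252e%252e%252fetc%252fpasswd';

// The filter only ever sees "%2e%2e%2fetc%2fpasswd", so it lets the value through.
var_dump(naiveFilterAllows($payload));

// A second decode further down the stack reveals the traversal sequence.
echo rawurldecode(rawurldecode($payload)), "\n"; // ../etc/passwd

// Normalising first exposes the payload to whatever check runs next.
echo fullyDecode($payload), "\n"; // ../etc/passwd
```

Repeated decoding will also alter legitimate values that contain a literal percent sign, so in many applications it may be simpler to reject any input that still looks encoded after the first decode.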
Common Misconception
URL-decoding input once before filtering is sufficient. It is not: attackers double-encode specifically to survive single-decode filters.
Why It Matters
Common Mistakes
- Filtering raw $_GET without checking if values are further encoded.
- Path traversal checks that operate before realpath() normalisation.
- Trusting WAF filtering without validating at the application layer.
Code Examples
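// Vulnerable: PHP has already decoded $_GET once, so %2e%2e%2f arrives as ../
// and is caught, but %252e%252e%252f arrives as the literal text %2e%2e%2f
// and passes the check. The extra urldecode() below then turns it into ../
if (strpos($_GET['file'], '../') !== false) {
    die('Invalid path');
}
readfile('/uploads/' . urldecode($_GET['file']));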
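// Safer: resolve the path first, then validate the resolved result.
// str_starts_with() requires PHP 8.0 or later.
$file = $_GET['file'] ?? '';
// realpath() resolves ../ and symlinks, and returns false if the file does not exist
$path = realpath('/uploads/' . $file);
if ($path === false || !str_starts_with($path, '/uploads/')) {
    http_response_code(400);
    die('Invalid path');
}
readfile($path);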
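As a complement to the realpath() check, an allowlist applied to the value PHP has already decoded can reject anything that still carries a percent sign or a path separator. This is a sketch under stated assumptions: isSafeFileName() is a hypothetical helper, and the permitted character set and the 128-character limit are illustrative choices that assume the parameter is a single file name, not a nested path.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: accepts a single file name and rejects everything else.
// "%" and "/" are not in the allowed set, so both a leftover encoded sequence
// and a decoded traversal sequence fail the same check.
function isSafeFileName(string $name): bool
{
    // First character must be alphanumeric (rules out "..", hidden files, and
    // the empty string); the rest may be letters, digits, ".", "_" or "-".
    return preg_match('/\A[A-Za-z0-9][A-Za-z0-9._-]{0,127}\z/', $name) === 1;
}

// Sample values as they would appear in $_GET after PHP's automatic decode.
$candidates = ['report.pdf', '%2e%2e%2fetc%2fpasswd', '../etc/passwd'];

foreach ($candidates as $candidate) {
    printf("%s => %s\n", $candidate, isSafeFileName($candidate) ? 'accepted' : 'rejected');
}
```

Because the check never decodes again, a double-encoded value is refused on the "%" character alone instead of being normalised into something dangerous.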