Path Normalisation Bypass
debt(d5/e3/b3/t7)
Closest to 'specialist tool catches it' (d5). The detection_hints list semgrep and psalm as tools, with automated detection flagged yes, but the pattern (manual ../ stripping instead of realpath() + base-dir check) is not caught by a default linter — it requires a specialist SAST tool like semgrep with a specific rule. It won't surface as a compile or syntax error, and casual code review often misses encoded variants.
Closest to 'simple parameterised fix' (e3). The quick_fix is a two-step pattern: resolve with realpath(), then verify with str_starts_with($resolved, $baseDir . '/'). This is a small, localised fix within a single function or file. It's more than a one-line swap because common_mistakes show multiple related checks (prefix check order, str_replace filtering, basename misuse) that all need correcting together, but it stays within one component.
Closest to 'localised tax' (b3). The applies_to scope covers web and cli contexts broadly, but path-handling logic tends to be concentrated in specific upload/file-serving functions rather than spread across the whole codebase. Once corrected with the realpath + str_starts_with pattern, the fix doesn't impose ongoing cost on unrelated components, so burden is localised rather than system-wide.
Closest to 'serious trap' (t7). The misconception field states explicitly that checking for ../ sequences is considered sufficient by most developers, yet encoded variants (%2e%2e%2f), double encoding, and OS separators all bypass naive string checks. This contradicts the intuition that string containment checks are safe and mirrors how similar-looking defences work in other contexts, making it a serious cognitive trap that experienced developers regularly fall into.
Also Known As
TL;DR
Explanation
Path normalisation attacks exploit the gap between how an application validates a path and how the OS resolves it. Common techniques: directory traversal (../../etc/passwd), URL-encoded separators (%2F, and %5C on Windows), double encoding (%252F), null bytes (file.php%00.jpg in older PHP), and Windows UNC paths. PHP's realpath() resolves symlinks and traversal sequences to a canonical absolute path, and returns false if the path does not exist. Always use it to validate that the resolved path starts with the intended base directory followed by a directory separator. Use basename() when you only need the filename component. Never construct file paths by concatenating user input directly, even after filtering: a filter on ../ is bypassable, whereas a prefix check on the realpath() output is made against the canonical path that will actually be opened.
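The resolve-then-verify approach described above can be sketched as a small helper. This is a minimal illustration, assuming PHP 8 (for str_starts_with()) and that the target file already exists; the function name resolveInsideBase() is hypothetical and not part of any library.

```php
<?php
// Hypothetical helper: returns the canonical path if it lies inside
// $baseDir, or null if it escapes the base or does not exist.
function resolveInsideBase(string $baseDir, string $userPath): ?string
{
    $base = realpath($baseDir);
    if ($base === false) {
        return null; // base directory is missing or unreadable
    }

    // realpath() collapses ../ segments and follows symlinks;
    // it returns false when the target does not exist.
    $resolved = realpath($base . DIRECTORY_SEPARATOR . $userPath);
    if ($resolved === false) {
        return null;
    }

    // Compare against the base plus a trailing separator so that a
    // sibling directory such as "files_old" cannot match "files".
    $prefix = rtrim($base, DIRECTORY_SEPARATOR) . DIRECTORY_SEPARATOR;

    return str_starts_with($resolved, $prefix) ? $resolved : null;
}
```

A caller would treat a null result as a refusal (for example, a 403 response) rather than trying to repair the input. Because realpath() fails for paths that do not exist yet, this sketch suits read paths; write paths such as uploads need a different check.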
How It's Exploited
GET /download?file=..%2F..%2Fetc%2Fpasswd # URL-encoded
GET /download?file=....//....//etc/passwd # doubled-dot bypass
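To see why string filtering fails against payloads like these, the sketch below runs a naive single-pass filter over two inputs. The function naiveStrip() is a hypothetical stand-in for the pattern to avoid. Note that PHP decodes query-string values once before they reach $_GET, so the encoded example here represents a double-encoded request (%252e%252e%252f) after that first decode.

```php
<?php
// Naive filter (the pattern to avoid): strips literal "../" in one pass.
function naiveStrip(string $path): string
{
    return str_replace('../', '', $path);
}

// Doubled-dot payload: removing the inner "../" from each "....//"
// leaves a fresh "../" behind, because str_replace() does not rescan.
$doubled = naiveStrip('....//....//etc/passwd');

// Encoded payload: there is no literal "../" to strip, so the filter
// returns the input unchanged...
$encoded = naiveStrip('%2e%2e%2fetc%2fpasswd');

// ...and any later urldecode() in the application restores the traversal.
$decodedLater = urldecode($encoded);

var_dump($doubled);      // string(16) "../../etc/passwd"
var_dump($decodedLater); // string(13) "../etc/passwd"
```

In both cases the filter reports success while the path that reaches the filesystem still contains a traversal sequence.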
Common Misconception
Why It Matters
Common Mistakes
- Checking if a path starts with an allowed prefix before calling realpath() — the check is against the un-normalised string.
- Filtering ../ sequences with str_replace() but missing URL-encoded variants %2e%2e%2f.
- Not verifying that realpath() output is still within the intended base directory after resolution.
- Using basename() for security — it strips the path but the remaining filename may still be dangerous.
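Several of the mistakes above can be reproduced with plain string functions, without touching the filesystem. The paths below are illustrative assumptions, and the sibling-directory case is a closely related pitfall rather than an item from the list.

```php
<?php
$base = '/var/www/files';

// Prefix check before realpath(): the un-normalised string still starts
// with the base, even though it resolves to /etc/passwd.
$raw = $base . '/../../../etc/passwd';
$prefixBeforeResolve = str_starts_with($raw, $base . '/');

// Related pitfall: a prefix without a trailing separator also matches
// a sibling directory that merely shares the same leading characters.
$sibling = '/var/www/files_private/key.pem';
$siblingWithoutSep = str_starts_with($sibling, $base);
$siblingWithSep    = str_starts_with($sibling, $base . '/');

// basename() drops the directory part but keeps the name as given,
// so an executable or dotfile name survives untouched.
$script  = basename('../../uploads/shell.php');
$dotfile = basename('/tmp/.htaccess');

var_dump($prefixBeforeResolve); // bool(true)
var_dump($siblingWithoutSep);   // bool(true)
var_dump($siblingWithSep);      // bool(false)
var_dump($script);              // string(9) "shell.php"
var_dump($dotfile);             // string(9) ".htaccess"
```

The takeaway is one of ordering: canonicalise first, compare against the base plus a separator second, and apply any filename policy (extension allow-list, dotfile rejection) as a separate step.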
Code Examples
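// Vulnerable: user input is concatenated straight into the path.
$file = $_GET['file'];
readfile('/var/www/files/' . $file); // file=../../../etc/passwd escapes the base directory

// Safer: canonicalise first, then confirm the result is still inside the base directory.
$base = realpath('/var/www/files');
$path = realpath($base . '/' . $_GET['file']);
// The trailing separator stops a sibling such as /var/www/files_old from matching the prefix.
if ($path === false || !str_starts_with($path, $base . DIRECTORY_SEPARATOR)) {
    abort(403); // abort() is assumed to be a framework helper (e.g. Laravel) that ends the request
}
readfile($path);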