
Path Normalisation Bypass

Security CWE-22 OWASP A01:2021 CVSS 7.5 PHP 5.0+ Intermediate
debt(d5/e3/b3/t7)
d5 Detectability Operational debt — how invisible misuse is to your safety net

Closest to 'specialist tool catches it' (d5). The detection_hints list semgrep and psalm as tools, with automated detection flagged yes, but the pattern (manual ../ stripping instead of realpath() + base-dir check) is not caught by a default linter — it requires a specialist SAST tool like semgrep with a specific rule. It won't surface as a compile or syntax error, and casual code review often misses encoded variants.

e3 Effort Remediation debt — work required to fix once spotted

Closest to 'simple parameterised fix' (e3). The quick_fix is a two-step pattern: resolve with realpath(), then verify with str_starts_with($resolved, $baseDir.'/'). This is a small, localised fix within a single function or file. It's more than a one-line swap because common_mistakes show multiple related checks (prefix check order, str_replace filtering, basename misuse) that all need correcting together, but it stays within one component.

b3 Burden Structural debt — long-term weight of choosing wrong

Closest to 'localised tax' (b3). The applies_to scope covers web and cli contexts broadly, but path-handling logic tends to be concentrated in specific upload/file-serving functions rather than spread across the whole codebase. Once corrected with the realpath + str_starts_with pattern, the fix doesn't impose ongoing cost on unrelated components, so burden is localised rather than system-wide.

t7 Trap Cognitive debt — how counter-intuitive correct behaviour is

Closest to 'serious trap' (t7). The misconception field states explicitly that checking for ../ sequences is considered sufficient by most developers, yet encoded variants (%2e%2e%2f), double encoding, and OS separators all bypass naive string checks. This contradicts the intuition that string containment checks are safe and mirrors how similar-looking defences work in other contexts, making it a serious cognitive trap that experienced developers regularly fall into.


Also Known As

path canonicalization, URL normalization, path canonicalisation

TL;DR

Using ../, URL encoding (%2f), or OS-specific separators to escape intended directory boundaries and access files outside an allowlisted path.

Explanation

Path normalisation attacks exploit the gap between how an application validates a path and how the OS resolves it. Common techniques: directory traversal (../../etc/passwd), URL-encoded separators (%2F, or %5C on Windows), double encoding (%252F), null bytes (file.php%00.jpg in older PHP), and Windows UNC paths. PHP's realpath() resolves symlinks and traversal sequences to a canonical absolute path, and returns false when the path does not exist. Always use it, then verify that the resolved path starts with the intended base directory followed by a directory separator. Use basename() when you only need the filename component. Never build file paths by concatenating user input and relying on a filter alone: a filter on ../ is bypassable, whereas a prefix check on the realpath() result compares against the path the OS will actually open.

How It's Exploited

GET /download?file=../../etc/passwd
GET /download?file=..%2F..%2Fetc%2Fpasswd # URL-encoded
GET /download?file=....//....//etc/passwd # doubled-dot bypass
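
The third request works against filters that strip ../ only once. A minimal sketch, assuming a hypothetical naiveFilter() helper (not taken from any real codebase): str_replace() makes a single pass and does not rescan its own output, so removing the inner ../ from each ....// leaves a fresh ../ behind.

```php
<?php
// Hypothetical naive filter: strips "../" in a single pass.
function naiveFilter(string $input): string
{
    return str_replace('../', '', $input);
}

$payload  = '....//....//etc/passwd';
$filtered = naiveFilter($payload);

// Each "....//" collapses to "../" once its inner "../" is removed.
echo $filtered, PHP_EOL; // ../../etc/passwd
```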

Common Misconception

Misconception: checking whether a path contains ../ is sufficient to prevent traversal. In practice, encoded variants (%2e%2e%2f), double encoding, and OS-specific separators survive naive string checks. Always resolve the full canonical path with realpath() and verify it starts with the allowed base directory.
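
As a rough illustration of the double-encoding case, the sketch below assumes a hypothetical looksSafe() check and a later layer that decodes the value a second time (a proxy, a framework component, or an explicit urldecode() call). The first urldecode() stands in for the decoding PHP already applies to query parameters.

```php
<?php
// Hypothetical check: rejects input containing a literal "../".
function looksSafe(string $input): bool
{
    return !str_contains($input, '../'); // str_contains() needs PHP 8.0+
}

// PHP decodes query parameters once, so %252F arrives in $_GET as %2F.
$received = urldecode('..%252F..%252Fetc%252Fpasswd');

var_dump(looksSafe($received)); // bool(true): no literal "../" present

// A second decode further down the stack restores the traversal sequence.
$decodedAgain = urldecode($received);
echo $decodedAgain, PHP_EOL; // ../../etc/passwd
```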

Why It Matters

Comparing or restricting paths before normalisation allows bypass: a path such as /var/www/files/../../../etc/passwd passes a string check against /var/www/files/ yet resolves to /etc/passwd.
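
The difference between checking before and after normalisation can be reproduced in a throwaway directory. This is a sketch under stated assumptions: the pnb_demo_ directory name and secret.txt file are made up for the demo, and str_starts_with() requires PHP 8.0+.

```php
<?php
$ds = DIRECTORY_SEPARATOR;

// Demo layout: <root>/files is the allowed base,
// <root>/secret.txt sits one level above it.
$root = realpath(sys_get_temp_dir()) . $ds . 'pnb_demo_' . uniqid();
mkdir($root . $ds . 'files', 0700, true);
file_put_contents($root . $ds . 'secret.txt', 'outside the base');

$base = realpath($root . $ds . 'files');
$raw  = $base . $ds . '..' . $ds . 'secret.txt'; // naive concatenation

// Check on the raw string: it starts with the base, so it looks fine.
$rawCheck = str_starts_with($raw, $base . $ds);

// Check after resolution: the canonical path is outside the base.
$resolved      = realpath($raw);
$resolvedCheck = $resolved !== false
    && str_starts_with($resolved, $base . $ds);

var_dump($rawCheck);      // bool(true)
var_dump($resolvedCheck); // bool(false)

// Clean up the demo files.
unlink($root . $ds . 'secret.txt');
rmdir($root . $ds . 'files');
rmdir($root);
```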

Common Mistakes

  • Checking if a path starts with an allowed prefix before calling realpath() — the check is against the un-normalised string.
  • Filtering ../ sequences with str_replace() but missing URL-encoded variants %2e%2e%2f.
  • Not verifying that realpath() output is still within the intended base directory after resolution.
  • Using basename() for security — it strips the path but the remaining filename may still be dangerous.

Code Examples

✗ Vulnerable
$file = $_GET['file'];
readfile('/var/www/files/' . $file); // ../../../etc/passwd reaches /etc/passwd
✓ Fixed
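$base = realpath('/var/www/files');
$path = realpath($base . '/' . ($_GET['file'] ?? ''));
// realpath() returns false for missing paths; the trailing separator
// stops sibling directories such as /var/www/files_old from matching.
if ($base === false || $path === false
    || !str_starts_with($path, $base . DIRECTORY_SEPARATOR)) {
    http_response_code(403); // reject: missing or outside the base
    exit;
}
readfile($path);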

Added 15 Mar 2026
Edited 22 Mar 2026
DEV INTEL Tools & Severity
🟠 High ⚙ Fix effort: Low
⚡ Quick Fix
After resolving with realpath(), verify the result strictly starts with your base directory using str_starts_with($resolved, $baseDir.'/') — the trailing slash prevents prefix attacks
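
A small sketch of the prefix attack the trailing slash guards against. The isInsideBase() helper and the example paths are hypothetical, and both arguments are assumed to be canonical already (for instance, the output of realpath()).

```php
<?php
// Hypothetical helper: true only if $resolved lies strictly inside $baseDir.
// Both arguments are assumed to be canonical paths, e.g. from realpath().
function isInsideBase(string $resolved, string $baseDir): bool
{
    $prefix = rtrim($baseDir, '/') . '/';
    return str_starts_with($resolved, $prefix); // PHP 8.0+
}

$baseDir = '/var/www/files';

// Without the trailing slash, a sibling directory shares the prefix.
var_dump(str_starts_with('/var/www/files_private/key.pem', $baseDir)); // bool(true)

// With it, only paths below the base directory match.
var_dump(isInsideBase('/var/www/files_private/key.pem', $baseDir)); // bool(false)
var_dump(isInsideBase('/var/www/files/report.pdf', $baseDir));      // bool(true)
```

The helper also rejects the base directory itself, which is usually what a file-serving endpoint wants.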
📦 Applies To
PHP 5.0+ web cli
🔗 Prerequisites
🔍 Detection Hints
Manual ../ stripping instead of realpath() canonicalisation; no str_starts_with base dir check after realpath
Auto-detectable: ✓ Yes semgrep psalm
⚠ Related Problems
🤖 AI Agent
Confidence: High False Positives: Medium ✓ Auto-fixable Fix: Low Context: Line
CWE-22 CWE-23

