Key String Functions (str_contains, str_starts_with, str_ends_with…)
debt(d5/e3/b5/t7)
Closest to 'specialist tool catches it' (d5). The detection_hints list phpstan and phpcs as the tools that catch multibyte misuse patterns (strlen() on user text, strtolower() on international names). These are not default linters but specialist static analysis tools that must be configured and run deliberately — so d5 fits squarely.
Closest to 'simple parameterised fix' (e3). The quick_fix is a direct function-name swap pattern: replace strlen() with mb_strlen(), strtolower() with mb_strtolower(), substr() with mb_substr(). Each replacement is mechanical but must be applied across multiple call sites in a codebase wherever user-facing text is handled — slightly more than a single one-line patch but well within one component or a search-replace pass.
Closest to 'persistent productivity tax' (b5). The misconception affects all three contexts listed (web, cli, queue-worker) and the functions are used pervasively throughout any PHP application. Every developer touching string handling must remember the mb_ discipline, and any code review touching user-facing text carries this ongoing cognitive tax. It doesn't reshape architecture (not b7/b9) but it does slow down multiple work streams.
Closest to 'serious trap' (t7). The misconception field states the canonical wrong belief explicitly: developers assume built-in str_ functions handle multibyte strings correctly, but strlen('café') returns 5 not 4. This contradicts how string functions work in many other languages where string length means character count. The strpos() returning 0 gotcha (check !== false not truthiness) is an additional well-known trap layered on top, compounding the cognitive debt.
Also Known As
TL;DR
Explanation
PHP 8.0 added str_contains(), str_starts_with(), and str_ends_with(): boolean functions that replace the error-prone strpos($hay, $needle) !== false pattern. Without the strict comparison, a match at position 0 is treated as "not found", because strpos() returns 0 there and 0 is falsy. Other essential string functions include the mb_* variants for multibyte safety (mb_strlen, mb_strtolower), sprintf() for formatted output, trim/ltrim/rtrim for whitespace, explode/implode for splitting and joining, and str_replace/preg_replace for substitution. Use mb_* functions whenever you handle user input that may contain multibyte characters.
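The position-0 pitfall and its PHP 8.0 replacements can be sketched as follows. This is a minimal illustration, assuming PHP 8.0 or later; the `$url` value is a made-up sample, not taken from any real application.

```php
<?php
declare(strict_types=1);

$url = 'https://example.com/index.php'; // hypothetical sample value

// strpos() returns int 0 for a match at offset 0, and 0 is falsy.
$pos = strpos($url, 'https');
var_dump($pos);           // int(0)
var_dump((bool) $pos);    // bool(false): a truthiness check misses the match
var_dump($pos !== false); // bool(true): the strict comparison is correct

// The PHP 8.0+ functions return a plain bool, so no strict check is needed.
var_dump(str_contains($url, 'example'));     // bool(true)
var_dump(str_starts_with($url, 'https://')); // bool(true)
var_dump(str_ends_with($url, '.php'));       // bool(true)
```

strpos() is still the right tool when the offset itself is needed; the boolean functions only answer whether a match exists.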
Common Misconception
Why It Matters
Common Mistakes
- Using strlen() instead of mb_strlen() for multibyte strings — strlen() counts bytes, not characters.
- Using strtolower() on Unicode strings — strtolower() works on bytes and does not lowercase non-ASCII letters in UTF-8 text; use mb_strtolower() with an explicit encoding.
- Forgetting that strpos() returns 0 for a match at the start — check with !== false, not if($pos).
- Using substr() to split multibyte strings — substr() cuts at byte offsets and can slice a character in half; use mb_substr() instead.
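The byte-versus-character mistakes above can be made visible with a short sketch. It assumes the mbstring extension is loaded; the string is written with a `\u{00E9}` escape so the result does not depend on how the file is saved.

```php
<?php
declare(strict_types=1);

$name = "caf\u{00E9}"; // 'café': 4 characters, 5 bytes in UTF-8

$byteLen = strlen($name);             // 5, counts bytes
$charLen = mb_strlen($name, 'UTF-8'); // 4, counts characters

// substr() cuts at a byte offset and splits the 2-byte 'é' in half.
$broken = substr($name, 0, 4);             // 'caf' plus one stray byte
$intact = mb_substr($name, 0, 4, 'UTF-8'); // 'café'

var_dump(mb_check_encoding($broken, 'UTF-8')); // bool(false): invalid UTF-8
var_dump(mb_check_encoding($intact, 'UTF-8')); // bool(true)
```

A truncated string like `$broken` is no longer valid UTF-8, which is why byte-based truncation tends to surface later as garbled output or encoding errors rather than at the call site.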
Code Examples
// Multibyte string handled with byte functions:
$str = 'Héllo';
echo strlen($str); // 6 — counts bytes, not chars ('é' is 2 bytes in UTF-8)
echo strtoupper($str); // 'HéLLO' — é not uppercased
// Correct:
echo mb_strlen($str, 'UTF-8'); // 5 — character count
echo mb_strtoupper($str, 'UTF-8'); // 'HÉLLO'
// PHP 8.0 str_contains / str_starts_with / str_ends_with
if (str_contains($haystack, 'needle')) {}
if (str_starts_with($url, 'https://')) {}
if (str_ends_with($file, '.php')) {}
// Multi-byte safe functions (always use for user text)
$len = mb_strlen($str, 'UTF-8');
$up = mb_strtoupper($str, 'UTF-8');
$sub = mb_substr($str, 0, 100, 'UTF-8');
$pos = mb_strpos($str, 'needle', 0, 'UTF-8');
// Padding, trimming
$padded = str_pad('42', 5, '0', STR_PAD_LEFT); // '00042'
$trimmed = trim($str, " \t\n\r\0\x0B"); // same as the default trim characters (includes the space)
// sprintf for formatted strings (for SQL, prefer prepared statements over building queries by hand)
$sql = sprintf('WHERE id = %d AND name = %s', $id, $pdo->quote($name));
$msg = sprintf('Welcome, %s! You have %d messages.', $name, $count);