PHP 6 — The Version That Never Shipped
Also Known As
TL;DR
Explanation
PHP 6 development began in 2005 with one primary goal: native Unicode support throughout the language and its standard library. Every string operation would understand multibyte characters natively, ending years of mbstring workarounds. The branch then stalled for five years. The core problem was scale. The design stored strings internally as UTF-16 (via the ICU library), so making every string operation Unicode-aware meant changing thousands of internal functions and converting text at every I/O boundary. The performance cost was severe, with benchmarks showing 20–50% slowdowns even for code that never touched Unicode. The Unicode branch was abandoned in 2010. Most of the non-Unicode features developed alongside it (namespaces, late static binding, closures, and goto) had already been backported to PHP 5.3, released in 2009. In 2014 the core team voted to skip the version number 6 entirely, to avoid confusion with the abandoned branch and with the books already published about it. PHP 7 arrived in 2015.
Common Misconception
Why It Matters
Common Mistakes
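- Using strlen() on UTF-8 strings and getting byte counts instead of character counts, which leads to length-check and truncation bugs with multibyte characters.
- Assuming strtolower() / strtoupper() handle accented characters. They don't: they work byte by byte (and only on ASCII letters as of PHP 8.2). Use mb_strtolower() / mb_strtoupper() instead.
- Mixing byte-based and character-based functions on the same string. Passing an mb_strpos() character offset to substr(), or a strpos() byte offset to mb_substr(), can cut a multibyte sequence in half or select the wrong span.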
- Expecting PHP to behave like Python 3 or Java where strings are Unicode objects by default — PHP strings are byte strings.
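The offset-mixing mistake is easiest to see with a search followed by a slice: mb_strpos() and mb_substr() count characters, while strpos() and substr() count bytes. A minimal sketch, assuming the source file is saved as UTF-8; the sample string is only illustrative.

```php
<?php
// Assumes a UTF-8 source file, so é is the two bytes 0xC3 0xA9.
mb_internal_encoding('UTF-8');

$str = 'café au lait';

// ❌ Character offset fed to a byte-based function
$charPos = mb_strpos($str, ' ');      // 4 (characters), but the space is at byte 5
$broken  = substr($str, 0, $charPos); // "caf\xC3": only half of é survives
var_dump(mb_check_encoding($broken, 'UTF-8')); // bool(false)

// ✅ Keep the offset and the slice in the same unit
$byChars = mb_substr($str, 0, mb_strpos($str, ' ')); // "café" via character offsets
$byBytes = substr($str, 0, strpos($str, ' '));       // "café" via byte offsets
var_dump($byChars === $byBytes); // bool(true)
```

Either pairing is correct on its own; the bug only appears when the two units are mixed.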
Code Examples
// ❌ Assuming native string functions are Unicode-safe — they aren't
$str = 'héllo';
echo strlen($str); // 6, not 5 — counts bytes not characters
echo strtoupper($str); // HéLLO — fails on non-ASCII
echo substr($str, 0, 2); // "h\xC3": cuts the 2-byte é in half (invalid UTF-8)
// ✅ Use mb_string for Unicode-safe string operations
$str = 'héllo';
echo mb_strlen($str); // 5 — character count
echo mb_strtoupper($str); // HÉLLO — correct
echo mb_substr($str, 0, 3); // hél — safe
// Optionally pin the encoding once at bootstrap (before any mb_* call) instead of relying on the default_charset ini setting
mb_internal_encoding('UTF-8');
mb_regex_encoding('UTF-8');
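The truncation bug from the first mistake usually shows up when shortening text for display. A minimal sketch of a character-safe version, assuming valid UTF-8 input; truncate_chars() is a hypothetical helper name, not part of PHP.

```php
<?php
mb_internal_encoding('UTF-8');

/**
 * Hypothetical helper: shorten $text to at most $max characters (not bytes),
 * appending $ellipsis when something was cut.
 * Assumes valid UTF-8 input and $max >= mb_strlen($ellipsis).
 */
function truncate_chars(string $text, int $max, string $ellipsis = '…'): string
{
    if (mb_strlen($text) <= $max) {
        return $text; // already short enough, return unchanged
    }
    // Reserve room for the ellipsis so the result is exactly $max characters
    $keep = $max - mb_strlen($ellipsis);
    return mb_substr($text, 0, $keep) . $ellipsis;
}

echo truncate_chars('Crème brûlée', 8), "\n"; // Crème b…
echo truncate_chars('héllo', 10), "\n";       // héllo
```

PHP also ships mb_strimwidth(), but it measures display width, so East Asian full-width characters count as two; counting with mb_strlen() keeps the limit in characters.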