PHP Intl Extension — Unicode
Also Known As
PHP Intl
ICU PHP
Normalizer
grapheme functions
Transliterator
TL;DR
The Intl extension's grapheme functions, Normalizer, and Transliterator handle multilingual text correctly where mbstring falls short, including emoji and combining characters.
Explanation
The Intl extension wraps the ICU library and provides: grapheme_strlen/grapheme_substr (count and slice by user-perceived character, so emoji and combining sequences stay intact), Normalizer (NFC/NFD Unicode normalisation, needed before comparing or storing user text), Transliterator (conversion between scripts, such as Cyrillic to Latin), IntlBreakIterator (word, sentence and character boundaries), and IntlChar (Unicode character properties). The family emoji 👨‍👩‍👧‍👦 is four emoji joined by three zero-width joiners, 7 code points in total: mb_strlen returns 7, while grapheme_strlen returns 1 (given an ICU version recent enough to recognise emoji ZWJ sequences).
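As a rough illustration of the pieces listed above, the following sketch compares the three length functions on the family emoji and touches Transliterator and IntlBreakIterator. It assumes the intl and mbstring extensions are loaded and that the bundled ICU is recent enough to treat emoji ZWJ sequences as one cluster. The variable names are illustrative only.

```php
<?php

// Family emoji written as explicit code points: man, woman, girl, boy
// joined by three zero-width joiners (U+200D), 7 code points in total.
$family = "\u{1F468}\u{200D}\u{1F469}\u{200D}\u{1F467}\u{200D}\u{1F466}";

$bytes      = strlen($family);             // UTF-8 bytes
$codePoints = mb_strlen($family, 'UTF-8'); // Unicode code points
$graphemes  = grapheme_strlen($family);    // user-perceived characters

printf("bytes=%d code points=%d graphemes=%d\n", $bytes, $codePoints, $graphemes);

// Transliterator: convert Cyrillic text to Latin script.
// create() returns null if the transform ID is unknown to ICU.
$toLatin = Transliterator::create('Cyrillic-Latin');
$latin   = $toLatin !== null ? $toLatin->transliterate('Привет') : false;

// IntlBreakIterator: split a string at grapheme boundaries.
// "e" + combining acute accent (U+0301) stays together as one part.
$iterator = IntlBreakIterator::createCharacterInstance();
$iterator->setText("e\u{0301}a");
$parts = iterator_to_array($iterator->getPartsIterator(), false);

printf("parts=%d\n", count($parts));
```

On older ICU builds the grapheme count for the emoji may be higher than 1, so it is worth checking the result on the target server rather than assuming it.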
Common Misconception
✗ mbstring functions handle all Unicode correctly. In fact, mbstring handles multibyte encodings but not Unicode semantics: grapheme clusters (emoji with skin tones or ZWJ sequences), combining characters, and Unicode normalisation all require the Intl extension.
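The combining-character case can be sketched with the letter é, which can be stored either as one precomposed code point or as "e" plus a combining accent. This is a minimal example assuming intl and mbstring are both available; the variable names are made up for illustration.

```php
<?php

$composed   = "\u{00E9}";  // é as a single precomposed code point (NFC)
$decomposed = "e\u{0301}"; // e followed by combining acute accent (NFD)

// Both render identically, but the byte sequences differ.
$identicalBytes = ($composed === $decomposed);

// mbstring counts code points, so the two forms get different lengths.
$mbComposed   = mb_strlen($composed, 'UTF-8');
$mbDecomposed = mb_strlen($decomposed, 'UTF-8');

// grapheme_strlen counts user-perceived characters, so both are 1.
$grComposed   = grapheme_strlen($composed);
$grDecomposed = grapheme_strlen($decomposed);

// Normalising to NFC makes the two forms byte-identical.
$equalAfterNfc = Normalizer::normalize($decomposed, Normalizer::FORM_C) === $composed;

var_dump($identicalBytes, $mbComposed, $mbDecomposed, $grComposed, $grDecomposed, $equalAfterNfc);
```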
Why It Matters
A 100-character limit enforced with mb_strlen accepts only 14 family emoji (each is 7 code points, so 14 × 7 = 98 and a 15th would exceed 100), even though the user sees just 14 characters. grapheme_strlen returns the count of visible characters.
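The arithmetic behind that limit can be checked directly. The sketch below assumes intl and mbstring are loaded and an ICU version that treats the family emoji as a single grapheme. The limit of 100 comes from the text, and the variable names are illustrative.

```php
<?php

$family = "\u{1F468}\u{200D}\u{1F469}\u{200D}\u{1F467}\u{200D}\u{1F466}"; // 7 code points
$limit  = 100;

// How many family emoji fit under a code-point limit of 100?
$maxByCodePoints = intdiv($limit, mb_strlen($family, 'UTF-8'));

$fourteen = str_repeat($family, 14); // 14 * 7 = 98 code points
$fifteen  = str_repeat($family, 15); // 15 * 7 = 105 code points

$fourteenPassesMb = mb_strlen($fourteen, 'UTF-8') <= $limit;
$fifteenPassesMb  = mb_strlen($fifteen, 'UTF-8') <= $limit;

// Counted as graphemes, 15 emoji are just 15 characters, well under the limit.
$fifteenGraphemes = grapheme_strlen($fifteen);

var_dump($maxByCodePoints, $fourteenPassesMb, $fifteenPassesMb, $fifteenGraphemes);
```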
Common Mistakes
- strlen() for character limits on user input — counts bytes not characters
- mb_strlen() for grapheme-aware limits — counts code points not visible grapheme clusters
- Not normalising Unicode before storing — same visual character can have multiple representations causing duplicate key errors
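- strtolower() for multilingual case conversion: it only handles ASCII reliably, so use mb_strtolower() or a Transliterator such as Any-Lower (IntlChar::tolower() converts one code point at a time)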
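Two of the mistakes above, un-normalised keys and ASCII-only case conversion, can be sketched as follows. This assumes intl and mbstring are loaded. The array stands in for a unique index in a database, which is a simplification for illustration.

```php
<?php

// Un-normalised input: the same visible name "é" in two representations.
$names = ["\u{00E9}", "e\u{0301}"];

// Without normalisation the two strings become two distinct keys.
$rawKeys = [];
foreach ($names as $name) {
    $rawKeys[$name] = true;
}

// After NFC normalisation they collapse to a single key.
$nfcKeys = [];
foreach ($names as $name) {
    $nfcKeys[Normalizer::normalize($name, Normalizer::FORM_C)] = true;
}

// Case conversion of non-ASCII letters (À É Î as explicit code points).
$upper = "\u{00C0}\u{00C9}\u{00CE}";

$viaMbstring = mb_strtolower($upper, 'UTF-8');

// 'Any-Lower' is an ICU transform; create() returns null if unavailable.
$lowercaser = Transliterator::create('Any-Lower');
$viaIcu     = $lowercaser !== null ? $lowercaser->transliterate($upper) : false;

var_dump(count($rawKeys), count($nfcKeys), $viaMbstring, $viaIcu);
```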
Code Examples
✗ Vulnerable
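// Wrong character counting:
// Family emoji = man, woman, girl, boy joined by zero-width joiners (U+200D)
$input = "Hello \u{1F468}\u{200D}\u{1F469}\u{200D}\u{1F467}\u{200D}\u{1F466}"; // 1 visible emoji
$limit = 100;
if (mb_strlen($input) > $limit) { /* trim */ }
// mb_strlen: 'Hello ' = 6 + family emoji = 7 code points = 13 total
// User sees 7 visible characters, system counts 13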
✓ Fixed
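// Grapheme-correct character counting:
$input = "Hello \u{1F468}\u{200D}\u{1F469}\u{200D}\u{1F467}\u{200D}\u{1F466}";
$limit = 100;
if (grapheme_strlen($input) > $limit) {
    $input = grapheme_substr($input, 0, $limit);
}
// grapheme_strlen: 'Hello ' = 6 + family emoji = 1 = 7 total, matching what the user sees
// Normalise Unicode before storing to prevent duplicate keys:
$normalised = Normalizer::normalize($userInput, Normalizer::FORM_C);
if ($normalised === false) {
    // normalize() returns false on failure, e.g. invalid UTF-8
    throw new InvalidArgumentException('Input is not valid UTF-8');
}
$pdo->prepare('INSERT INTO users (name) VALUES (?)')->execute([$normalised]);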
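The two steps in this example, normalising and limiting by grapheme, could be combined into one reusable function. The helper below is hypothetical (limitGraphemes is not part of PHP or intl) and is only a sketch of one way to package the checks, assuming intl is loaded.

```php
<?php

/**
 * Hypothetical helper: normalise text to NFC, then cap its length in graphemes.
 */
function limitGraphemes(string $text, int $limit): string
{
    $normalised = Normalizer::normalize($text, Normalizer::FORM_C);
    if ($normalised === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }

    $length = grapheme_strlen($normalised);
    if ($length === false || $length === null) {
        throw new InvalidArgumentException('Could not count graphemes');
    }
    if ($length <= $limit) {
        return $normalised; // already within the limit
    }

    $cut = grapheme_substr($normalised, 0, $limit);
    if ($cut === false) {
        throw new RuntimeException('grapheme_substr failed');
    }
    return $cut;
}

$family = "\u{1F468}\u{200D}\u{1F469}\u{200D}\u{1F467}\u{200D}\u{1F466}";

// 'Hi ' is 3 graphemes, so a limit of 4 keeps exactly one whole family emoji.
$short = limitGraphemes('Hi ' . str_repeat($family, 3), 4);
echo grapheme_strlen($short), "\n";
```

Because the cut happens on grapheme boundaries, the emoji is either kept whole or dropped, never split into stray people and joiners.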
Added
16 Mar 2026
Edited
22 Mar 2026
⚡
DEV INTEL
Tools & Severity
🟡 Medium
⚙ Fix effort: Low
⚡ Quick Fix
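Install the intl extension (for example, apt-get install php8.3-intl on Debian/Ubuntu with PHP 8.3) and confirm it is loaded with php -m | grep intl. It provides ICU-based internationalisation; without it, the grapheme functions, Normalizer, Transliterator, NumberFormatter and MessageFormatter are unavailable.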
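The same check can be made from PHP at runtime, which may be handy in a bootstrap or health-check script. This is a minimal sketch, and the list of classes checked is an arbitrary selection for illustration.

```php
<?php

$hasIntl = extension_loaded('intl');

if ($hasIntl) {
    // INTL_ICU_VERSION is defined by the intl extension itself.
    echo 'intl loaded, ICU version ', INTL_ICU_VERSION, "\n";
} else {
    echo "intl missing: grapheme_*, Normalizer, Transliterator, NumberFormatter unavailable\n";
}

// List which of the commonly used intl classes are not available.
$missing = array_filter(
    ['Normalizer', 'Transliterator', 'NumberFormatter', 'MessageFormatter'],
    static fn (string $class): bool => !class_exists($class)
);

echo 'missing classes: ', $missing === [] ? 'none' : implode(', ', $missing), "\n";
```

For Composer-managed projects, declaring "ext-intl" in the require section of composer.json makes a missing extension fail at install time instead of at runtime.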
📦 Applies To
PHP 5.3+ (Transliterator 5.4+, IntlBreakIterator 5.5+, IntlChar 7.0+)
web
cli
🔗 Prerequisites
🔍 Detection Hints
NumberFormatter not found error; Symfony translation component failing; PHP missing intl extension in Docker image or server
Auto-detectable:
✓ Yes
phpinfo
phpstan
composer
⚠ Related Problems
🤖 AI Agent
Confidence: Low
False Positives: Medium
✗ Manual fix
Fix: Medium
Context: File