
PHP Intl Extension — Unicode

php PHP 5.3+ Advanced

Also Known As

PHP Intl, ICU, PHP Normalizer, grapheme functions, Transliterator

TL;DR

Grapheme functions, Normalizer, and Transliterator: correct multilingual text handling beyond what mbstring provides, including emoji and combining characters.

Explanation

The Intl extension wraps the ICU library and provides: grapheme_strlen()/grapheme_substr(), which count and cut by user-perceived characters (correct for emoji and combining characters); Normalizer (NFC/NFD Unicode normalisation, recommended before comparing or storing user text); Transliterator (conversion between scripts, e.g. Cyrillic to Latin; PHP 5.4+); IntlBreakIterator (word, sentence, and character boundaries; PHP 5.5+); and IntlChar (Unicode character properties; PHP 7.0+). The family emoji 👨‍👩‍👧‍👦 is four person emoji joined by three zero-width joiners, so 7 code points: mb_strlen() returns 7, while grapheme_strlen() returns 1 on a reasonably recent ICU.
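As a rough illustration of the three less familiar classes named above, the following sketch transliterates a Cyrillic word, splits a sentence on word boundaries, and looks up a character name. It assumes the intl extension is loaded; the sample strings and variable names are illustrative and not part of the original entry.

```php
<?php
// Sketch only: assumes ext-intl is loaded. Sample strings are illustrative.

// Transliterator: convert Cyrillic script to Latin using ICU's built-in rule set.
$toLatin = Transliterator::create('Cyrillic-Latin');
$latin = $toLatin->transliterate('Москва');

// IntlBreakIterator: walk locale-aware word boundaries.
$iterator = IntlBreakIterator::createWordInstance('en');
$iterator->setText('Hello, world!');
$words = [];
foreach ($iterator->getPartsIterator() as $part) {
    // Parts include spaces and punctuation; keep only those containing a letter.
    if (preg_match('/\p{L}/u', $part) === 1) {
        $words[] = $part;
    }
}

// IntlChar: query Unicode character properties by code point.
$joinerName = IntlChar::charName(0x200D); // the joiner used inside emoji sequences

echo $latin, PHP_EOL;
echo implode('|', $words), PHP_EOL;
echo $joinerName, PHP_EOL;
```

Filtering parts with a `\p{L}` regex is just one simple way to drop punctuation and whitespace; other approaches exist.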

Common Misconception

"mbstring functions handle all Unicode correctly." In fact, mbstring handles multibyte encodings but not Unicode semantics: grapheme clusters (emoji with skin tones or joiners), combining characters, and Unicode normalisation all require the Intl extension.
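A small sketch of the combining-character case, assuming both mbstring and intl are loaded. The letter "é" can be stored as one precomposed code point or as "e" plus a combining accent; the variable names are illustrative.

```php
<?php
// Sketch only: assumes ext-mbstring and ext-intl are loaded.

$composed   = "\u{00E9}";   // é as a single code point (NFC form)
$decomposed = "e\u{0301}";  // e followed by COMBINING ACUTE ACCENT (NFD form)

// The two strings render identically but are different byte sequences.
$sameBytes = ($composed === $decomposed);                 // false

// mbstring counts code points, so the decomposed form looks "longer".
$codePoints = mb_strlen($decomposed, 'UTF-8');            // 2

// The grapheme functions count user-perceived characters.
$graphemes = grapheme_strlen($decomposed);                // 1

// Normalising both sides to NFC makes the comparison succeed.
$sameAfterNfc = Normalizer::normalize($decomposed, Normalizer::FORM_C) === $composed; // true

var_dump($sameBytes, $codePoints, $graphemes, $sameAfterNfc);
```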

Why It Matters

A 100-character limit enforced with mb_strlen() admits only 14 family emoji, because each one is 7 code points (14 × 7 = 98, and a 15th would reach 105). grapheme_strlen() counts each emoji as one visible character, which matches what the user sees.
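The arithmetic can be checked directly. This sketch builds the family emoji from its code points and repeats it 15 times; it assumes mbstring and intl are loaded and an ICU version recent enough to treat the joined sequence as a single grapheme cluster.

```php
<?php
// Sketch only: assumes ext-mbstring, ext-intl, and a reasonably recent ICU.

// Man + ZWJ + woman + ZWJ + girl + ZWJ + boy = 7 code points.
$family = "\u{1F468}\u{200D}\u{1F469}\u{200D}\u{1F467}\u{200D}\u{1F466}";
$limit  = 100;

$text = str_repeat($family, 15); // the user typed 15 visible characters

$codePoints = mb_strlen($text, 'UTF-8'); // 15 * 7 = 105, over the limit
$graphemes  = grapheme_strlen($text);    // 15 on recent ICU, well under the limit

// Largest number of family emoji a code-point limit lets through.
$maxByCodePoints = intdiv($limit, mb_strlen($family, 'UTF-8')); // 14

printf("code points: %d, graphemes: %d, max emoji under mb_strlen limit: %d\n",
    $codePoints, $graphemes, $maxByCodePoints);
```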

Common Mistakes

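  • strlen() for character limits on user input: it counts bytes, not characters
  • mb_strlen() for grapheme-aware limits: it counts code points, not visible grapheme clusters
  • Not normalising Unicode before storing: the same visual character can have several encodings, so visually identical values slip past unique constraints and exact-match lookups miss
  • strtolower() for multilingual case conversion: it works on bytes and does not reliably lowercase non-ASCII letters; use mb_strtolower() or Transliterator ('Any-Lower'), or IntlChar::tolower() for a single code point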
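The case-conversion mistake in the last bullet can be seen in a short sketch. It assumes mbstring and intl are loaded and the default C locale; the sample string is illustrative.

```php
<?php
// Sketch only: assumes ext-mbstring and ext-intl are loaded, default C locale.

$shout = 'ÉCOLE ÜBER';

// Byte-oriented: ASCII letters are lowercased, accented capitals are left alone.
$viaStrtolower = strtolower($shout);

// Encoding-aware lowercasing from mbstring.
$viaMbstring = mb_strtolower($shout, 'UTF-8');

// ICU lowercasing through the Intl Transliterator.
$viaIcu = Transliterator::create('Any-Lower')->transliterate($shout);

echo $viaStrtolower, PHP_EOL;
echo $viaMbstring, PHP_EOL;
echo $viaIcu, PHP_EOL;
```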

Code Examples

✗ Vulnerable
// Wrong character counting:
$input = 'Hello 👨‍👩‍👧‍👦'; // 1 visible emoji
$limit = 100;
if (mb_strlen($input) > $limit) { /* trim */ }
// mb_strlen: 'Hello ' = 6, family emoji = 7 code points = 13 total
// User sees 7 visible characters, system counts 13
✓ Fixed
// Grapheme-correct character counting:
$input = 'Hello 👨‍👩‍👧‍👦';
$limit = 100;
if (grapheme_strlen($input) > $limit) {
    $input = grapheme_substr($input, 0, $limit);
}
// grapheme_strlen: 'Hello ' = 6, family emoji = 1 = 7 total — correct!

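// Normalise Unicode before storing so equivalent strings share one representation:
$normalised = Normalizer::normalize($input, Normalizer::FORM_C);
if ($normalised === false) {
    // normalize() returns false on failure, e.g. for invalid UTF-8
    throw new InvalidArgumentException('Input could not be normalised');
}
$pdo->prepare('INSERT INTO users (name) VALUES (?)')->execute([$normalised]);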

Added 16 Mar 2026
Edited 22 Mar 2026
DEV INTEL Tools & Severity
🟡 Medium ⚙ Fix effort: Low
⚡ Quick Fix
Install php-intl (for example, apt-get install php8.3-intl on Debian/Ubuntu) and confirm it is loaded with php -m | grep intl. The extension provides ICU-based internationalisation; without it, the grapheme_* functions, Normalizer, NumberFormatter, and MessageFormatter are unavailable.
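Where the extension cannot be guaranteed, application code can check for it at runtime. The helper below is a hypothetical sketch (the name visibleLength is not from any library): it prefers grapheme_strlen() and falls back to a code-point count, which assumes mbstring is available and over-counts emoji sequences and combining characters.

```php
<?php
// Sketch only: visibleLength() is a hypothetical helper, not a library function.

function visibleLength(string $text): int
{
    if (extension_loaded('intl')) {
        $count = grapheme_strlen($text);
        // grapheme_strlen() returns false or null on failure (e.g. invalid UTF-8).
        if (is_int($count)) {
            return $count;
        }
    }

    // Fallback: code points, not graphemes. Assumes ext-mbstring is loaded.
    return mb_strlen($text, 'UTF-8');
}

echo visibleLength('Hello'), PHP_EOL;
```

Failing fast at startup when intl is missing may be the better choice if exact limits matter, since the fallback silently changes what is counted.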
📦 Applies To
PHP 5.3+ web cli
🔗 Prerequisites
🔍 Detection Hints
NumberFormatter not found error; Symfony translation component failing; PHP missing intl extension in Docker image or server
Auto-detectable: ✓ Yes phpinfo phpstan composer
⚠ Related Problems
🤖 AI Agent
Confidence: Low False Positives: Medium ✗ Manual fix Fix: Medium Context: File
