PHP Intl Extension — Unicode

PHP PHP 5.3+ Advanced

debt(d3/e3/b5/t5)

d3 Detectability Operational debt — how invisible misuse is to your safety net

Closest to 'default linter catches the common case' (d3). The term's detection_hints list phpinfo, phpstan, and composer as automated detection routes. A missing intl extension usually surfaces immediately as a 'NumberFormatter not found' error or a failing Symfony translation component, either at runtime on first use or earlier in CI, through PHPStan or a Composer platform requirement such as ext-intl. It is not d1 (there is no compile-time guarantee), but it is reliably caught early.

e3 Effort Remediation debt — work required to fix once spotted

Closest to 'simple parameterised fix' (e3). The quick_fix states: 'apt-get install php8.3-intl'. Installing the extension is a one-line command, but full remediation also means updating Docker images and server configuration, and possibly replacing mb_* calls with grapheme_* calls. That touches configuration files and potentially several code locations, yet every change follows the same straightforward pattern.

b5 Burden Structural debt — long-term weight of choosing wrong

Closest to 'persistent productivity tax' (b5). The applies_to field shows this affects both web and cli contexts across PHP 5.3+. Once you need proper Unicode handling (grapheme clusters, normalisation, transliteration), every text-processing feature must consider whether it's using the correct Intl functions. This creates ongoing cognitive load — developers must remember to use grapheme_strlen over mb_strlen, Transliterator over strtolower, etc. Not architectural (b7-9) but definitely a persistent tax across many work streams.

t5 Trap Cognitive debt — how counter-intuitive correct behaviour is

Closest to 'notable trap' (t5). The misconception field states: 'mbstring functions handle all Unicode correctly'. This is the documented gotcha that most PHP developers eventually learn. The why_it_matters example (a 100-character limit that admits only 14 family emoji) shows the confusion in practice. Developers familiar with the mb_* functions reasonably assume they have solved Unicode, but grapheme clusters and normalisation require Intl. It is a well-known trap in the PHP community: not catastrophic (t9), but more than a minor edge case (t3).

About DEBT scoring → scored by claude-opus-4-5-20251101 · 2026-05-11 · reviewed by human

Also Known As

PHP Intl ICU PHP Normalizer grapheme functions Transliterator

TL;DR

Grapheme functions, Normalizer, and Transliterator: correct multilingual text handling beyond what mbstring provides, including emoji and combining characters.

Explanation

The Intl extension wraps the ICU library and provides: grapheme_strlen/grapheme_substr (count and slice by user-perceived character, so emoji and combining characters are handled correctly), Normalizer (NFC/NFD Unicode normalisation, recommended before comparing or storing user text), Transliterator (conversion between scripts, for example Cyrillic to Latin), IntlBreakIterator (word, sentence, and character boundaries), and IntlChar (Unicode character properties). The family emoji 👨‍👩‍👧‍👦 consists of 7 Unicode code points (four emoji joined by three zero-width joiners): mb_strlen returns 7, grapheme_strlen returns 1.
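The three ways of measuring a string can be compared side by side. This is a minimal sketch, assuming the intl and mbstring extensions are loaded and that PHP is linked against a reasonably recent ICU (older ICU builds may not treat a ZWJ emoji sequence as one cluster). The variable names are illustrative only.

```php
<?php
// Family emoji written as explicit code points:
// man, ZWJ, woman, ZWJ, girl, ZWJ, boy.
$family = "\u{1F468}\u{200D}\u{1F469}\u{200D}\u{1F467}\u{200D}\u{1F466}";

$bytes      = strlen($family);             // UTF-8 bytes: 4 * 4 + 3 * 3 = 25
$codePoints = mb_strlen($family, 'UTF-8'); // Unicode code points: 7
$graphemes  = grapheme_strlen($family);    // user-perceived characters: 1 on a recent ICU

printf("bytes=%d code points=%d graphemes=%d\n", $bytes, $codePoints, $graphemes);
```

Only the last figure matches what a user would call "one character", which is why length limits on user-visible text are usually better expressed in graphemes.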

Common Misconception

✗ mbstring functions handle all Unicode correctly. In fact, mbstring handles multibyte encodings but not Unicode semantics: grapheme clusters (such as emoji with skin-tone modifiers), combining characters, and Unicode normalisation all require the Intl extension.
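Combining characters show the gap without any emoji involved. The sketch below, assuming intl and mbstring are both loaded, builds "é" in its precomposed and decomposed forms and compares them before and after NFC normalisation. The variable names are illustrative.

```php
<?php
$composed   = "\u{00E9}";  // "é" as a single precomposed code point
$decomposed = "e\u{0301}"; // "e" followed by U+0301 COMBINING ACUTE ACCENT

$sameBytes    = ($composed === $decomposed);     // false: the byte sequences differ
$mbComposed   = mb_strlen($composed, 'UTF-8');   // 1
$mbDecomposed = mb_strlen($decomposed, 'UTF-8'); // 2, although both render as one "é"
$gDecomposed  = grapheme_strlen($decomposed);    // 1

// NFC composes the pair into the precomposed form, so the strings now match.
$nfc          = Normalizer::normalize($decomposed, Normalizer::FORM_C);
$sameAfterNfc = ($nfc === $composed);            // true

var_dump($sameBytes, $mbDecomposed, $gDecomposed, $sameAfterNfc);
```

mbstring reports two different lengths for text that looks identical on screen. The grapheme count and the normalised comparison both agree with what the user sees.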

Why It Matters

A 100-character limit enforced with mb_strlen allows only 14 family emoji (7 code points each, so 14 × 7 = 98), although the user sees just 14 characters. grapheme_strlen counts visible characters, so the same limit allows 100 of them.
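The arithmetic can be checked directly. This sketch assumes intl and mbstring are loaded. The limit of 100 comes from the example above, and the variable names are illustrative.

```php
<?php
$family = "\u{1F468}\u{200D}\u{1F469}\u{200D}\u{1F467}\u{200D}\u{1F466}";
$limit  = 100;

$fourteen = str_repeat($family, 14); // 14 * 7 = 98 code points
$fifteen  = str_repeat($family, 15); // 15 * 7 = 105 code points

// A code-point limit accepts 14 emoji but rejects the 15th.
$mbAccepts14 = mb_strlen($fourteen, 'UTF-8') <= $limit; // true
$mbAccepts15 = mb_strlen($fifteen, 'UTF-8') <= $limit;  // false

// A grapheme limit still accepts 15, because the user sees far fewer than 100 characters.
$graphemeAccepts15 = grapheme_strlen($fifteen) <= $limit; // true

var_dump($mbAccepts14, $mbAccepts15, $graphemeAccepts15);
```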

Common Mistakes

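strlen() for character limits on user input: counts bytes, not characters
mb_strlen() for grapheme-aware limits: counts code points, not visible grapheme clusters
Not normalising Unicode before storing: the same visual character can have multiple representations, so visually identical values are stored as different strings and can fail to match in comparisons or unique-key checks
strtolower() for multilingual case conversion: it works on bytes, so use mb_strtolower() or Transliterator for whole strings (IntlChar::tolower() handles a single code point)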
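For the case-conversion item, a Unicode-aware alternative might look like the sketch below. It assumes intl and mbstring are loaded and uses ICU's "Any-Lower" transform ID. The sample text and variable names are illustrative.

```php
<?php
$text = "\u{00C9}COLE \u{00C5}NGSTR\u{00D6}M"; // "ÉCOLE ÅNGSTRÖM"

// mbstring applies Unicode case mappings to the whole string.
$viaMb = mb_strtolower($text, 'UTF-8');

// ICU's "Any-Lower" transform does the same through the Intl extension.
$lower = Transliterator::create('Any-Lower');
if ($lower === null) {
    throw new RuntimeException('Could not create the Any-Lower transliterator');
}
$viaIcu = $lower->transliterate($text);

echo $viaMb, "\n", $viaIcu, "\n";
```

Both calls lower-case the accented letters. Language-specific rules (such as Turkish dotted and dotless i) need a locale-specific transform and are outside the scope of this sketch.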

Code Examples

✗ Vulnerable

// Wrong character counting:
$input = 'Hello 👨‍👩‍👧‍👦'; // 1 visible emoji
$limit = 100;
if (mb_strlen($input) > $limit) { /* trim */ }
// mb_strlen: 'Hello ' = 6, family emoji = 7 code points = 13 total
// User sees 7 visible characters, system counts 13

✓ Fixed

// Grapheme-correct character counting:
$input = 'Hello 👨‍👩‍👧‍👦';
$limit = 100;
if (grapheme_strlen($input) > $limit) {
    $input = grapheme_substr($input, 0, $limit);
}
// grapheme_strlen: 'Hello ' = 6, family emoji = 1 = 7 total — correct!

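// Normalise Unicode before storing so that equivalent sequences share one representation:
$normalised = Normalizer::normalize($userInput, Normalizer::FORM_C);
if ($normalised === false) {
    // normalize() returns false on failure, for example on invalid UTF-8
    throw new InvalidArgumentException('Input could not be normalised');
}
$pdo->prepare('INSERT INTO users (name) VALUES (?)')->execute([$normalised]);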
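Truncation has the same pitfall as counting. As a small sketch (assuming intl and mbstring are loaded, with an illustrative sample string), cutting by code point can strip an accent from its base letter, while cutting by grapheme keeps the cluster whole:

```php
<?php
// "é" in decomposed form (e + U+0301), followed by plain ASCII.
$name = "e\u{0301}cole";

$byCodePoint = mb_substr($name, 0, 1, 'UTF-8'); // "e": the combining accent is cut off
$byGrapheme  = grapheme_substr($name, 0, 1);    // "e" + U+0301: letter and accent stay together

var_dump($byCodePoint === 'e', $byGrapheme === "e\u{0301}");
```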

Tags

php i18n unicode

Added 16 Mar 2026

Edited 22 Mar 2026

⚡ DEV INTEL Tools & Severity

🟡 Medium ⚙ Fix effort: Low

⚡ Quick Fix

Install php-intl (for example apt-get install php8.3-intl, matching the package to your PHP version). It provides ICU-based internationalisation. Verify with php -m | grep intl. Without it, NumberFormatter, MessageFormatter, and the other Intl classes and functions are unavailable.

📦 Applies To

PHP 5.3+ web cli

🔗 Prerequisites

PHP Intl Extension Unicode Fundamentals Locale-Aware Formatting

🔍 Detection Hints

NumberFormatter not found error; Symfony translation component failing; PHP missing intl extension in Docker image or server

Auto-detectable: ✓ Yes phpinfo phpstan composer
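A runtime check can complement these hints, since the CLI and web SAPIs may load different extension sets. The helper below is hypothetical (intlStatus is not part of PHP or any framework). It relies only on extension_loaded() and the INTL_ICU_VERSION constant that the extension defines.

```php
<?php
// Hypothetical helper: reports whether Intl is usable in the current SAPI
// and which ICU version the extension was built against.
function intlStatus(): array
{
    $loaded = extension_loaded('intl');

    return [
        'loaded' => $loaded,
        // INTL_ICU_VERSION is defined by the intl extension itself.
        'icu'    => $loaded && defined('INTL_ICU_VERSION') ? INTL_ICU_VERSION : null,
    ];
}

$status = intlStatus();
if (!$status['loaded']) {
    error_log('intl is missing: NumberFormatter and the grapheme_* functions are unavailable');
} else {
    echo "intl loaded, ICU {$status['icu']}\n";
}
```

Running a check like this in both a CLI script and a web request can reveal a Docker image or server where only one of the two has the extension enabled.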

⚠ Related Problems

PHP Intl Extension Unicode Fundamentals Character Encoding

🤖 AI Agent

Confidence: Low False Positives: Medium ✗ Manual fix Fix: Medium Context: File

References

https://www.php.net/manual/en/book.intl.php