PHP Intl Extension — Unicode

PHP · PHP 5.3+ · Advanced
debt(d3/e3/b5/t5)
d3 Detectability Operational debt — how invisible misuse is to your safety net

Closest to 'default linter catches the common case' (d3). The term's detection_hints indicate that automated detection is available via phpinfo, phpstan, and composer. A missing intl extension typically surfaces immediately as a 'NumberFormatter not found' error or a failing Symfony translation component, either at runtime on first use or in CI through phpstan or Composer requirement checks. Not quite d1 (no compile-time guarantee), but reliably caught early.

e3 Effort Remediation debt — work required to fix once spotted

Closest to 'simple parameterised fix' (e3). The quick_fix is 'apt-get install php8.3-intl'. Installing the extension is a one-line command, but remediation also involves updating Docker images and server configurations, and potentially changing code to use grapheme_* functions instead of mb_* functions. That touches configuration files and possibly multiple code locations, yet remains a straightforward parameterised fix.

b5 Burden Structural debt — long-term weight of choosing wrong

Closest to 'persistent productivity tax' (b5). The applies_to field shows this affects both web and cli contexts across PHP 5.3+. Once you need proper Unicode handling (grapheme clusters, normalisation, transliteration), every text-processing feature must consider whether it's using the correct Intl functions. This creates ongoing cognitive load — developers must remember to use grapheme_strlen over mb_strlen, Transliterator over strtolower, etc. Not architectural (b7-9) but definitely a persistent tax across many work streams.

t5 Trap Cognitive debt — how counter-intuitive correct behaviour is

Closest to 'notable trap' (t5). The misconception field explicitly states: 'mbstring functions handle all Unicode correctly'. This is the documented gotcha that most PHP developers eventually learn. The why_it_matters example (a 100-character limit allowing only 14 family emoji) demonstrates real confusion. Competent developers familiar with mb_* functions reasonably assume they have solved Unicode, but grapheme clusters and normalisation require Intl. This is a well-known trap in the PHP community: not catastrophic (t9), but more than a minor edge case (t3).

Also Known As

PHP Intl ICU PHP Normalizer grapheme functions Transliterator

TL;DR

Grapheme functions, Normalizer, and Transliterator: correct multilingual text handling beyond what mbstring provides, including emoji and combining characters.

Explanation

The Intl extension wraps the ICU library and provides:

  • grapheme_strlen / grapheme_substr: length and slicing that are correct for emoji and combining characters
  • Normalizer: NFC/NFD Unicode normalisation, needed before comparing or storing user text
  • Transliterator: conversion between scripts, for example Cyrillic to Latin
  • IntlBreakIterator: word, sentence, and character boundaries
  • IntlChar: Unicode character properties

The family emoji 👨‍👩‍👧‍👦 is a single grapheme made of 7 Unicode code points (four emoji joined by three zero-width joiners): mb_strlen returns 7, grapheme_strlen returns 1.
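As a minimal sketch of the three ways to measure that emoji, the following compares byte, code point, and grapheme counts. It assumes the intl and mbstring extensions are loaded and a reasonably recent ICU build (older ICU versions may split the sequence into several graphemes). The variable names are illustrative only.

```php
<?php
// The family emoji spelled out as code points: four emoji joined by
// three zero-width joiners (U+200D).
$family = "\u{1F468}\u{200D}\u{1F469}\u{200D}\u{1F467}\u{200D}\u{1F466}";

$bytes      = strlen($family);              // UTF-8 bytes
$codePoints = mb_strlen($family, 'UTF-8');  // Unicode code points
$graphemes  = grapheme_strlen($family);     // user-perceived characters

printf("bytes=%d codePoints=%d graphemes=%d\n", $bytes, $codePoints, $graphemes);
```

Writing the string with explicit escapes keeps the zero-width joiners visible in source, which avoids editors silently dropping them.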

Common Misconception

'mbstring functions handle all Unicode correctly.' In fact, mbstring handles multibyte encodings but not Unicode semantics: grapheme clusters (emoji with skin tones), combining characters, and Unicode normalisation all require the Intl extension.
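The combining-character case can be illustrated with a small sketch, assuming intl and mbstring are loaded. It builds "é" in two ways (precomposed and decomposed) and shows that byte comparison and mb_strlen disagree about them, while grapheme_strlen and Normalizer treat them as the same character. The variable names are hypothetical.

```php
<?php
$composed   = "\u{00E9}";   // "é" as a single precomposed code point
$decomposed = "e\u{0301}";  // "e" followed by a combining acute accent

// Plain comparison works on bytes, so the two spellings differ.
$sameBytes = ($composed === $decomposed);

// mb_strlen counts code points; grapheme_strlen counts visible characters.
$mbLengths       = [mb_strlen($composed, 'UTF-8'), mb_strlen($decomposed, 'UTF-8')];
$graphemeLengths = [grapheme_strlen($composed), grapheme_strlen($decomposed)];

// Normalising both sides to NFC makes the comparison meaningful.
$sameAfterNfc = Normalizer::normalize($composed, Normalizer::FORM_C)
    === Normalizer::normalize($decomposed, Normalizer::FORM_C);

var_dump($sameBytes, $mbLengths, $graphemeLengths, $sameAfterNfc);
```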

Why It Matters

A 100-character limit enforced with mb_strlen allows only 14 family emoji, because each one is 7 code points (14 × 7 = 98, and a fifteenth would reach 105). grapheme_strlen counts each emoji once, which matches what the user sees.
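The arithmetic can be checked with a short sketch, again assuming intl, mbstring, and a recent ICU build. It repeats the family emoji 14 and 15 times and compares what each counting function reports against the limit.

```php
<?php
$family = "\u{1F468}\u{200D}\u{1F469}\u{200D}\u{1F467}\u{200D}\u{1F466}";
$limit  = 100;

$fourteen = str_repeat($family, 14);
$fifteen  = str_repeat($family, 15);

// Code point counting: 14 * 7 = 98 passes, 15 * 7 = 105 is rejected.
$mb14         = mb_strlen($fourteen, 'UTF-8');
$mb15         = mb_strlen($fifteen, 'UTF-8');
$rejectedByMb = $mb15 > $limit;

// Grapheme counting: 15 visible characters, well under the limit.
$g15                = grapheme_strlen($fifteen);
$rejectedByGrapheme = $g15 > $limit;

var_dump($mb14, $mb15, $rejectedByMb, $g15, $rejectedByGrapheme);
```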

Common Mistakes

  • strlen() for character limits on user input: it counts bytes, not characters
  • mb_strlen() for grapheme-aware limits: it counts code points, not visible grapheme clusters
  • Not normalising Unicode before storing: the same visual character can have multiple representations, so visually identical values end up stored as different keys
  • strtolower() for multilingual case conversion: it operates on bytes, so non-ASCII letters are not converted reliably; use IntlChar (per code point) or Transliterator (whole strings)
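For the last item, a minimal sketch of Intl-based lowercasing is shown here, assuming the intl extension is loaded. It uses IntlChar::tolower for a single code point and ICU's 'Any-Lower' transform through Transliterator for a whole string; the variable names are illustrative.

```php
<?php
$upper = "\u{00C0}\u{00C9}\u{00CE}";  // "ÀÉÎ"

// IntlChar works on one code point at a time; given a UTF-8 string
// it returns a UTF-8 string.
$firstLower = IntlChar::tolower("\u{00C0}");

// Transliterator applies ICU's lowercase transform to the whole string.
$toLower = Transliterator::create('Any-Lower');
if ($toLower === null) {
    throw new RuntimeException('Could not create the Any-Lower transliterator');
}
$lower = $toLower->transliterate($upper);

var_dump($firstLower, $lower);
```

Transliterator::create returns null when the transform ID is not recognised, so the check above fails loudly instead of calling a method on null.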

Code Examples

✗ Vulnerable
// Wrong character counting:
$input = 'Hello 👨‍👩‍👧‍👦'; // 1 visible emoji
$limit = 100;
if (mb_strlen($input) > $limit) { /* trim */ }
// mb_strlen counts code points: 'Hello ' = 6, family emoji = 7, total 13
// The user sees 7 visible characters, but the system counts 13
✓ Fixed
// Grapheme-correct character counting:
$input = 'Hello 👨‍👩‍👧‍👦';
$limit = 100;
if (grapheme_strlen($input) > $limit) {
    $input = grapheme_substr($input, 0, $limit);
}
// grapheme_strlen: 'Hello ' = 6, family emoji = 1 = 7 total — correct!

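// Normalise Unicode before storing to prevent duplicate keys.
// $userInput is the raw user-supplied string; $pdo is an existing PDO connection.
$normalised = Normalizer::normalize($userInput, Normalizer::FORM_C);
if ($normalised === false) {
    // normalize() returns false on failure, for example on invalid UTF-8
    throw new InvalidArgumentException('Could not normalise input');
}
$pdo->prepare('INSERT INTO users (name) VALUES (?)')->execute([$normalised]);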
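Truncation has the same pitfall as counting. The sketch below, which assumes intl, mbstring, and a recent ICU build, cuts the same string at 7 units with mb_substr and with grapheme_substr to show that only the grapheme-aware cut keeps the emoji intact. The limit of 7 is chosen purely for illustration.

```php
<?php
$family = "\u{1F468}\u{200D}\u{1F469}\u{200D}\u{1F467}\u{200D}\u{1F466}";
$input  = 'Hello ' . $family;
$limit  = 7;

// Cutting at 7 code points keeps only the first emoji of the sequence,
// so the user would see a single person instead of the family.
$byCodePoint = mb_substr($input, 0, $limit, 'UTF-8');

// Cutting at 7 graphemes keeps the whole family emoji.
$byGrapheme = grapheme_substr($input, 0, $limit);

var_dump($byCodePoint === $input, $byGrapheme === $input);
```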

Added 16 Mar 2026
Edited 22 Mar 2026
DEV INTEL Tools & Severity
🟡 Medium ⚙ Fix effort: Low
⚡ Quick Fix
Install php-intl (for example, apt-get install php8.3-intl on Debian or Ubuntu with PHP 8.3). It provides ICU-based internationalisation. Verify with php -m | grep intl. Without it, NumberFormatter and MessageFormatter are unavailable.
📦 Applies To
PHP 5.3+ web cli
🔍 Detection Hints
NumberFormatter not found error; Symfony translation component failing; PHP missing intl extension in Docker image or server
Auto-detectable: ✓ Yes (phpinfo, phpstan, composer)
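One way to surface a missing extension before the first 'NumberFormatter not found' error is a startup check. The sketch below is an assumption about how such a check might look, not part of any framework: missingExtensions is a hypothetical helper built on PHP's extension_loaded().

```php
<?php
/**
 * Hypothetical helper: returns the names of the required extensions
 * that are not loaded in the current PHP runtime.
 */
function missingExtensions(array $required): array
{
    return array_values(array_filter(
        $required,
        static fn (string $name): bool => !extension_loaded($name)
    ));
}

$missing = missingExtensions(['intl', 'mbstring']);
if ($missing !== []) {
    // error_log() works in both web and CLI contexts.
    error_log('Missing PHP extensions: ' . implode(', ', $missing));
}
```

Declaring "ext-intl" in the require section of composer.json is a complementary option, since Composer then refuses to install on a runtime that lacks the extension.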