
Collation & Locale-Aware Sorting

i18n PHP 7.0+ Intermediate
debt(d9/e3/b5/t7)
d9 Detectability Operational debt — how invisible misuse is to your safety net

Closest to 'silent in production until users hit it' (d9). Byte-order sorting produces valid output that passes all automated tests — the code runs without errors, arrays sort without exceptions, queries execute successfully. The bug is only visible when a human user who speaks the relevant language notices that ä appears after z or ñ is in the wrong position. No linter, static analyzer, or automated test catches this unless you write explicit locale-aware test cases with known correct orderings.
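An explicit test of this kind can be quite small. The following is a minimal sketch, assuming the intl extension is loaded and the source file is saved as UTF-8; the word list and its expected order are illustrative, not taken from a real test suite.

```php
<?php
// Illustrative fixture: German words with a known correct dictionary order.
$words    = ['Zucker', 'Ärger', 'Apfel', 'Bauer'];
$expected = ['Apfel', 'Ärger', 'Bauer', 'Zucker'];

$sorted = $words;
(new Collator('de_DE'))->sort($sorted); // sorts in place and reindexes

// Fails loudly if sorting ever regresses to byte order (Ärger would move to the end).
if ($sorted !== $expected) {
    throw new RuntimeException('German collation regression: ' . implode(', ', $sorted));
}
echo "ok\n";
```

Swapping the Collator call for plain sort() makes the check fail, which is exactly the regression it is meant to catch.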

e3 Effort Remediation debt — work required to fix once spotted

Closest to 'simple parameterised fix' (e3). The quick_fix shows this is a straightforward swap: replace sort($array) with (new Collator($locale))->sort($array), or change collation from utf8mb4_general_ci to utf8mb4_unicode_ci. However, the fix may touch multiple files if sorting is spread across the codebase, and database collation changes require ALTER TABLE statements and potential reindexing, pushing slightly beyond a pure one-liner.

b5 Burden Structural debt — long-term weight of choosing wrong

Closest to 'persistent productivity tax' (b5). Collation is a cross-cutting concern affecting any code that sorts or compares strings for display — user lists, product catalogs, search results, autocomplete. Once the wrong approach is established, every new sorting feature inherits the bug. The fix requires awareness at both PHP and database layers, and developers must remember to use Collator instead of sort() throughout the codebase. Not architectural-level, but a persistent tax on string handling.
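One way to reduce that tax is to route all display sorting through a single function, so call sites never reach for sort() directly. The sketch below assumes the intl extension is available; sortForDisplay is a hypothetical helper name, not part of PHP or of any library.

```php
<?php
// Hypothetical helper: the one place where display sorting is implemented.
function sortForDisplay(array $values, string $locale): array
{
    $collator = new Collator($locale);
    $collator->sort($values); // sorts the local copy in place and reindexes it
    return $values;
}

$names = sortForDisplay(['Müller', 'Maier', 'Ärger', 'Bauer'], 'de_DE');
echo implode(', ', $names), "\n"; // Ärger, Bauer, Maier, Müller
```

Because the array is passed by value, the caller's original array is left untouched and the sorted copy is returned.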

t7 Trap Cognitive debt — how counter-intuitive correct behaviour is

Closest to 'serious trap' (t7). The misconception field states developers expect sort() and ORDER BY to produce correct alphabetical order — this contradicts the intuitive mental model from English-only development where byte order and linguistic order happen to align for ASCII. A developer coming from English contexts will confidently use strcmp() and be surprised when German or Spanish users complain. The 'obvious' approach works for English and fails silently for other languages.


Also Known As

collation, locale sorting, unicode collation, string comparison, alphabetical order

TL;DR

Locale-specific rules for ordering strings alphabetically — determining that ä sorts near a in German but after z in traditional Swedish, and that sorting must not rely on byte values for Unicode text.

Explanation

Collation defines the comparison order for strings. Byte-wise comparison (strcmp(), sort(), or a binary collation such as utf8mb4_bin in MySQL) sorts uppercase before lowercase, places accented characters after z, and has no concept of locale-specific sort rules. For a German user, ä should sort near a; for a Swedish user, ä sorts after z. The Unicode Collation Algorithm (UCA) provides a standardised method for sorting Unicode text, with per-locale tailorings for language-specific rules. PHP's intl extension provides Collator: $collator = new Collator('de_DE'); $collator->sort($array). MySQL and PostgreSQL support collation at the column or query level. In MySQL, utf8mb4_unicode_ci applies the UCA's language-neutral default order and is accent-insensitive and case-insensitive; language-specific collations (for example utf8mb4_german2_ci or utf8mb4_swedish_ci in MySQL, or a locale-named collation such as "de_DE" in PostgreSQL where the operating system provides it) apply that language's rules. Database-level collation affects ORDER BY, GROUP BY, and index usage; PHP-level sorting is used for in-memory arrays.
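To make the German and Swedish difference concrete, here is a minimal sketch that sorts the same list under two locales. It assumes the intl extension is loaded and the file is saved as UTF-8; the word list is illustrative.

```php
<?php
$words = ['Zebra', 'Ärger', 'Apfel'];

// German rules: ä is ordered alongside a.
$german = $words;
(new Collator('de_DE'))->sort($german);

// Swedish rules: ä is a separate letter that comes after z.
$swedish = $words;
(new Collator('sv_SE'))->sort($swedish);

echo implode(', ', $german), "\n";  // Apfel, Ärger, Zebra
echo implode(', ', $swedish), "\n"; // Apfel, Zebra, Ärger
```

The same input therefore has no single correct order: the locale passed to Collator is part of the sort specification.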

Common Misconception

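PHP's sort() and MySQL's ORDER BY produce correct alphabetical order for all languages. They do not. PHP's sort() and strcmp() compare byte values: ä is encoded as 0xC3 0xA4 in UTF-8, and its first byte is larger than z (0x7A), so ä sorts after z. German users expect ä near a. The same applies to ñ in Spanish, ő in Hungarian, and every accented character. MySQL's ORDER BY uses the column's collation: a binary collation gives the same byte order, while language-neutral collations such as utf8mb4_general_ci or utf8mb4_unicode_ci treat ä like a, which suits German but not Swedish, where ä belongs after z. Applications that sort user names, product titles, or any display content must use locale-aware collation for correct results.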
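The byte values behind this can be inspected directly. A small sketch using only core PHP functions; ä is written as a Unicode escape (PHP 7.0+) so the result does not depend on how the source file is encoded.

```php
<?php
$aUmlaut = "\u{00E4}"; // ä, emitted as UTF-8 by the escape sequence

echo bin2hex($aUmlaut), "\n"; // c3a4
echo bin2hex('z'), "\n";      // 7a

// strcmp() compares byte by byte, so 0xC3 > 0x7A decides the result.
var_dump(strcmp($aUmlaut, 'z') > 0); // bool(true)
```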

Why It Matters

Incorrect sorting is immediately visible to users in their native language. A PHP application sorting a list of German names with strcmp() places Ärger after Zürich instead of with the other A entries (after Aalen and Altenburg), which is obviously wrong to any German speaker. In e-commerce and directories, sort order directly affects discovery and usability. Using PHP's Collator or configuring the correct MySQL collation costs minimal effort and produces correct results for all supported locales.
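Lists shown to users are often arrays of records rather than plain strings. In that case Collator::compare() can serve as the comparison callback for usort(). The sketch below assumes the intl extension is loaded; the $rows structure and its ids are hypothetical.

```php
<?php
// Hypothetical records, e.g. rows already fetched for display.
$rows = [
    ['id' => 1, 'name' => 'Zürich'],
    ['id' => 2, 'name' => 'Ärger'],
    ['id' => 3, 'name' => 'Altenburg'],
    ['id' => 4, 'name' => 'Aalen'],
];

$collator = new Collator('de_DE');

// Order the records by name using German collation rules.
usort($rows, function (array $a, array $b) use ($collator) {
    return $collator->compare($a['name'], $b['name']);
});

echo implode(', ', array_column($rows, 'name')), "\n";
// Aalen, Altenburg, Ärger, Zürich
```

Replacing the callback with strcmp() on the same field would move Ärger to the end of the list, after Zürich.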

Common Mistakes

  • Using strcmp() or sort() for locale-sensitive string comparison — both use byte order, not linguistic order.
  • Setting database collation to utf8mb4_general_ci instead of utf8mb4_unicode_ci — general_ci has faster comparisons but less accurate Unicode handling.
  • Not specifying collation in ORDER BY for multilingual tables — default database collation may be wrong for specific query contexts.
  • Sorting in PHP after fetching from a correctly-collated database — the database should handle sorting; PHP re-sorting discards the correct database collation.

Code Examples

✗ Vulnerable
// Byte-order sort — wrong for any non-ASCII alphabet
$names = ['Müller', 'Maier', 'Ärger', 'Bauer'];
sort($names);
// Result: ['Bauer', 'Maier', 'Müller', 'Ärger'] — Ärger last, wrong
✓ Fixed
// Locale-aware sort — correct German alphabetical order
$names = ['Müller', 'Maier', 'Ärger', 'Bauer'];
$collator = new Collator('de_DE');
$collator->sort($names);
// Result: ['Ärger', 'Bauer', 'Maier', 'Müller'] — correct

// MySQL: Unicode-aware ORDER BY (UCA default order; choose a
// language-specific collation where locale rules differ)
// ALTER TABLE products MODIFY name VARCHAR(255)
//     CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
// SELECT * FROM products ORDER BY name COLLATE utf8mb4_unicode_ci;

Added 23 Mar 2026
Edited 5 Apr 2026
DEV INTEL Tools & Severity
🟡 Medium ⚙ Fix effort: Low
⚡ Quick Fix
Use (new Collator($locale))->sort($array) for PHP array sorting (the parentheses around new are required before PHP 8.4); use utf8mb4_unicode_ci collation in MySQL; specify COLLATE in ORDER BY when locale differs from table default
📦 Applies To
PHP 7.0+ web cli
