
MySQL charset=utf8mb4

php PHP 5.1+ Beginner

Also Known As

utf8mb4 MySQL · UTF-8 emoji MySQL · 4-byte unicode MySQL

TL;DR

The correct MySQL character set for full Unicode support — including emoji and supplementary characters that the older utf8 charset cannot store.

Explanation

MySQL's 'utf8' charset (an alias for 'utf8mb3') stores at most 3 bytes per character, so it cannot hold the 4-byte UTF-8 sequences used for code points above U+FFFF: emoji, some CJK characters, and many mathematical symbols. 'utf8mb4' is the correct implementation of UTF-8 and supports the full Unicode range. Inserting a 4-byte character into a 'utf8' column fails with an error in strict SQL mode, or truncates the value with only a warning when strict mode is off. The DSN should specify charset=utf8mb4, and the column, table, and database collation should be utf8mb4_unicode_ci or utf8mb4_0900_ai_ci (MySQL 8+).
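
To make the 3-byte limit concrete, here is a minimal sketch that prints the UTF-8 byte length of a few characters and flags strings containing code points above U+FFFF, which are the ones that need utf8mb4. The helper name needsUtf8mb4 is hypothetical and not part of PHP or MySQL; the "\u{...}" escapes assume PHP 7 or later.

```php
<?php
// Hypothetical helper: true when $text contains a code point above U+FFFF,
// i.e. a character that UTF-8 encodes in 4 bytes.
function needsUtf8mb4(string $text): bool
{
    // The /u modifier makes PCRE treat pattern and subject as UTF-8.
    // Invalid UTF-8 input makes preg_match() return false, reported here as false.
    return preg_match('/[\x{10000}-\x{10FFFF}]/u', $text) === 1;
}

// strlen() counts bytes, not characters.
echo strlen("\u{E9}"), "\n";    // U+00E9 (e with acute accent): 2 bytes
echo strlen("\u{20AC}"), "\n";  // U+20AC (euro sign): 3 bytes
echo strlen("\u{1F600}"), "\n"; // U+1F600 (grinning face emoji): 4 bytes

var_dump(needsUtf8mb4("Hello \u{20AC}"));  // bool(false)
var_dump(needsUtf8mb4("Hello \u{1F600}")); // bool(true)
```

A check like this is only useful for diagnosing existing data; it is not a substitute for switching the columns and the connection to utf8mb4.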

Watch Out

MySQL's 'utf8' charset is NOT real UTF-8 — it is a 3-byte subset. Only 'utf8mb4' is full UTF-8.

Common Misconception

MySQL's utf8 charset is the same as UTF-8. It is not — MySQL utf8 is a 3-byte subset. Only utf8mb4 is true UTF-8.

Why It Matters

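Storing emoji, multilingual content, or any other 4-byte Unicode character in a utf8 column fails in one of two ways. With strict SQL mode on, the statement is rejected with an 'Incorrect string value' error. With strict mode off, MySQL truncates the string at the first 4-byte character and raises only a warning, which most applications never check, so data is lost without anyone noticing.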

Common Mistakes

  • Specifying charset=utf8 in the DSN — silent truncation of emoji and supplementary characters.
  • Mixing utf8 and utf8mb4 columns in the same table or across joined tables: comparisons and joins can raise 'Illegal mix of collations' errors or force implicit charset conversions that prevent index use.
  • Forgetting to set utf8mb4 at the connection level even when the table columns are utf8mb4.
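
One way to catch the last mistake is to inspect the session charset variables on a live connection. The sketch below assumes the row comes from a query on MySQL's @@character_set_client, @@character_set_connection, and @@character_set_results system variables; the helper name sessionIsUtf8mb4 and the cs_* column aliases are hypothetical.

```php
<?php
// Hypothetical helper. Expects the row returned by:
//   SELECT @@character_set_client     AS cs_client,
//          @@character_set_connection AS cs_connection,
//          @@character_set_results    AS cs_results
// and reports whether every session charset is utf8mb4.
function sessionIsUtf8mb4(array $row): bool
{
    foreach (['cs_client', 'cs_connection', 'cs_results'] as $key) {
        // A missing key counts as a failure rather than a pass.
        if (($row[$key] ?? null) !== 'utf8mb4') {
            return false;
        }
    }
    return true;
}

// Example rows as they might come back from PDO::FETCH_ASSOC.
$good = ['cs_client' => 'utf8mb4', 'cs_connection' => 'utf8mb4', 'cs_results' => 'utf8mb4'];
$bad  = ['cs_client' => 'utf8',    'cs_connection' => 'utf8',    'cs_results' => 'utf8'];

var_dump(sessionIsUtf8mb4($good)); // bool(true)
var_dump(sessionIsUtf8mb4($bad));  // bool(false)
```

With a real connection, the row would be fetched from an existing PDO handle using the SELECT shown in the comment.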

Avoid When

  • Do not use utf8 — it silently truncates or errors on emoji and supplementary Unicode characters.

When To Use

  • Always use utf8mb4 for any table that may store user-generated content, names, or multilingual text.
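  • Set charset=utf8mb4 in the DSN rather than running SET NAMES afterwards: the DSN option applies when the connection is established and also tells the client library which charset is in use, whereas SET NAMES only changes the server-side session. PDO honours the charset DSN option from PHP 5.3.6 onward.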
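
A small sketch of the DSN advice above: building the DSN in one place makes it hard to forget the charset. The function name mysqlDsn is hypothetical, and the host and database values are the same placeholders used in the code examples on this page.

```php
<?php
// Hypothetical helper that builds a MySQL DSN with charset=utf8mb4 always set.
function mysqlDsn(string $host, string $dbname): string
{
    return sprintf('mysql:host=%s;dbname=%s;charset=utf8mb4', $host, $dbname);
}

$dsn = mysqlDsn('localhost', 'app');
echo $dsn, "\n"; // mysql:host=localhost;dbname=app;charset=utf8mb4

// Connecting with it ($user and $pass are placeholders):
// $pdo = new PDO($dsn, $user, $pass, [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION]);
```

Enabling PDO::ERRMODE_EXCEPTION is a separate, optional choice; it makes strict-mode 'Incorrect string value' errors surface as exceptions instead of being ignored.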

Code Examples

✗ Vulnerable
// Wrong: utf8 truncates emoji silently
$pdo = new PDO('mysql:host=localhost;dbname=app;charset=utf8', $user, $pass);
// INSERT 'Hello 😀' → strict mode off: stored as 'Hello ' (truncated at the emoji); strict mode on: the INSERT fails
✓ Fixed
// Correct: utf8mb4 in the DSN (no separate SET NAMES needed)
$pdo = new PDO('mysql:host=localhost;dbname=app;charset=utf8mb4', $user, $pass);

-- SQL: table with correct charset
CREATE TABLE posts (
    id   INT AUTO_INCREMENT PRIMARY KEY,
    body TEXT
) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

Added 31 Mar 2026
DEV INTEL Tools & Severity
🟡 Medium ⚙ Fix effort: Low
⚡ Quick Fix
Use charset=utf8mb4 in the DSN, and convert existing tables with ALTER TABLE ... CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci
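
The conversion statement can be generated per table, for example in a migration script. This is a sketch only: the helper name convertTableSql is hypothetical, and it builds the standard ALTER TABLE ... CONVERT TO CHARACTER SET statement without executing it. Converting rewrites the table and may change column storage requirements, so it is worth trying on a copy first.

```php
<?php
// Hypothetical helper: returns the SQL that converts one table to utf8mb4.
function convertTableSql(string $table, string $collation = 'utf8mb4_unicode_ci'): string
{
    // Accept only plain utf8mb4 collation names such as utf8mb4_0900_ai_ci.
    if (preg_match('/^utf8mb4_[a-z0-9_]+$/', $collation) !== 1) {
        throw new InvalidArgumentException('Expected a utf8mb4 collation name');
    }
    // Backtick-quote the identifier, doubling any embedded backticks.
    $quoted = '`' . str_replace('`', '``', $table) . '`';

    return "ALTER TABLE $quoted CONVERT TO CHARACTER SET utf8mb4 COLLATE $collation";
}

echo convertTableSql('posts'), "\n";
// ALTER TABLE `posts` CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci
```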
📦 Applies To
PHP 5.1+ web cli
🔍 Detection Hints
charset=utf8 in DSN or SET NAMES utf8 without mb4
Auto-detectable: ✓ Yes semgrep
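
The detection hint above can be approximated with two regular expressions. This is a rough sketch, not the semgrep rule the page refers to: the function name findLegacyCharset is hypothetical, and the patterns treat both utf8 and utf8mb3 as legacy while ignoring utf8mb4. Plain text matching like this can produce false positives, for example on unrelated strings that happen to contain charset=utf8.

```php
<?php
// Hypothetical scanner: returns line numbers where a legacy MySQL charset
// appears in a DSN ("charset=utf8") or in a "SET NAMES utf8" statement.
function findLegacyCharset(string $source): array
{
    $patterns = [
        // \b after the charset name stops "utf8" from matching inside "utf8mb4".
        'dsn'       => '/\bcharset\s*=\s*utf8(?:mb3)?\b/i',
        'set_names' => '/\bSET\s+NAMES\s+[\'"]?utf8(?:mb3)?\b/i',
    ];

    $hits = [];
    foreach (explode("\n", $source) as $index => $line) {
        foreach ($patterns as $kind => $pattern) {
            if (preg_match($pattern, $line) === 1) {
                $hits[] = ['line' => $index + 1, 'kind' => $kind];
            }
        }
    }
    return $hits;
}

$sample = implode("\n", [
    "\$a = new PDO('mysql:host=localhost;dbname=app;charset=utf8', \$u, \$p);",
    "\$b = new PDO('mysql:host=localhost;dbname=app;charset=utf8mb4', \$u, \$p);",
    "\$a->exec('SET NAMES utf8');",
]);

print_r(findLegacyCharset($sample)); // reports line 1 (dsn) and line 3 (set_names)
```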
🤖 AI Agent
Confidence: High False Positives: Low ✓ Auto-fixable Fix: Low Context: Line
