Message Serialisation
Also Known As
Avro
Protobuf
Protocol Buffers
MessagePack
message format
serialisation format
TL;DR
JSON is readable and flexible; Avro is schema-enforced, compact, and widely used with Kafka; Protobuf is strongly typed and typically several times smaller than JSON. Match the format to message volume and schema requirements.
Explanation
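Message serialisation formats trade readability against size and schema control. JSON is human-readable, schema-free, and widely supported, but verbose: a two-field event such as {"order_id":42,"amount":99.99} takes 30 bytes. MessagePack is a binary encoding of the JSON data model; it is usually somewhat smaller than JSON, but field names still travel in every message. Avro writes compact binary against a schema, which with Kafka is usually stored in a schema registry; a consumer needs the writer's schema to decode a message, and schema evolution rules guard against changes that would break consumers. Protocol Buffers (Protobuf) uses a strongly typed .proto schema and identifies fields by number rather than by name, so payloads are often several times smaller than JSON and faster to parse, with good cross-language support. Never use PHP serialize() for messages: the format is PHP-only, and calling unserialize() on untrusted input is a known source of object-injection vulnerabilities.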
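To make the size difference concrete, the sketch below encodes the same two-field event as JSON and, by hand, in the Protobuf wire format. The hand-rolled encoder is for illustration only: the function name encodePaymentEventWire is made up for this example, it handles only a single-byte varint and one double, and real code would use the classes generated by protoc. The JSON length assumes PHP's default serialize_precision.

```php
<?php
// Illustrative only: hand-encodes
//   message PaymentEvent { int64 order_id = 1; double amount = 2; }
// Assumes order_id fits in 7 bits (0..127), so its varint is a single byte.
function encodePaymentEventWire(int $orderId, float $amount): string
{
    if ($orderId < 0 || $orderId > 127) {
        throw new InvalidArgumentException('sketch only supports order_id 0..127');
    }
    // Tag byte = (field_number << 3) | wire_type.
    // Wire type 0 = varint, wire type 1 = fixed 64-bit.
    $field1 = chr((1 << 3) | 0) . chr($orderId);
    $field2 = chr((2 << 3) | 1) . pack('e', $amount); // 'e' = little-endian double
    return $field1 . $field2;
}

$event = ['order_id' => 42, 'amount' => 99.99];
$json  = json_encode($event);
$wire  = encodePaymentEventWire($event['order_id'], $event['amount']);

printf("JSON: %d bytes, Protobuf wire format: %d bytes\n", strlen($json), strlen($wire));
```

For this small event the binary form is roughly a third of the JSON size; the gap tends to widen as messages gain more fields or longer field names, because JSON repeats every name in every message.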
Common Misconception
✗ JSON is always sufficient for message serialisation. At high message volume (millions per day), Protobuf or Avro can significantly reduce storage and network costs: at 10M messages/day, a binary format is often several times smaller than JSON, though the exact ratio depends on the payload and on compression.
Why It Matters
A Kafka topic receiving 10M messages per day as JSON can use several times more storage and bandwidth than the same data encoded as Protobuf, so serialisation format is a significant operational cost driver at scale.
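The actual ratio depends heavily on the payload. As a rough back-of-envelope estimate for a small two-field event, assuming 30 bytes per message as JSON and 11 bytes as Protobuf, and ignoring Kafka's per-record overhead, compression, and replication (all of which change the real numbers):

```php
<?php
// Back-of-envelope estimate. The byte sizes passed in below are assumptions
// for a small two-field event, not measurements from a real topic.
function dailyVolumeMb(int $messagesPerDay, int $bytesPerMessage): float
{
    return $messagesPerDay * $bytesPerMessage / 1000000; // decimal megabytes
}

$messagesPerDay = 10000000;
$jsonMb     = dailyVolumeMb($messagesPerDay, 30);
$protobufMb = dailyVolumeMb($messagesPerDay, 11);

printf(
    "JSON: %.0f MB/day, Protobuf: %.0f MB/day (%.1fx)\n",
    $jsonMb,
    $protobufMb,
    $jsonMb / $protobufMb
);
```

Larger events with many named fields usually show a bigger gap than this, so it is worth measuring with a representative sample of your own messages.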
Common Mistakes
- PHP serialize() for messages: PHP-only, and unserialize() on untrusted data is an insecure deserialization risk
- No schema validation for JSON messages: malformed messages can silently corrupt consumer state
- Avro without a schema registry: consumers need the writer's schema to decode; a registry provides versioned schema access
- Changing Protobuf field numbers: a field number is permanent once in use; changing or reusing one makes existing consumers misread the data
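The JSON validation point above can be sketched as a consumer-side check that rejects malformed messages before they reach business logic. This is a minimal hand-written version; decodeOrderMessage and the field names are assumptions based on the order event used on this page, and a JSON Schema validator library would be a reasonable alternative.

```php
<?php
// Hypothetical consumer-side validation for an order event with
// an integer order_id and a numeric amount.
function decodeOrderMessage(string $raw): array
{
    // JSON_THROW_ON_ERROR (PHP 7.3+) turns malformed JSON into a JsonException
    // instead of a silent null.
    $data = json_decode($raw, true, 512, JSON_THROW_ON_ERROR);

    if (!is_array($data)) {
        throw new UnexpectedValueException('message must be a JSON object');
    }
    if (!isset($data['order_id']) || !is_int($data['order_id'])) {
        throw new UnexpectedValueException('order_id must be an integer');
    }
    if (!isset($data['amount']) || !(is_int($data['amount']) || is_float($data['amount']))) {
        throw new UnexpectedValueException('amount must be a number');
    }

    // Return only the fields the consumer expects, with normalised types.
    return ['order_id' => $data['order_id'], 'amount' => (float) $data['amount']];
}

$order = decodeOrderMessage('{"order_id":42,"amount":99.99}');
```

A message that fails these checks raises an exception, so the worker can reject or dead-letter it rather than process bad data.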
Code Examples
✗ Vulnerable
// PHP serialize — PHP-only, deserialization risk:
$message = serialize(['order_id' => 42, 'amount' => 99.99]);
$queue->publish($message);
// Cannot consume from Node.js, Python, Go services
// Consumer must unserialize() — deserialization vulnerability
✓ Fixed
// JSON — simple, cross-language:
$message = json_encode(['order_id' => 42, 'amount' => 99.99]);
// Protobuf — typed, compact, cross-language:
// payment.proto: message PaymentEvent { int64 order_id = 1; double amount = 2; }
$event = new PaymentEvent();
$event->setOrderId(42);
$event->setAmount(99.99);
$binary = $event->serializeToString(); // 11 bytes, vs about 30 bytes for the JSON above
Tags
Added
16 Mar 2026
Edited
22 Mar 2026
⚡
DEV INTEL
Tools & Severity
🟡 Medium
⚙ Fix effort: Medium
⚡ Quick Fix
Use JSON for simple queue messages and Protocol Buffers or Avro (with Schema Registry) for high-throughput Kafka — always include a schema version in your message envelope so consumers can evolve independently
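One way to carry a schema version is a small envelope around the payload. The sketch below is an assumption-laden illustration: the key names (type, schema_version, payload), the function names, and the hypothetical v2 layout with integer minor units are all invented for this example.

```php
<?php
// Hypothetical envelope layout; key names are assumptions for illustration.
function wrapInEnvelope(string $type, int $schemaVersion, array $payload): string
{
    return json_encode([
        'type'           => $type,
        'schema_version' => $schemaVersion,
        'payload'        => $payload,
    ], JSON_THROW_ON_ERROR);
}

// The consumer branches on schema_version, so producers can move to a new
// version while messages in the old layout are still in flight.
function handleEnvelope(string $raw): array
{
    $envelope = json_decode($raw, true, 512, JSON_THROW_ON_ERROR);
    $version  = $envelope['schema_version'] ?? null;

    switch ($version) {
        case 1:
            // v1: amount is a float in major units (e.g. 99.99)
            return [
                'order_id'     => $envelope['payload']['order_id'],
                'amount_minor' => (int) round($envelope['payload']['amount'] * 100),
            ];
        case 2:
            // v2 (hypothetical): amount already in integer minor units
            return [
                'order_id'     => $envelope['payload']['order_id'],
                'amount_minor' => $envelope['payload']['amount_minor'],
            ];
        default:
            throw new UnexpectedValueException('unsupported schema_version');
    }
}

$v1 = handleEnvelope(wrapInEnvelope('order.paid', 1, ['order_id' => 42, 'amount' => 99.99]));
$v2 = handleEnvelope(wrapInEnvelope('order.paid', 2, ['order_id' => 42, 'amount_minor' => 9999]));
```

Both versions normalise to the same internal shape, and an unknown version fails loudly instead of being misread.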
📦 Applies To
any
queue-worker
🔗 Prerequisites
🔍 Detection Hints
PHP serialize() used for queue messages; no schema versioning in message envelope; breaking changes to message format without consumer coordination
Auto-detectable:
✗ No
kafka
avro
protobuf
⚠ Related Problems
🤖 AI Agent
Confidence: Medium
False Positives: Medium
✗ Manual fix
Fix: Medium
Context: File
Tests: Update
CWE-502