Message Serialisation

messaging Intermediate

Also Known As

Avro, Protobuf, Protocol Buffers, MessagePack, message format, serialisation format

TL;DR

JSON (readable, flexible), Avro (schema-enforced, compact, common with Kafka), Protobuf (typed, typically several times smaller than JSON): match the format to message volume and schema requirements.

Explanation

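The common message serialisation formats trade readability against size and schema enforcement:

  • JSON: human-readable, schema-free, and widely supported, but verbose (about 30 bytes for a two-field event).
  • MessagePack: a binary encoding of the JSON data model, usually smaller than the equivalent JSON; the gain depends on the payload.
  • Avro: schema-based binary format. The reader needs the writer's schema, which in Kafka deployments is usually fetched from a schema registry. Its schema evolution rules help prevent changes that break consumers.
  • Protocol Buffers (Protobuf): strongly typed .proto schema, typically several times smaller and faster to parse than JSON, with excellent cross-language support.

Never use PHP serialize() for messages: the format is PHP-only, and calling unserialize() on untrusted input is a known deserialization vulnerability.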
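
To make the size difference concrete, the sketch below encodes the same two-field event as JSON and in the Protobuf wire format. The Protobuf bytes are assembled by hand purely for illustration, assuming the schema `message PaymentEvent { int64 order_id = 1; double amount = 2; }`. The helpers `encodeVarint` and `encodePaymentEvent` are hypothetical names, not part of any library; real code would use classes generated by protoc.

```php
<?php
declare(strict_types=1);

// Illustrative only: hand-rolled Protobuf wire encoding for
//   message PaymentEvent { int64 order_id = 1; double amount = 2; }

/** Encode a non-negative integer as a Protobuf base-128 varint. */
function encodeVarint(int $value): string
{
    if ($value < 0) {
        throw new InvalidArgumentException('This sketch only handles non-negative integers');
    }
    $bytes = '';
    do {
        $byte = $value & 0x7F;   // low 7 bits
        $value >>= 7;
        if ($value > 0) {
            $byte |= 0x80;       // continuation bit: more bytes follow
        }
        $bytes .= chr($byte);
    } while ($value > 0);
    return $bytes;
}

/** Encode the two fields; each field starts with a tag byte (field_number << 3) | wire_type. */
function encodePaymentEvent(int $orderId, float $amount): string
{
    $orderIdField = chr((1 << 3) | 0) . encodeVarint($orderId); // wire type 0: varint
    $amountField  = chr((2 << 3) | 1) . pack('e', $amount);     // wire type 1: 64-bit little-endian double
    return $orderIdField . $amountField;
}

$json   = json_encode(['order_id' => 42, 'amount' => 99.99], JSON_THROW_ON_ERROR);
$binary = encodePaymentEvent(42, 99.99);

printf("JSON:     %d bytes\n", strlen($json));
printf("Protobuf: %d bytes\n", strlen($binary));
```

For this small event the binary form is 11 bytes, roughly a third of the JSON size with PHP's default settings. The ratio is payload-dependent: JSON repeats every field name in every message, so events with many or long field names tend to shrink more.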

Common Misconception

Misconception: JSON is always sufficient for message serialisation. At high message volume (millions of messages per day), Protobuf or Avro can significantly reduce storage and network costs: the binary encoding of a small event is often a third of the JSON size or less, and at 10M messages/day that difference is paid on every message.

Why It Matters

A Kafka topic receiving 10M messages per day as JSON can use several times more storage and bandwidth than the same data in Protobuf, so serialisation format is a significant operational cost driver at scale.
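
As a rough worked example, the sketch below estimates daily payload volume at 10M messages per day. The per-message sizes (30 bytes as JSON, 11 bytes as Protobuf) are assumptions for a small two-field event; the estimate ignores broker record overhead, compression, and replication, so real payloads should be measured.

```php
<?php
declare(strict_types=1);

$messagesPerDay  = 10_000_000;
// Assumed encoded sizes for one small two-field event (measure your own payloads).
$bytesPerMessage = ['json' => 30, 'protobuf' => 11];

$dailyMegabytes = [];
foreach ($bytesPerMessage as $format => $bytes) {
    // Decimal megabytes: 1 MB = 1,000,000 bytes
    $dailyMegabytes[$format] = $messagesPerDay * $bytes / 1_000_000;
    printf("%-8s %6.0f MB/day\n", $format, $dailyMegabytes[$format]);
}
```

Under these assumptions the topic carries about 300 MB/day as JSON and 110 MB/day as Protobuf, before replication multiplies both figures.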

Common Mistakes

  • PHP serialize() for messages: PHP-only format, and unserialize() on untrusted input is an insecure deserialization vulnerability
  • No schema validation for JSON messages: malformed messages can silently corrupt consumer state
  • Avro without a schema registry: consumers need the writer's schema, and a registry provides versioned schema access
  • Changing Protobuf field numbers: a field's number identifies it on the wire, so renumbering an existing field or reusing a removed field's number makes existing consumers misread the data
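
As an illustration of validating JSON messages before they reach consumer logic, here is a minimal sketch that checks structure and types by hand. The function name `decodePaymentMessage`, the field names, and the `schema_version` value are hypothetical; a production consumer would more likely use a JSON Schema validator library.

```php
<?php
declare(strict_types=1);

/**
 * Decode and validate a JSON payment message.
 * Field names and the version number are hypothetical, for illustration only.
 *
 * @return array{schema_version: int, order_id: int, amount: float}
 * @throws InvalidArgumentException when the message is malformed
 */
function decodePaymentMessage(string $raw): array
{
    try {
        $data = json_decode($raw, true, 16, JSON_THROW_ON_ERROR);
    } catch (JsonException $e) {
        throw new InvalidArgumentException('Invalid JSON: ' . $e->getMessage(), 0, $e);
    }

    if (!is_array($data)) {
        throw new InvalidArgumentException('Message must be a JSON object');
    }
    if (($data['schema_version'] ?? null) !== 1) {
        throw new InvalidArgumentException('Unsupported or missing schema_version');
    }
    if (!is_int($data['order_id'] ?? null)) {
        throw new InvalidArgumentException('order_id must be an integer');
    }
    $amount = $data['amount'] ?? null;
    if (!is_int($amount) && !is_float($amount)) {
        throw new InvalidArgumentException('amount must be a number');
    }

    return [
        'schema_version' => 1,
        'order_id'       => $data['order_id'],
        'amount'         => (float) $amount,
    ];
}

// A well-formed message is accepted.
$ok = decodePaymentMessage('{"schema_version":1,"order_id":42,"amount":99.99}');

// A message with a string order_id and no amount is rejected up front.
try {
    decodePaymentMessage('{"schema_version":1,"order_id":"42"}');
    $rejected = false;
} catch (InvalidArgumentException $e) {
    $rejected = true;
}
```

Rejecting the message at the boundary lets the consumer route it to a dead-letter queue or log it, rather than processing partially valid data.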

Code Examples

✗ Vulnerable
// PHP serialize(): PHP-only format, deserialization risk
$message = serialize(['order_id' => 42, 'amount' => 99.99]);
$queue->publish($message);
// Node.js, Python, and Go services cannot consume this natively
// The consumer must call unserialize(), which is a deserialization vulnerability on untrusted input
✓ Fixed
// JSON: simple, cross-language
$message = json_encode(['order_id' => 42, 'amount' => 99.99], JSON_THROW_ON_ERROR);

// Protobuf: typed, compact, cross-language
// payment.proto: message PaymentEvent { int64 order_id = 1; double amount = 2; }
// PaymentEvent is the PHP class generated from payment.proto by protoc
$event = new PaymentEvent();
$event->setOrderId(42);
$event->setAmount(99.99);
$binary = $event->serializeToString(); // 11 bytes vs 30 bytes for the JSON above

Added 16 Mar 2026
Edited 22 Mar 2026
DEV INTEL Tools & Severity
🟡 Medium ⚙ Fix effort: Medium
⚡ Quick Fix
Use JSON for simple queue messages and Protocol Buffers or Avro (with a schema registry) for high-throughput Kafka topics. Always include a schema version in the message envelope so consumers can evolve independently.
📦 Applies To
any queue-worker
🔗 Prerequisites
🔍 Detection Hints
PHP serialize() used for queue messages; no schema versioning in message envelope; breaking changes to message format without consumer coordination
Auto-detectable: ✗ No kafka avro protobuf
⚠ Related Problems
🤖 AI Agent
Confidence: Medium False Positives: Medium ✗ Manual fix Fix: Medium Context: File Tests: Update
CWE-502
