Message Serialization (Avro/Protobuf)

Messaging Intermediate

debt(d8/e7/b7/t7)

d8 Detectability Operational debt — how invisible misuse is to your safety net

Closest to 'silent in production until users hit it' (d9), scored d8. The detection_hints say automated=no and the code_pattern is just json_encode/json_decode. No tooling catches the mismatch between JSON adequacy and high-throughput requirements, nor schema drift, nor reused Protobuf field numbers — these surface only as consumer crashes, silent decode errors, or performance degradation under load. Slightly below d9 because consumer crashes on wrong structure are at least visible failures (not completely silent), but the root cause remains obscure.

e7 Effort Remediation debt — work required to fix once spotted

Closest to 'cross-cutting refactor across the codebase' (e7). The quick_fix says to switch to Protobuf for new pipelines, add a schema registry for Avro, fix field numbers, and validate on consumers. Migrating an existing high-throughput pipeline from JSON to Protobuf/Avro requires changing producers, consumers, schema definitions, deployment of a schema registry, and coordinating rolling upgrades — this is a cross-cutting change spanning multiple services and infrastructure components, landing firmly at e7.

b7 Burden Structural debt — long-term weight of choosing wrong

Closest to 'strong gravitational pull' (b7). The choice of serialization format applies to both CLI and queue-worker contexts and is deeply embedded in every message producer and consumer. Every new service that joins the pipeline must conform to the chosen schema and format. A schema registry adds operational overhead that shapes all future messaging work. The common mistakes (no schema validation, no registry, reused field numbers) compound over time as the pipeline grows, making this a b7 burden.

t7 Trap Cognitive debt — how counter-intuitive correct behaviour is

Closest to 'serious trap (contradicts how a similar concept works elsewhere)' (t7). The misconception field directly states the canonical wrong belief: 'JSON is sufficient for all messaging.' Developers experienced with REST APIs and JSON naturally apply the same pattern to messaging pipelines, not anticipating that at millions of messages per second the CPU and size costs become infrastructure-defining. Additionally, reusing Protobuf field numbers (a trap specific to formats that identify fields by number rather than by name) causes silent decoding errors in old consumers, which contradicts intuitions carried over from name-keyed formats such as JSON. These compounding traps justify t7.

About DEBT scoring → scored by claude-sonnet-4-6 · 2026-05-10 · reviewed by human

TL;DR

Binary serialization formats (Avro, Protobuf, MessagePack) are typically smaller and faster to parse than JSON for high-throughput messaging. Avro and Protobuf also add schemas with defined evolution rules; MessagePack does not.

Explanation

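JSON: human-readable and schemaless, but verbose, because field names are repeated in every message.
Avro: binary; a schema is required to write and to read. Usually paired with a schema registry so that consumers can look up the writer's schema as it evolves.
Protobuf: binary; schema defined in .proto files; excellent language support. Fields are identified on the wire by number, not by name. Payload sizes are generally in the same range as Avro rather than reliably smaller, since Avro writes no per-field tags at all.
MessagePack: a binary encoding of the JSON data model (no schema), so field names are still sent with every message.
Schema registry (for example Confluent Schema Registry): stores Avro/Protobuf schemas and enforces a compatibility mode (backward, forward, or full) when a new schema version is registered.
Schema evolution: in Avro, add new fields with default values to stay backward compatible. In Protobuf, add fields under new numbers and never reuse a number that has been removed.
For PHP: use avro-php for Avro and the google/protobuf library for Protobuf.
JSON is fine for low-volume messaging; prefer a binary format for high-throughput Kafka pipelines.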
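To make the size difference concrete, here is a minimal sketch that hand-encodes two integer fields the way Protobuf's wire format does (a key of field number and wire type, followed by a base-128 varint) and compares the result with the JSON equivalent. The helper names `encodeVarint` and `encodeVarintField` and the field layout (1 = user id, 2 = amount in cents) are assumptions made for illustration. In real code the google/protobuf library does this encoding for you.

```php
<?php
declare(strict_types=1);

/** Encode a non-negative integer as a base-128 varint (7 payload bits per byte, lowest group first). */
function encodeVarint(int $value): string
{
    $bytes = '';
    do {
        $byte = $value & 0x7F;
        $value >>= 7;
        if ($value > 0) {
            $byte |= 0x80; // continuation bit: more bytes follow
        }
        $bytes .= chr($byte);
    } while ($value > 0);

    return $bytes;
}

/** Encode one varint-typed field: key = (field number << 3) | wire type 0, then the value. */
function encodeVarintField(int $fieldNumber, int $value): string
{
    return encodeVarint(($fieldNumber << 3) | 0) . encodeVarint($value);
}

// Hypothetical layout: field 1 = user id, field 2 = amount in cents.
$binary = encodeVarintField(1, 1) . encodeVarintField(2, 1000);
$json   = json_encode(['userId' => 1, 'amountCents' => 1000]);

printf("binary: %d bytes (%s)\n", strlen($binary), bin2hex($binary)); // 5 bytes (080110e807)
printf("json:   %d bytes (%s)\n", strlen($json), $json);              // 31 bytes
```

The binary form is small mainly because field names never travel with the message: only the numbers 1 and 2 do. That is also why the schema has to be shared between producer and consumer.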

Common Misconception

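✗ JSON is sufficient for all messaging. For high-throughput Kafka pipelines (millions of messages per second), JSON parsing CPU and payload size become significant. Protobuf is often cited as 3-10x smaller and 5-10x faster to parse, but the actual gain depends on payload shape, language, and library, so benchmark with representative messages before committing.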

Why It Matters

At high throughput, serialization format determines CPU cost and network bandwidth — binary formats can cut infrastructure costs significantly.
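A rough back-of-the-envelope calculation shows how message size turns into bandwidth. The message rate and both payload sizes below are hypothetical numbers chosen only to illustrate the arithmetic, not measurements.

```php
<?php
declare(strict_types=1);

/** Payload bandwidth in decimal megabytes per second (1 MB = 1,000,000 bytes). */
function payloadMegabytesPerSecond(int $messagesPerSecond, int $bytesPerMessage): float
{
    return ($messagesPerSecond * $bytesPerMessage) / 1e6;
}

// All three figures are hypothetical; substitute measurements from your own pipeline.
$messagesPerSecond = 1000000;
$jsonBytes         = 200;
$binaryBytes       = 60;

$jsonMBps   = payloadMegabytesPerSecond($messagesPerSecond, $jsonBytes);
$binaryMBps = payloadMegabytesPerSecond($messagesPerSecond, $binaryBytes);

printf("JSON:   %.0f MB/s\n", $jsonMBps);
printf("Binary: %.0f MB/s\n", $binaryMBps);
printf("Saved:  %.0f MB/s before replication and consumer fan-out\n", $jsonMBps - $binaryMBps);
```

In a Kafka deployment the same bytes are also replicated across brokers and read by each consumer group, so the per-message saving is multiplied accordingly.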

Common Mistakes

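No schema validation on JSON messages: a producer sends the wrong structure and the consumer crashes or processes bad data.
Not using a schema registry: incompatible schema changes reach production unchecked and break consumers silently.
Reusing Protobuf field numbers: old consumers decode the new field as if it were the old one, which produces wrong values or decoding errors. Mark retired numbers as reserved in the .proto file.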
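The field-number mistake is easiest to see on the wire. The sketch below uses a hand-rolled varint reader and writer (not the google/protobuf library) to show what an old consumer sees after a producer reuses a number. The schema, the field names, and the helpers `writeField` and `decodeWithSchema` are all hypothetical.

```php
<?php
declare(strict_types=1);

/** Encode a non-negative integer as a base-128 varint. */
function writeVarint(int $value): string
{
    $out = '';
    do {
        $byte = $value & 0x7F;
        $value >>= 7;
        $out .= chr($value > 0 ? $byte | 0x80 : $byte);
    } while ($value > 0);

    return $out;
}

/** Encode one varint field: key = field number << 3 (wire type 0), then the value. */
function writeField(int $number, int $value): string
{
    return writeVarint($number << 3) . writeVarint($value);
}

/** Read one varint starting at $offset and advance $offset past it. */
function readVarint(string $bytes, int &$offset): int
{
    $result = 0;
    $shift  = 0;
    do {
        if ($offset >= strlen($bytes)) {
            throw new RuntimeException('Truncated varint');
        }
        $byte    = ord($bytes[$offset++]);
        $result |= ($byte & 0x7F) << $shift;
        $shift  += 7;
    } while (($byte & 0x80) !== 0);

    return $result;
}

/** Decode varint-only fields, naming them with the consumer's own schema; unknown numbers are skipped. */
function decodeWithSchema(string $bytes, array $schema): array
{
    $decoded = [];
    $offset  = 0;
    while ($offset < strlen($bytes)) {
        $key = readVarint($bytes, $offset);
        if (($key & 0x07) !== 0) {
            throw new RuntimeException('This sketch only handles wire type 0 (varint)');
        }
        $value  = readVarint($bytes, $offset);
        $number = $key >> 3;
        if (isset($schema[$number])) {
            $decoded[$schema[$number]] = $value;
        }
    }

    return $decoded;
}

// Old consumer still believes field 2 is amount_cents.
$oldSchema = [1 => 'user_id', 2 => 'amount_cents'];

// New producer reused number 2 for quantity and moved the amount to number 3.
$message = writeField(1, 42) . writeField(2, 3) . writeField(3, 1000);

$seenByOldConsumer = decodeWithSchema($message, $oldSchema);
print_r($seenByOldConsumer); // amount_cents is 3: the quantity was read as the amount, with no error
```

Nothing fails here, which is the problem: the old consumer receives a perfectly valid message with the wrong meaning. Had the producer left number 2 retired and used only new numbers, the old consumer would simply have skipped the fields it did not know.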

Code Examples

✗ Vulnerable

// JSON — verbose, no schema enforcement:
$producer->send(json_encode(['userId' => 1, 'amount' => '10.00'])); // Is amount int or string?

✓ Fixed

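// Protobuf: typed, compact. OrderCreated is assumed to be a class generated by protoc from a .proto schema.
$msg = new OrderCreated();
$msg->setUserId(1);
$msg->setAmountCents(1000); // Integer cents: the type is fixed by the schema
$producer->send($msg->serializeToString());

// Consumer:
$event = new OrderCreated();
try {
    $event->mergeFromString($rawMessage); // Throws if the bytes are not a valid message
} catch (\Exception $e) {
    // Reject or dead-letter the message instead of crashing the worker
}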
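Where a pipeline has to stay on JSON, the first of the common mistakes can still be addressed by validating every message on the consumer before using it. The following is a minimal hand-written sketch for a hypothetical OrderCreated payload. The function name `decodeOrderCreated` and the field names are assumptions, and a JSON Schema validator library would be the more general tool.

```php
<?php
declare(strict_types=1);

/**
 * Decode and validate a hypothetical OrderCreated JSON payload.
 *
 * @return array{userId: int, amountCents: int}
 * @throws InvalidArgumentException when the payload is malformed or has the wrong shape
 */
function decodeOrderCreated(string $rawMessage): array
{
    try {
        $data = json_decode($rawMessage, true, 512, JSON_THROW_ON_ERROR);
    } catch (JsonException $e) {
        throw new InvalidArgumentException('Malformed JSON: ' . $e->getMessage(), 0, $e);
    }

    if (!is_array($data)) {
        throw new InvalidArgumentException('Payload must be a JSON object');
    }

    foreach (['userId', 'amountCents'] as $field) {
        if (!array_key_exists($field, $data) || !is_int($data[$field])) {
            throw new InvalidArgumentException("Field '$field' must be present and an integer");
        }
    }

    return ['userId' => $data['userId'], 'amountCents' => $data['amountCents']];
}

// A well-formed message passes through unchanged.
$accepted = decodeOrderCreated('{"userId":1,"amountCents":1000}');

// The ambiguous payload from the vulnerable example is rejected instead of crashing later.
try {
    decodeOrderCreated('{"userId":1,"amount":"10.00"}');
    $rejected = false;
} catch (InvalidArgumentException $e) {
    $rejected = true;
}
```

Rejecting at the boundary turns a crash deep inside business logic into a single, attributable error that can be logged or routed to a dead-letter queue.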
