Message Serialization (Avro/Protobuf)
debt(d8/e7/b7/t7)
Closest to 'silent in production until users hit it' (d9), scored d8. The detection_hints say automated=no and the code_pattern is just json_encode/json_decode. No tooling catches the mismatch between JSON adequacy and high-throughput requirements, nor schema drift, nor reused Protobuf field numbers — these surface only as consumer crashes, silent decode errors, or performance degradation under load. Slightly below d9 because consumer crashes on wrong structure are at least visible failures (not completely silent), but the root cause remains obscure.
Closest to 'cross-cutting refactor across the codebase' (e7). The quick_fix says to switch to Protobuf for new pipelines, add a schema registry for Avro, fix field numbers, and validate on consumers. Migrating an existing high-throughput pipeline from JSON to Protobuf/Avro requires changing producers, consumers, schema definitions, deployment of a schema registry, and coordinating rolling upgrades — this is a cross-cutting change spanning multiple services and infrastructure components, landing firmly at e7.
Closest to 'strong gravitational pull' (b7). The choice of serialization format applies to both cli and queue-worker contexts and is deeply embedded in every message producer and consumer. Every new service that joins the pipeline must conform to the chosen schema and format. A schema registry adds operational overhead that shapes all future messaging work. The common mistakes (no schema validation, no registry, reused field numbers) compound over time as the pipeline grows, making this a b7 burden.
Closest to 'serious trap (contradicts how a similar concept works elsewhere)' (t7). The misconception field directly states the canonical wrong belief: 'JSON is sufficient for all messaging.' Developers experienced with REST APIs and JSON naturally apply the same pattern to messaging pipelines, not anticipating that at millions of messages/sec the CPU and size costs become infrastructure-defining. Additionally, reusing Protobuf field numbers — a trap unique to this format — causes silent decoding errors in old consumers, which contradicts intuitions from other serialization systems. These compounding traps justify t7.
TL;DR
Explanation
- JSON: human-readable, no schema required, largest payloads.
- Avro: binary; a schema is required to read and write, and a schema registry is the usual way to share and evolve it.
- Protobuf: binary; schema defined in .proto files, excellent language support, payloads comparable in size to Avro.
- MessagePack: a binary encoding of JSON-like data (no schema).
- Schema registry (for example Confluent Schema Registry): stores Avro/Protobuf schemas and enforces a compatibility mode (backward, forward, or full).
- Schema evolution: in Avro, stay backward compatible by adding only optional fields with defaults. In Protobuf, add fields under new numbers and never reuse a number.
- PHP libraries: avro-php for Avro, google/protobuf for Protobuf.
- JSON is fine for low-volume messaging; use a binary format for high-throughput Kafka pipelines.
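The "optional fields with defaults" rule can be illustrated with a minimal sketch in plain PHP. This is not the avro-php API: the schema constant, the `resolveRecord()` helper, and the `currency` field are hypothetical names chosen for illustration, and `null` is used here simply as a marker for "no default".

```php
<?php
declare(strict_types=1);

// Hypothetical reader-side schema: field name => default value.
// null marks a required field (an assumption of this sketch, not Avro syntax).
const READER_SCHEMA_V2 = [
    'userId'      => null,   // required
    'amountCents' => null,   // required
    'currency'    => 'USD',  // added in v2 as an optional field with a default
];

// Resolve a decoded record against the reader schema: copy known fields,
// fill defaults for missing optional fields, reject missing required ones,
// and drop fields the reader does not know about.
function resolveRecord(array $record, array $readerSchema): array
{
    $resolved = [];
    foreach ($readerSchema as $field => $default) {
        if (array_key_exists($field, $record)) {
            $resolved[$field] = $record[$field];
        } elseif ($default !== null) {
            $resolved[$field] = $default;
        } else {
            throw new InvalidArgumentException("Missing required field: $field");
        }
    }
    return $resolved;
}

$v1Record = ['userId' => 1, 'amountCents' => 1000];                      // written by an old producer
$v2Record = ['userId' => 2, 'amountCents' => 500, 'currency' => 'EUR'];  // written by a new producer

echo resolveRecord($v1Record, READER_SCHEMA_V2)['currency'], "\n"; // prints "USD"
echo resolveRecord($v2Record, READER_SCHEMA_V2)['currency'], "\n"; // prints "EUR"
```

A new consumer can still read records written before `currency` existed, which is what backward compatibility means here. Adding a required field without a default would make the old record fail.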
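Common Misconception
JSON is sufficient for all messaging. It works at low volume, but at millions of messages per second the CPU and size costs of text encoding become infrastructure-defining.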
Why It Matters
Common Mistakes
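- No schema validation on JSON messages: the producer sends the wrong structure and the consumer crashes.
- Not using a schema registry: schema changes break consumers silently.
- Reusing Protobuf field numbers: old consumers decode the new data as the old field, causing decoding errors or silently wrong values. Mark retired numbers as reserved in the .proto file.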
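The first mistake above can be guarded against even when a pipeline stays on JSON. The following is a minimal sketch of consumer-side validation using only built-in PHP functions. The function name `decodeOrderCreated()` and the expected fields (`userId`, `amountCents`) are assumptions for illustration; a real pipeline might use a JSON Schema validator instead.

```php
<?php
declare(strict_types=1);

// Decode and validate a JSON message before it reaches business logic.
// The expected fields and their types are illustrative assumptions.
function decodeOrderCreated(string $rawMessage): array
{
    // JSON_THROW_ON_ERROR (PHP 7.3+) turns malformed JSON into a JsonException.
    $data = json_decode($rawMessage, true, 512, JSON_THROW_ON_ERROR);

    if (!is_array($data)) {
        throw new UnexpectedValueException('Message must be a JSON object');
    }
    foreach (['userId' => 'is_int', 'amountCents' => 'is_int'] as $field => $check) {
        if (!array_key_exists($field, $data) || !$check($data[$field])) {
            throw new UnexpectedValueException("Invalid or missing field: $field");
        }
    }
    return $data;
}

try {
    decodeOrderCreated('{"userId":1,"amount":"10.00"}');
} catch (UnexpectedValueException $e) {
    echo $e->getMessage(), "\n"; // prints "Invalid or missing field: amountCents"
}
```

Failing with a named field at the consumer boundary makes the root cause visible, rather than surfacing later as an obscure crash deep in the worker.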
Code Examples
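// JSON: verbose, no schema enforcement.
$producer->send(json_encode(['userId' => 1, 'amount' => '10.00'])); // Is amount a number or a string? Nothing enforces it.

// Protobuf: typed, compact. OrderCreated is a class generated by protoc from the .proto schema.
$msg = new OrderCreated();
$msg->setUserId(1);
$msg->setAmountCents(1000); // Integer cents: the type is explicit in the schema
$producer->send($msg->serializeToString());

// Consumer:
$event = new OrderCreated();
try {
    $event->mergeFromString($rawMessage);
} catch (\Exception $e) {
    // Malformed payload: log it and skip the message instead of crashing the worker.
}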
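To make the size difference concrete without installing a Protobuf or Avro library, the sketch below compares `json_encode()` against a fixed binary layout built with PHP's `pack()`. The fixed layout is only a stand-in for a schema-based format, not the Protobuf wire format, and the `'J'` format code assumes a 64-bit PHP build.

```php
<?php
declare(strict_types=1);

$order = ['userId' => 1, 'amountCents' => 1000];

// Text encoding: the field names travel with every single message.
$json = json_encode($order);

// Fixed binary layout (stand-in for a schema-based format):
// 'N' = unsigned 32-bit big-endian, 'J' = unsigned 64-bit big-endian.
$binary = pack('NJ', $order['userId'], $order['amountCents']);

printf("JSON: %d bytes, binary: %d bytes\n", strlen($json), strlen($binary));
// prints "JSON: 31 bytes, binary: 12 bytes"

// The consumer can only decode this if it knows the same layout,
// which is the role the schema plays in Avro and Protobuf.
$decoded = unpack('NuserId/JamountCents', $binary);
```

The binary payload drops the field names entirely, so both sides must agree on the layout in advance. Formats such as Protobuf additionally use variable-length integers, so small values usually take fewer bytes than in this fixed layout.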