Apache Kafka Fundamentals
debt(d9/e9/b9/t7)
Closest to 'silent in production until users hit it' (d9). No detection tooling applies: Kafka misuse (a wrong partition strategy, incorrect offset commits, mismatched retention assumptions) produces no compile-time or lint-time signals. Issues such as message loss from auto-commit or stalled consumers typically surface only under production load or after a consumer crash, well after deployment.
Closest to 'architectural rework' (e9). The fix touches partitioning strategy, offset commit mode, retention configuration, and infrastructure choices (managed vs self-hosted). These are not one-line fixes: recovering from an incorrect Kafka adoption (treating it like RabbitMQ, choosing the wrong partition key, using the wrong offset strategy) typically requires rethinking the event model, repartitioning topics, migrating consumers, and potentially replacing the infrastructure layer, which amounts to a full architectural rework.
Closest to 'defines the system's shape' (b9). Kafka is used from both web and CLI contexts and carries extreme architectural weight. Once a system is built around Kafka's event-log model (consumer groups, partition layouts, retention policies, offset management), every new service, data pipeline, and deployment decision is shaped by it. Its significant operational complexity, which requires cluster management and monitoring, further makes it a defining architectural commitment.
Closest to 'serious trap — contradicts how a similar concept works elsewhere' (t7). The canonical misconception is explicitly stated: developers familiar with RabbitMQ assume Kafka is 'just a faster RabbitMQ.' This is a serious architectural trap — messages are not deleted after consumption, consumers track their own offsets, auto-commit can silently cause message loss, and single-partition topics silently eliminate parallelism. These behaviors directly contradict RabbitMQ's model, making the trap severe for experienced queue users but not quite catastrophic (t9) since Kafka's model is well-documented.
Also Known As
TL;DR
Explanation
Kafka models data as a durable, ordered, partitioned log of events. Producers append events to topics; consumers read events from their own offset (position in the log), and each consumer group tracks its offsets independently of other groups. Unlike traditional queues where messages are deleted after consumption, Kafka retains messages for a configurable retention period (time- or size-based), which allows multiple consumer groups to process the same events independently and allows replay from any offset still within retention. Topics are divided into partitions (for parallelism and horizontal scaling) and replicated across brokers (for fault tolerance). The ordering guarantee is per partition: within a partition, events are strictly ordered; across partitions, ordering is not guaranteed. In PHP, the most practical Kafka client is php-rdkafka, an extension that wraps the C library librdkafka (the same library that underlies Confluent's official clients for several other languages). Kafka is operationally complex and best justified at high throughput or when event replay, multiple independent consumers, or event sourcing are requirements.
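To make the log model concrete, here is a toy in-memory sketch in plain PHP. No Kafka or php-rdkafka is involved, and the PartitionedLog class and its methods are hypothetical, invented only for illustration. It shows three properties described above: reading does not delete events, each consumer group keeps its own offset per partition, and a group can replay by moving its offset back. Retention-based deletion is deliberately not modelled.

```php
<?php
declare(strict_types=1);

// Toy model of one Kafka topic: an append-only list of events per partition,
// plus a next-offset per (consumer group, partition).
// Hypothetical class for illustration only; this is NOT a Kafka client API.
final class PartitionedLog
{
    /** @var array<int, list<string>> partition => events */
    private array $partitions;
    /** @var array<string, array<int, int>> group => partition => next offset */
    private array $offsets = [];

    public function __construct(int $partitionCount)
    {
        $this->partitions = array_fill(0, $partitionCount, []);
    }

    // Append an event to a partition and return the offset it was stored at.
    public function append(int $partition, string $event): int
    {
        $this->partitions[$partition][] = $event;
        return count($this->partitions[$partition]) - 1;
    }

    // Return up to $max events for a group from its current offset, then advance
    // that group's offset. The events themselves stay in the log.
    public function poll(string $group, int $partition, int $max = 10): array
    {
        $offset = $this->offsets[$group][$partition] ?? 0;
        $batch = array_slice($this->partitions[$partition], $offset, $max);
        $this->offsets[$group][$partition] = $offset + count($batch);
        return $batch;
    }

    // Replay: move one group's offset back without affecting other groups.
    public function seek(string $group, int $partition, int $offset): void
    {
        $this->offsets[$group][$partition] = $offset;
    }
}

$log = new PartitionedLog(2);
$log->append(0, 'order-1 created');
$log->append(0, 'order-1 paid');
$log->append(1, 'order-2 created');

// Two independent consumer groups read the same partition in full.
$billing = $log->poll('billing', 0);
$analytics = $log->poll('analytics', 0);

// Billing has caught up, so polling again returns nothing new.
$billingAgain = $log->poll('billing', 0);

// Rewind billing to offset 0 and replay partition 0.
$log->seek('billing', 0, 0);
$replayed = $log->poll('billing', 0);

echo implode(', ', $replayed), PHP_EOL; // prints: order-1 created, order-1 paid
```

In real Kafka the broker stores committed offsets per group and partition, and a client replays by seeking or resetting its group's offsets; the toy class only mirrors that bookkeeping.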
Common Misconception
Why It Matters
Common Mistakes
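- Using a single partition for a topic — within a consumer group only one consumer can read a given partition, so one partition means no parallelism; create multiple partitions and key messages by a meaningful entity ID (user ID, order ID) to spread load while keeping per-entity ordering.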
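- Not committing offsets correctly — with auto-commit, an offset can be committed before its message has been processed, so a consumer crash silently skips that message; set enable.auto.commit=false and commit offsets manually after successful processing.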
- Treating Kafka like a queue where messages are deleted after consumption — Kafka retains messages regardless of consumption; manage retention periods explicitly.
- Underestimating operational complexity — Kafka needs a coordination layer (ZooKeeper in older versions, KRaft in newer ones; Kafka 4.0 supports only KRaft), plus cluster management and monitoring; consider managed services (Confluent Cloud, Amazon MSK) for PHP teams without dedicated infrastructure engineers.
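The offset-commit mistake is easiest to see in a small simulation. The plain-PHP sketch below is a hypothetical model with no Kafka client involved (runSession and simulate are invented names). It fakes a crash while handling offset 2 of a five-message partition, then restarts from the last committed offset. "Commit on fetch" approximates auto-commit racing ahead of processing; the manual variant commits only after each message has been processed.

```php
<?php
declare(strict_types=1);

// Simulated partition with offsets 0..4. Hypothetical model, not a Kafka client.
const EVENTS = ['e0', 'e1', 'e2', 'e3', 'e4'];
const CRASH_AT = 2; // handling offset 2 "crashes" during the first session

// Run one consumer session starting at the committed offset $committed.
// $commitOnFetch = true approximates auto-commit firing before processing ends;
// false commits only after processing succeeds. Returns the committed offset
// at the moment the session ended (normally or by crashing).
function runSession(int $committed, bool $commitOnFetch, bool $crash, array &$processed): int
{
    for ($offset = $committed; $offset < count(EVENTS); $offset++) {
        if ($commitOnFetch) {
            $committed = $offset + 1; // offset committed before the work is done
        }
        if ($crash && $offset === CRASH_AT) {
            return $committed; // process dies mid-message
        }
        $processed[] = EVENTS[$offset];
        if (!$commitOnFetch) {
            $committed = $offset + 1; // commit only after successful processing
        }
    }
    return $committed;
}

// First session crashes; the restarted consumer resumes from the committed offset.
function simulate(bool $commitOnFetch): array
{
    $processed = [];
    $committed = runSession(0, $commitOnFetch, true, $processed);
    runSession($committed, $commitOnFetch, false, $processed);
    return $processed;
}

echo 'commit on fetch: ', implode(' ', simulate(true)), PHP_EOL;  // prints: commit on fetch: e0 e1 e3 e4
echo 'manual commit:   ', implode(' ', simulate(false)), PHP_EOL; // prints: manual commit:   e0 e1 e2 e3 e4
```

With commit-on-fetch, e2 is silently lost. With manual commits the restarted consumer resumes at e2. A crash between processing and committing would instead process that message twice, so manual commits give at-least-once delivery rather than exactly-once.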
Code Examples
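// Anti-pattern: hard-coding partition 0 puts every event in one partition
$topic = $producer->newTopic('orders');
$topic->produce(0, 0, json_encode($event));
// Only one consumer per group can read partition 0 = no parallelism.
// (RD_KAFKA_PARTITION_UA without a key behaves differently: librdkafka's
// default partitioner spreads unkeyed events across partitions, which
// restores parallelism but loses per-entity ordering.)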
// Partition by entity key: parallelism + ordering per entity
$topic->produce(RD_KAFKA_PARTITION_UA, 0, json_encode($event), (string) $event['order_id']);
$producer->flush(10000); // deliver queued messages before the script exits
// The default partitioner hashes the key, so events for the same order_id
// always go to the same partition (while the partition count is unchanged)
// = ordered per order, parallelised across orders
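// Consumer with manual offset commit (at-least-once delivery)
$conf = new RdKafka\Conf();
$conf->set('bootstrap.servers', 'localhost:9092'); // example values
$conf->set('group.id', 'order-processor');
$conf->set('enable.auto.commit', 'false'); // required, or offsets are still auto-committed
$consumer = new RdKafka\KafkaConsumer($conf);
$consumer->subscribe(['orders']);
while (true) {
    $msg = $consumer->consume(120000);
    if ($msg->err === RD_KAFKA_RESP_ERR_NO_ERROR) {
        processEvent($msg->payload);
        $consumer->commitAsync($msg); // commit only after successful processing
    } elseif ($msg->err !== RD_KAFKA_RESP_ERR__TIMEOUT) {
        throw new RuntimeException($msg->errstr());
    }
}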
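The keyed-partitioning idea from the producer example above can be checked without a broker. The sketch below uses crc32($key) % $partitionCount as a stand-in partitioner. This is a simplification: it is not guaranteed to reproduce librdkafka's or the Java client's exact assignments, and partitionFor is a hypothetical helper. It shows that a key maps to the same partition every time for a fixed partition count, and counts how many keys would move if the topic grew from 6 to 8 partitions.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: stand-in for a key-hashing partitioner.
function partitionFor(string $key, int $partitionCount): int
{
    // crc32() returns a non-negative int on 64-bit PHP, so the result is in range.
    return crc32($key) % $partitionCount;
}

$keys = ['order-1001', 'order-1002', 'order-1003', 'order-1004'];

// Group keys by partition for a 6-partition topic. Running this again with the
// same partition count always yields the same grouping.
$byPartition = [];
foreach ($keys as $key) {
    $byPartition[partitionFor($key, 6)][] = $key;
}
ksort($byPartition);
foreach ($byPartition as $p => $group) {
    echo "partition $p: ", implode(', ', $group), PHP_EOL;
}

// Keys whose partition changes when the topic grows from 6 to 8 partitions.
$moved = array_filter($keys, fn (string $k): bool => partitionFor($k, 6) !== partitionFor($k, 8));
echo count($moved), ' of ', count($keys), " keys change partition after resizing\n";
```

Because a key's partition depends on the partition count, adding partitions to a keyed topic can send new events for an existing key to a different partition than its older events. It is usually safer to choose a generous partition count up front.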