Apache Kafka Fundamentals
Also Known As
TL;DR
Explanation
Kafka models data as a durable, ordered, partitioned log of events. Producers append events to topics; consumers read events at their own offset (position in the log), independently of other consumers. Unlike traditional queues, where messages are deleted after consumption, Kafka retains messages for a configurable retention period, allowing multiple consumer groups to process the same events independently and allowing replay from any past offset.

Topics are divided into partitions (for parallelism and horizontal scaling) and replicated across brokers (for fault tolerance). The ordering guarantee is per-partition: within a partition, events are strictly ordered; across partitions, ordering is not guaranteed.

In PHP, the most practical Kafka clients are php-rdkafka (a wrapper around the C library librdkafka) and the Confluent PHP client. Kafka is operationally complex and best justified at high throughput, or when event replay, multiple independent consumers, or event sourcing are requirements.
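The log-and-offsets model above can be sketched in plain PHP, with no Kafka client involved. This is only an illustrative toy: the log is an array, and the group names 'billing' and 'audit' are hypothetical. It shows the key property that distinguishes Kafka from a queue: reads never remove events, and each consumer group tracks its own position.

```php
<?php
// Toy model of a retained log read by two independent consumer groups.
$log = ['e1', 'e2', 'e3', 'e4'];           // events stay in the log after reads
$offsets = ['billing' => 0, 'audit' => 0]; // one offset per consumer group

function consume(array $log, array &$offsets, string $group, int $n): array
{
    $events = array_slice($log, $offsets[$group], $n);
    $offsets[$group] += count($events);    // each group advances independently
    return $events;
}

$a = consume($log, $offsets, 'billing', 2); // ['e1', 'e2']
$b = consume($log, $offsets, 'audit', 4);   // ['e1', 'e2', 'e3', 'e4']

// Replay: reset a group's offset to any retained position and re-read
$offsets['billing'] = 0;
$c = consume($log, $offsets, 'billing', 1); // ['e1'] again
```

Note that the audit group read all four events even though billing had already consumed two of them, and that resetting billing's offset replays history; neither is possible with a delete-on-consume queue.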
Common Misconception
Why It Matters
Common Mistakes
- Using a single partition for a topic — single partition means no parallelism; partition by a meaningful key (user ID, order ID) for balanced load.
- Not committing offsets correctly — auto-commit can cause message loss on consumer crash; use manual offset commit after successful processing.
- Treating Kafka like a queue where messages are deleted after consumption — Kafka retains messages regardless of consumption; manage retention periods explicitly.
- Underestimating operational complexity — Kafka requires ZooKeeper (or KRaft in newer versions), cluster management, and monitoring; consider managed services (Confluent Cloud, AWS MSK) for PHP teams without dedicated infrastructure engineers.
Code Examples
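The produce calls below are called on an `RdKafka\ProducerTopic` handle, which the snippets assume already exists. A minimal setup sketch (the broker address `localhost:9092` and topic name `orders` are placeholder assumptions):

```php
<?php
// Assumed setup for the producer examples (php-rdkafka extension).
$conf = new RdKafka\Conf();
$conf->set('metadata.broker.list', 'localhost:9092'); // placeholder broker

$producer = new RdKafka\Producer($conf);
$topic = $producer->newTopic('orders'); // placeholder topic name

// ... produce calls from the examples below go here ...

$producer->flush(10000); // on shutdown, wait up to 10s for in-flight messages
```

The final `flush()` matters: librdkafka buffers messages internally, and a PHP script that exits without flushing can silently drop them.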
// Single-partition topic: no parallelism, an ordering bottleneck.
// Note produce() is called on an RdKafka\ProducerTopic, not the producer itself
$topic->produce(RD_KAFKA_PARTITION_UA, 0, json_encode($event));
// RD_KAFKA_PARTITION_UA = unassigned; the partitioner picks the partition,
// but with only one partition every event lands in it, so a single
// consumer in the group does all the work
// Partition by entity key: parallelism plus ordering per entity.
// Passing a message key with RD_KAFKA_PARTITION_UA lets librdkafka's
// partitioner hash the key to a partition; no manual crc32 arithmetic needed
$topic->produce(RD_KAFKA_PARTITION_UA, 0, json_encode($event), $event['order_id']);
// Events for the same order_id always hash to the same partition
// = ordered per order, parallelised across orders
// Consumer with manual offset commit (requires enable.auto.commit = false)
while (true) {
    $msg = $consumer->consume(120 * 1000); // block up to 120s for a message
    if ($msg->err === RD_KAFKA_RESP_ERR_NO_ERROR) {
        processEvent($msg->payload);
        $consumer->commitAsync($msg); // commit only after successful processing
    } elseif ($msg->err !== RD_KAFKA_RESP_ERR__TIMED_OUT
        && $msg->err !== RD_KAFKA_RESP_ERR__PARTITION_EOF) {
        throw new \RuntimeException($msg->errstr()); // genuine broker error
    }
}
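The loop above only works as described if auto-commit is disabled in the consumer configuration; otherwise offsets are committed in the background regardless of processing outcome. A minimal setup sketch (broker address, group id `order-processors`, and topic `orders` are placeholder assumptions):

```php
<?php
// Assumed setup for the manual-commit consumer loop (php-rdkafka extension).
$conf = new RdKafka\Conf();
$conf->set('metadata.broker.list', 'localhost:9092'); // placeholder broker
$conf->set('group.id', 'order-processors');           // placeholder group id
$conf->set('enable.auto.commit', 'false');            // commit manually instead
$conf->set('auto.offset.reset', 'earliest');          // where a new group starts

$consumer = new RdKafka\KafkaConsumer($conf);
$consumer->subscribe(['orders']);                     // placeholder topic
```

Setting `auto.offset.reset` explicitly is worth the extra line: it decides whether a brand-new consumer group starts from the oldest retained event or only from events produced after it joins.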