Apache Kafka Fundamentals
Also Known As
TL;DR
Explanation
Kafka models data as a durable, ordered, partitioned log of events. Producers append events to topics; consumers read events at their own offset (position in the log), independently of other consumers. Unlike traditional queues, where messages are deleted after consumption, Kafka retains messages for a configurable retention period, allowing multiple consumer groups to process the same events independently and allowing replay from any past offset.

Topics are divided into partitions (for parallelism and horizontal scaling) and replicated across brokers (for fault tolerance). The ordering guarantee is per-partition: within a partition, events are strictly ordered; across partitions, ordering is not guaranteed.

In PHP, the most practical Kafka clients are php-rdkafka (a wrapper around the C library librdkafka) and the Confluent PHP client. Kafka is operationally complex and best justified at high throughput, or when event replay, multiple independent consumers, or event sourcing are requirements.
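The log-and-offsets model above can be sketched in plain PHP, with no Kafka client involved. This is only an illustrative toy: the log is an array, and the group names 'billing' and 'audit' are hypothetical. It shows the key property that distinguishes Kafka from a queue: reads never remove events, and each consumer group tracks its own position.

```php
<?php
// Toy model of a retained log read by two independent consumer groups.
$log = ['e1', 'e2', 'e3', 'e4'];           // events stay in the log after reads
$offsets = ['billing' => 0, 'audit' => 0]; // one offset per consumer group

function consume(array $log, array &$offsets, string $group, int $n): array
{
    $events = array_slice($log, $offsets[$group], $n);
    $offsets[$group] += count($events);    // each group advances independently
    return $events;
}

$a = consume($log, $offsets, 'billing', 2); // ['e1', 'e2']
$b = consume($log, $offsets, 'audit', 4);   // ['e1', 'e2', 'e3', 'e4']

// Replay: reset a group's offset to any retained position and re-read
$offsets['billing'] = 0;
$c = consume($log, $offsets, 'billing', 1); // ['e1'] again
```

Note that the audit group read all four events even though billing had already consumed two of them, and that resetting billing's offset replays history; neither is possible with a delete-on-consume queue.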
Common Misconception
Why It Matters
Common Mistakes
- Using a single partition for a topic — single partition means no parallelism; partition by a meaningful key (user ID, order ID) for balanced load.
- Not committing offsets correctly — auto-commit can cause message loss on consumer crash; use manual offset commit after successful processing.
- Treating Kafka like a queue where messages are deleted after consumption — Kafka retains messages regardless of consumption; manage retention periods explicitly.
- Underestimating operational complexity — Kafka requires ZooKeeper (or KRaft in newer versions), cluster management, and monitoring; consider managed services (Confluent Cloud, AWS MSK) for PHP teams without dedicated infrastructure engineers.
Code Examples
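The produce calls below are called on an `RdKafka\ProducerTopic` handle, which the snippets assume already exists. A minimal setup sketch (the broker address `localhost:9092` and topic name `orders` are placeholder assumptions):

```php
<?php
// Assumed setup for the producer examples (php-rdkafka extension).
$conf = new RdKafka\Conf();
$conf->set('metadata.broker.list', 'localhost:9092'); // placeholder broker

$producer = new RdKafka\Producer($conf);
$topic = $producer->newTopic('orders'); // placeholder topic name

// ... produce calls from the examples below go here ...

$producer->flush(10000); // on shutdown, wait up to 10s for in-flight messages
```

The final `flush()` matters: librdkafka buffers messages internally, and a PHP script that exits without flushing can silently drop them.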
// Single-partition topic: no parallelism, an ordering bottleneck.
// Note produce() is called on an RdKafka\ProducerTopic, not the producer itself
$topic->produce(RD_KAFKA_PARTITION_UA, 0, json_encode($event));
// RD_KAFKA_PARTITION_UA = unassigned; the partitioner picks the partition,
// but with only one partition every event lands in it, so a single
// consumer in the group does all the work
// Partition by entity key: parallelism plus ordering per entity.
// Passing a message key with RD_KAFKA_PARTITION_UA lets librdkafka's
// partitioner hash the key to a partition; no manual crc32 arithmetic needed
$topic->produce(RD_KAFKA_PARTITION_UA, 0, json_encode($event), $event['order_id']);
// Events for the same order_id always hash to the same partition
// = ordered per order, parallelised across orders
// Consumer with manual offset commit (requires enable.auto.commit = false)
while (true) {
    $msg = $consumer->consume(120 * 1000); // block up to 120s for a message
    if ($msg->err === RD_KAFKA_RESP_ERR_NO_ERROR) {
        processEvent($msg->payload);
        $consumer->commitAsync($msg); // commit only after successful processing
    } elseif ($msg->err !== RD_KAFKA_RESP_ERR__TIMED_OUT
        && $msg->err !== RD_KAFKA_RESP_ERR__PARTITION_EOF) {
        throw new \RuntimeException($msg->errstr()); // genuine broker error
    }
}
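The loop above only works as described if auto-commit is disabled in the consumer configuration; otherwise offsets are committed in the background regardless of processing outcome. A minimal setup sketch (broker address, group id `order-processors`, and topic `orders` are placeholder assumptions):

```php
<?php
// Assumed setup for the manual-commit consumer loop (php-rdkafka extension).
$conf = new RdKafka\Conf();
$conf->set('metadata.broker.list', 'localhost:9092'); // placeholder broker
$conf->set('group.id', 'order-processors');           // placeholder group id
$conf->set('enable.auto.commit', 'false');            // commit manually instead
$conf->set('auto.offset.reset', 'earliest');          // where a new group starts

$consumer = new RdKafka\KafkaConsumer($conf);
$consumer->subscribe(['orders']);                     // placeholder topic
```

Setting `auto.offset.reset` explicitly is worth the extra line: it decides whether a brand-new consumer group starts from the oldest retained event or only from events produced after it joins.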