
Apache Kafka Fundamentals

Messaging PHP 7.0+ Advanced
debt(d9/e9/b9/t7)
d9 Detectability Operational debt — how invisible misuse is to your safety net

Closest to 'silent in production until users hit it' (d9). No detection tooling is listed for this term, and Kafka misuse (wrong partition strategy, incorrect offset commits, mismatched retention assumptions) produces no compile-time or lint-time signal. Issues such as message loss from auto-commit or stalled consumers typically surface only under production load or after a consumer crash, well after deployment.

e9 Effort Remediation debt — work required to fix once spotted

Closest to 'architectural rework' (e9). The quick fix touches partitioning strategy, offset commit mode, retention configuration, and infrastructure choices (managed vs self-hosted). None of these are one-line fixes: adopting Kafka incorrectly (treating it like RabbitMQ, choosing the wrong partition key or offset strategy) typically means rethinking the event model, repartitioning topics, migrating consumers, and potentially replacing the infrastructure layer, which amounts to a full architectural rework.

b9 Burden Structural debt — long-term weight of choosing wrong

Closest to 'defines the system's shape' (b9). Kafka applies to both web and cli contexts and carries extreme architectural weight. Once a system is built around Kafka's event log model — consumer groups, partition layouts, retention policies, offset management — every new service, data pipeline, and deployment decision is shaped by it. Kafka is explicitly described as having significant operational complexity, requiring cluster management and monitoring, making it a defining architectural commitment.

t7 Trap Cognitive debt — how counter-intuitive correct behaviour is

Closest to 'serious trap — contradicts how a similar concept works elsewhere' (t7). The canonical misconception is explicitly stated: developers familiar with RabbitMQ assume Kafka is 'just a faster RabbitMQ.' This is a serious architectural trap — messages are not deleted after consumption, consumers track their own offsets, auto-commit can silently cause message loss, and single-partition topics silently eliminate parallelism. These behaviors directly contradict RabbitMQ's model, making the trap severe for experienced queue users but not quite catastrophic (t9) since Kafka's model is well-documented.


Also Known As

Kafka Apache Kafka event streaming Kafka broker Kafka topic Kafka consumer group

TL;DR

A distributed event streaming platform that stores messages as an immutable ordered log partitioned across a cluster — optimised for high-throughput, durable, replayable event streams rather than traditional task queues.

Explanation

Kafka models data as a durable, ordered, partitioned log of events. Producers append events to topics; consumers read events from their own offset (position in the log), independently of other consumers. Unlike traditional queues, where messages are deleted after consumption, Kafka retains messages for a configurable retention period. This lets multiple consumer groups process the same events independently and allows replay from any offset still within retention. Topics are divided into partitions (for parallelism and horizontal scaling) and replicated across brokers (for fault tolerance). Within a consumer group, each partition is assigned to at most one consumer at a time, so the partition count caps how many consumers in that group can work in parallel. Ordering is guaranteed only per partition: within a partition, events are strictly ordered; across partitions, they are not. In PHP, the most practical Kafka client is php-rdkafka, an extension wrapping the C library librdkafka. Kafka is operationally complex and best justified at high throughput or when event replay, multiple independent consumers, or event sourcing are requirements.
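A minimal in-memory sketch can make the offset model concrete. It models a single partition as an append-only array plus a map of per-group offsets; the class PartitionLog and its methods are hypothetical and not part of php-rdkafka. A real broker persists and replicates the log, stores committed offsets in the internal __consumer_offsets topic, and treats reading and committing as separate steps.

```php
<?php
// Hypothetical in-memory model of ONE partition (not a real client API).
// Reading never deletes anything; each consumer group moves only its own offset.
final class PartitionLog
{
    /** @var string[] events indexed by offset (0, 1, 2, ...) */
    private $events = [];
    /** @var array<string, int> next offset to read, per consumer group */
    private $offsets = [];

    /** Append an event and return the offset it was written at. */
    public function append(string $event): int
    {
        $this->events[] = $event;
        return count($this->events) - 1;
    }

    /** Read up to $max events for a group, starting at its current offset. */
    public function poll(string $group, int $max): array
    {
        // New groups start at offset 0 here (like auto.offset.reset=earliest).
        $from = $this->offsets[$group] ?? 0;
        $batch = array_slice($this->events, $from, $max);
        $this->offsets[$group] = $from + count($batch); // simplified "commit"
        return $batch;
    }

    /** Move a group's position anywhere in the retained log (replay). */
    public function seek(string $group, int $offset)
    {
        $this->offsets[$group] = $offset;
    }

    public function size(): int
    {
        return count($this->events);
    }
}

$log = new PartitionLog();
foreach (['created', 'paid', 'shipped'] as $e) {
    $log->append($e);
}

$billing   = $log->poll('billing', 10);   // ['created', 'paid', 'shipped']
$analytics = $log->poll('analytics', 2);  // ['created', 'paid']: independent position
$log->seek('billing', 1);                 // rewind billing only
$replayed  = $log->poll('billing', 10);   // ['paid', 'shipped']
```

The analytics group's read has no effect on billing, and nothing is removed from the log by either of them: each group only advances its own offset.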

Common Misconception

Kafka is just a faster RabbitMQ. Kafka and RabbitMQ have fundamentally different models. RabbitMQ is a traditional message broker — messages are routed, consumed once, and deleted. Kafka is an event log — events are retained, consumers track their own position, and the same event can be read by multiple independent consumer groups. Kafka excels at audit logs, event sourcing, real-time analytics, and stream processing. RabbitMQ excels at task queues, request routing, and work distribution. Choosing based on throughput alone misses the architectural differences.
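Retention is the part of the log model RabbitMQ users most often misjudge: events are deleted by age or size (retention.ms, retention.bytes), not because a consumer read them. The hypothetical RetainedLog class below is a simplified sketch of time-based retention on one partition. It shows the flip side of "retained regardless of consumption": a consumer group that stalls longer than the retention window finds its committed offset pointing at data that no longer exists.

```php
<?php
// Hypothetical model of time-based retention (retention.ms) on one partition.
// Real brokers delete whole log segments, so deletion is coarser than this.
final class RetainedLog
{
    /** @var array<int, array> offset => [timestampMs, payload] */
    private $entries = [];
    /** @var int offset the next append will receive */
    private $nextOffset = 0;

    public function append(int $timestampMs, string $payload): int
    {
        $this->entries[$this->nextOffset] = [$timestampMs, $payload];
        return $this->nextOffset++;
    }

    /** Delete entries older than $retentionMs, whether consumed or not. */
    public function enforceRetention(int $nowMs, int $retentionMs)
    {
        foreach ($this->entries as $offset => $entry) {
            if ($nowMs - $entry[0] > $retentionMs) {
                unset($this->entries[$offset]);
            }
        }
    }

    /** Earliest offset still readable (Kafka's "log start offset"). */
    public function startOffset(): int
    {
        return $this->entries ? min(array_keys($this->entries)) : $this->nextOffset;
    }

    /** True if a committed offset now points at deleted data. */
    public function isOutOfRange(int $committedOffset): bool
    {
        return $committedOffset < $this->startOffset();
    }
}

$hourMs = 3600 * 1000;
$log = new RetainedLog();
$log->append(0, 'e0');            // offset 0, written at t = 0
$log->append($hourMs, 'e1');      // offset 1, t = 1 h
$log->append(2 * $hourMs, 'e2');  // offset 2, t = 2 h

// A group committed offset 1, then stalled. retention.ms = 90 min, now = 3 h.
$log->enforceRetention(3 * $hourMs, 90 * 60 * 1000);
$start = $log->startOffset();               // 2: e0 and e1 are gone
$stalledGroupLost = $log->isOutOfRange(1);  // true
```

In real Kafka, the consumer's auto.offset.reset setting then decides whether it jumps to the earliest or latest available offset or reports an error; either way, the deleted events are never processed by that group.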

Why It Matters

Kafka's event log model enables patterns impossible with traditional queues: replay past events to rebuild state, add a new service that processes all historical events from the beginning, audit every action in the system, and run real-time analytics in parallel with transaction processing. For PHP applications that need these capabilities — particularly event sourcing, CQRS, or data pipelines feeding analytics — Kafka is the right tool. Its operational complexity is significant, so it is worth justifying with a concrete requirement before adopting it over simpler alternatives.
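Rebuilding state by replay amounts to folding every event, in offset order, into a projection. The sketch below assumes a hypothetical order-event shape (type, order_id, amount in cents), with a plain PHP array standing in for events read from offset 0; in practice the events would come from a consumer positioned at the beginning of the topic.

```php
<?php
// Hypothetical event shape: ['type' => string, 'order_id' => string, 'amount' => int (cents)]
// A projection is a pure function folded over the log in offset order.
function applyEvent(array $state, array $event): array
{
    switch ($event['type']) {
        case 'OrderPlaced':
            $state[$event['order_id']] = $event['amount'];
            break;
        case 'ItemAdded':
            $state[$event['order_id']] += $event['amount'];
            break;
        case 'OrderCancelled':
            unset($state[$event['order_id']]);
            break;
    }
    return $state;
}

/** Replay the whole log from offset 0 into an "open order totals" projection. */
function rebuild(array $log): array
{
    return array_reduce($log, 'applyEvent', []);
}

$log = [
    ['type' => 'OrderPlaced',    'order_id' => 'A', 'amount' => 1000],
    ['type' => 'OrderPlaced',    'order_id' => 'B', 'amount' => 500],
    ['type' => 'ItemAdded',      'order_id' => 'A', 'amount' => 250],
    ['type' => 'OrderCancelled', 'order_id' => 'B', 'amount' => 0],
];

$openOrders = rebuild($log);  // ['A' => 1250]
```

Because a partition's events are ordered and immutable, the same fold over the same events always yields the same state, which is what lets a new service bootstrap itself from history.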

Common Mistakes

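  • Creating a topic with a single partition: a consumer group assigns each partition to only one consumer, so one partition means no parallelism. Create enough partitions for the expected consumer count and key messages by a meaningful entity (user ID, order ID) so load spreads across partitions while per-entity ordering is preserved.
  • Not committing offsets correctly: auto-commit can commit an offset before processing has finished, so a consumer crash silently loses that message. Disable auto-commit and commit manually after successful processing; this gives at-least-once delivery, so make handlers idempotent.
  • Treating Kafka like a queue where messages are deleted after consumption: Kafka retains messages regardless of consumption, so set retention periods explicitly (retention.ms) to match how far back consumers need to replay.
  • Underestimating operational complexity: Kafka needs cluster management and monitoring, plus a metadata quorum (ZooKeeper in older releases; KRaft, the only mode since Kafka 4.0). Consider managed services (Confluent Cloud, Amazon MSK) for PHP teams without dedicated infrastructure engineers.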
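The offset-commit mistake is easiest to see in a crash simulation. This pure-PHP sketch needs no broker; runConsumer and its crash injection are hypothetical. The consumer restarts from its last committed offset after crashing while processing one message. Committing before processing (effectively what auto-commit does when it commits an offset the application has not finished with) skips the in-flight message; committing after processing delivers it again.

```php
<?php
// Hypothetical simulation: a consumer restarts from its last committed offset.
// $crashAt is the offset whose processing is interrupted by a crash.
function runConsumer(array $log, bool $commitFirst, int $crashAt): array
{
    $committed = 0;   // durable position, survives the crash
    $processed = [];  // side effects that actually completed
    $crashed = false;

    for ($attempt = 0; $attempt < 2; $attempt++) {  // first run, then one restart
        for ($offset = $committed; $offset < count($log); $offset++) {
            if ($commitFirst) {
                $committed = $offset + 1;           // commit BEFORE the work
            }
            if (!$crashed && $offset === $crashAt) {
                $crashed = true;
                continue 2;                         // crash mid-processing: restart
            }
            $processed[] = $log[$offset];
            if (!$commitFirst) {
                $committed = $offset + 1;           // commit AFTER the work
            }
        }
        break; // reached the end of the log without crashing
    }
    return $processed;
}

$log = ['e0', 'e1', 'e2', 'e3'];
$lost    = runConsumer($log, true, 2);   // ['e0', 'e1', 'e3']: e2 never processed
$atLeast = runConsumer($log, false, 2);  // ['e0', 'e1', 'e2', 'e3']
```

Had the crash struck after the side effect but before the commit, the commit-after consumer would have processed that message twice, which is why at-least-once consumers need idempotent handlers.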

Code Examples

✗ Vulnerable
// Every event hard-coded to partition 0: no parallelism, one hot partition
$topic = $producer->newTopic('orders'); // 'orders' is an example topic name
$topic->produce(0, 0, json_encode($event));
// However many partitions the topic has, only the group member that owns
// partition 0 receives work; the other consumers in the group sit idle
✓ Fixed
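// Key by entity; librdkafka's partitioner hashes the key to pick the partition
$topic = $producer->newTopic('orders');
$topic->produce(RD_KAFKA_PARTITION_UA, 0, json_encode($event), (string) $event['order_id']);
$producer->poll(0);      // serve delivery reports
// Events for the same order_id always go to the same partition
// = ordered per order, parallelised across orders
// Before the script exits, wait for queued messages or they are lost:
$producer->flush(10000);

// Consumer with manual offset commit.
// Assumes $conf (RdKafka\Conf) sets group.id and enable.auto.commit=false;
// otherwise librdkafka keeps auto-committing in the background.
$consumer = new RdKafka\KafkaConsumer($conf);
$consumer->subscribe(['orders']);
while (true) {
    $msg = $consumer->consume(120000);
    if ($msg->err === RD_KAFKA_RESP_ERR_NO_ERROR) {
        processEvent($msg->payload);  // your handler; make it idempotent
        $consumer->commitAsync($msg); // commit only after processing
    } elseif ($msg->err !== RD_KAFKA_RESP_ERR__PARTITION_EOF
        && $msg->err !== RD_KAFKA_RESP_ERR__TIMED_OUT) {
        throw new RuntimeException($msg->errstr()); // do not swallow real errors
    }
}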
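Key-based partitioning, whoever computes the hash, comes down to hash(key) modulo the partition count. The sketch below uses PHP's crc32() purely for illustration; partitionFor is a hypothetical helper and does not reproduce librdkafka's or the Java client's partitioner byte for byte (the Java client hashes keys with murmur2). It shows two properties: a key always maps to the same partition while the count is unchanged, and growing the topic remaps many keys.

```php
<?php
// Illustrative key-hash partitioner: crc32(key) mod partition count.
// Same principle as real partitioners, but not byte-compatible with them.
function partitionFor(string $key, int $partitionCount): int
{
    return crc32($key) % $partitionCount; // crc32() is non-negative on 64-bit PHP
}

$keys = [];
for ($i = 1; $i <= 100; $i++) {
    $keys[] = 'order-' . $i;
}

// Same key -> same partition, every time (while the count is unchanged).
$stable = partitionFor('order-42', 6) === partitionFor('order-42', 6);

// Growing the topic from 6 to 8 partitions moves many keys elsewhere.
$moved = 0;
foreach ($keys as $key) {
    if (partitionFor($key, 6) !== partitionFor($key, 8)) {
        $moved++;
    }
}
```

The second property is why adding partitions to a keyed topic can break per-key ordering: older events for a key stay on the old partition while new ones land elsewhere. With php-rdkafka, passing RD_KAFKA_PARTITION_UA plus the key lets librdkafka's configured partitioner choose; if PHP and Java producers write to the same topic, setting the librdkafka partitioner property to murmur2_random keeps their key placement consistent.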

Added 23 Mar 2026
Edited 5 Apr 2026
DEV INTEL Tools & Severity
🔵 Info ⚙ Fix effort: High
⚡ Quick Fix
Create enough partitions and key messages by a stable entity key (user_id, order_id), commit offsets manually after processing, set retention.ms explicitly, and consider a managed Kafka service to offload cluster operations
📦 Applies To
PHP 7.0+ web cli

