Topics & Partitions
debt(d9/e7/b7/t7)
Closest to 'silent in production until users hit it' (d9). The detection_hints state automated=no, with only a code_pattern hint for partition|partition_key. Over-partitioning, missing replication, or wrong partition counts produce no compile/lint errors — problems only surface as consumer lag, rebalancing storms, or data loss in production.
Closest to 'cross-cutting refactor across the codebase' (e7). The why_it_matters field explicitly states partition count 'can't easily be reduced after creation.' Changing partition count requires updating consumers (a noted common_mistake), potentially re-keying producers, and coordinating replication changes — this spans multiple services and infrastructure config, not a single-file fix.
Closest to 'strong gravitational pull' (b7). Partition count is called 'the most important Kafka sizing decision' and applies to cli and queue-worker contexts. Every consumer group, ordering guarantee, and parallelism decision is shaped by initial partition topology. Wrong choices impose a persistent tax across all producers and consumers until a painful migration.
Closest to 'serious trap' (t7). The misconception field explicitly states 'More partitions is always better' — a belief that contradicts how most scaling knobs work (more = better). Increasing partitions also silently breaks key-based routing for existing consumers, a non-obvious cross-cutting side effect documented in common_mistakes.
TL;DR
Explanation
Topic: a named stream of records. Partition: an ordered, immutable log within a topic; each record is appended and assigned a sequential offset. Multiple partitions per topic allow parallel reads and writes, but ordering is guaranteed only within a single partition.
Partition assignment: by key hash (same key → same partition → per-key ordering), round-robin (no key → load balanced → no ordering), or a custom partitioner.
Replication: with replication factor N, each partition has one leader and N-1 followers. The leader handles reads and writes; followers replicate its log, and one of them takes over if the leader fails.
Partition count: set at creation; it can be increased later but never decreased. More partitions → more parallelism → more overhead (file handles, memory). Typical: 3-12 partitions for throughput, 1 for strict ordering across the whole topic.
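The key-hash assignment described above can be sketched in a few lines of shell. This is a toy model under stated assumptions, not Kafka's partitioner: POSIX `cksum` stands in for the client's hash (the Java client uses murmur2, librdkafka's `consistent` partitioners use CRC32), and `partition_for` is a hypothetical helper name.

```shell
#!/bin/sh
# Toy model of key-based partition assignment: hash(key) mod partition_count.
# cksum is a stand-in hash; real clients use murmur2 or CRC32 variants.
partition_for() {
    key=$1
    partitions=$2
    hash=$(printf '%s' "$key" | cksum | cut -d ' ' -f 1)
    echo $((hash % partitions))
}

# The same key maps to the same partition as long as the count is fixed,
# so both calls print the same number.
partition_for "order-1001" 6
partition_for "order-1001" 6
```

Because the result depends on the partition count, the mapping is only stable while that count stays unchanged.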
Common Misconception
"More partitions is always better." Each extra partition adds overhead (file handles, memory), and the count cannot be reduced once it has been raised.
Why It Matters
Partition count is the most important Kafka sizing decision, and it can't easily be reduced after creation.
Common Mistakes
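- Too many partitions for the workload: overhead without benefit.
- Increasing the partition count on a keyed topic without coordinating producers and consumers: the key hash is taken modulo the partition count, so existing keys can move to different partitions and per-key ordering breaks across the change.
- No replication in production: with replication factor 1, each partition lives on a single broker, which is a single point of failure.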
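The key-routing mistake in the list above can be made concrete with a small simulation. This is a sketch, not Kafka's real partitioner: `cksum` is an assumed stand-in hash, the `order-N` keys are invented sample data, and `partition_for` and `moved_keys` are hypothetical helpers. It counts how many of 100 keys land on a different partition once the count changes.

```shell
#!/bin/sh
# Toy model: how many keys change partition when the partition count changes?
# cksum is a stand-in for the client's hash function.
partition_for() {
    hash=$(printf '%s' "$1" | cksum | cut -d ' ' -f 1)
    echo $((hash % $2))
}

moved_keys() {
    old_count=$1
    new_count=$2
    moved=0
    i=1
    while [ "$i" -le 100 ]; do
        key="order-$i"
        # Compare the partition before and after the resize.
        if [ "$(partition_for "$key" "$old_count")" != "$(partition_for "$key" "$new_count")" ]; then
            moved=$((moved + 1))
        fi
        i=$((i + 1))
    done
    echo "$moved"
}

echo "keys remapped (6 -> 12): $(moved_keys 6 12) of 100"
```

Any key that moves has its older records in one partition and its newer records in another, so a consumer can no longer rely on seeing that key's records in order.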
Code Examples
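# The two --create commands are alternatives; localhost:9092 is an assumed broker address.
# 1 partition, 1 replica: no parallelism, no fault tolerance:
kafka-topics --bootstrap-server localhost:9092 --create --topic orders --partitions 1 --replication-factor 1
# 1000 partitions: unnecessary overhead for most workloads
# Balanced: 6 partitions, 3 replicas:
kafka-topics --bootstrap-server localhost:9092 --create --topic orders --partitions 6 --replication-factor 3
# PHP producer with partition key (php-rdkafka extension):
$conf = new RdKafka\Conf();
$conf->set('metadata.broker.list', 'localhost:9092'); // Assumed broker address
$conf->set('partitioner', 'consistent_random'); // Same key = same partition; empty/null keys are spread randomly
$producer = new RdKafka\Producer($conf);
$topic = $producer->newTopic('orders'); // produce() belongs to the topic handle, not the producer
$topic->produce(RD_KAFKA_PARTITION_UA, 0, $payload, $orderId); // Key = orderId
$producer->flush(10000); // Wait up to 10 s for outstanding deliveries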
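Since the partition count can be raised but not lowered, it may help to inspect a topic before altering it. The following is a minimal sketch, assuming the `kafka-topics` CLI is on the PATH and a broker at `localhost:9092`; `run` and `grow_topic` are hypothetical helpers, and `DRY_RUN=1` (the default here) only prints the commands instead of executing them.

```shell
#!/bin/sh
# Sketch: inspect a topic, then raise its partition count.
# BOOTSTRAP and the topic name are assumptions; with DRY_RUN=1 the commands
# are printed rather than executed, so nothing changes by accident.
BOOTSTRAP=${BOOTSTRAP:-localhost:9092}
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

grow_topic() {
    topic=$1
    new_count=$2
    # Show current partitions, leaders, and replicas before changing anything.
    run kafka-topics --bootstrap-server "$BOOTSTRAP" --describe --topic "$topic"
    # The count can only go up; a smaller value is rejected.
    run kafka-topics --bootstrap-server "$BOOTSTRAP" --alter --topic "$topic" --partitions "$new_count"
}

grow_topic orders 12
```

Set `DRY_RUN=0` to execute the commands for real. For keyed topics, the increase changes which partition a key hashes to, so it is best coordinated with producers and consumers.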