Dead Letter Queue
debt(d7/e5/b5/t5)
Closest to 'only careful code review or runtime testing' (d7). The term's detection_hints show automated=no, meaning there's no automated tooling that catches missing DLQ configuration. The code pattern 'dead.letter|dlq|failed_jobs' can grep for existing DLQs but cannot detect their absence. Missing DLQ config is typically discovered through code review of queue setup or runtime testing when messages start failing silently in production.
Closest to 'touches multiple files / significant refactor in one component' (e5). While the quick_fix suggests individual steps (configure DLQ, set up alerts, build replay tool), implementing this properly requires: configuring the queue infrastructure, setting up monitoring/alerting for DLQ depth, and building a replay mechanism. This spans queue configuration, monitoring setup, and potentially new tooling — not a one-line fix but also not architectural rework.
Closest to 'persistent productivity tax' (b5). The term applies to cli and queue-worker contexts, meaning every queue-based workflow needs DLQ consideration. Once configured, DLQ requires ongoing operational attention: monitoring depth, investigating failures, replaying messages. It's not load-bearing across the entire system (b7), but it creates persistent operational overhead for all queue-based components.
Closest to 'notable trap - documented gotcha most devs eventually learn' (t5). The misconception states 'DLQ is a last resort — it should be the first line of defence.' This directly contradicts the intuition that DLQ is for edge cases. Common mistakes include not configuring before production and not alerting — both stem from underestimating DLQ's importance. Developers familiar with queues eventually learn this, but newcomers consistently get it wrong.
TL;DR
A dead letter queue captures messages that can't be processed (retries exhausted, TTL expired, explicitly rejected) so they can be inspected, alerted on, and replayed instead of silently disappearing or blocking the main queue. Configure it, alert on it, and build replay before production.
Explanation
Messages are routed to the DLQ when max retries are exceeded, the message TTL expires, or the consumer explicitly rejects them (nack without requeue / dead-letter). The DLQ itself is just a separate queue or topic.
Benefits: failed messages don't block the main queue, they stay available for manual inspection and replay, and growth in DLQ depth is an early warning of consumer bugs.
Implementations: RabbitMQ uses a dead-letter exchange and dead-letter routing key; Kafka has no built-in DLQ, so the consumer publishes to a DLQ topic on unhandled exceptions; SQS uses a redrive policy (maxReceiveCount plus a DLQ ARN); Laravel records failed jobs in the failed_jobs table.
Replay: fix the bug first, then re-enqueue the messages from the DLQ. Monitor DLQ depth as a key operational metric.
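The broker only dead-letters what the consumer rejects, so the consumer side matters as much as the queue declaration. A minimal consumer sketch, assuming php-amqplib 2.12+ and a hypothetical handleOrder() for the business logic:

// Consumer-side sketch: nack without requeue is what actually routes a
// message to the queue's configured x-dead-letter-exchange.
use PhpAmqpLib\Connection\AMQPStreamConnection;
use PhpAmqpLib\Message\AMQPMessage;

$connection = new AMQPStreamConnection('localhost', 5672, 'guest', 'guest');
$channel = $connection->channel();

$channel->basic_consume('orders', '', false, false, false, false,
    function (AMQPMessage $msg) {
        try {
            handleOrder($msg->getBody()); // hypothetical business logic
            $msg->ack();
        } catch (\Throwable $e) {
            $msg->nack(false); // requeue=false => broker dead-letters it
        }
    }
);

while ($channel->is_consuming()) {
    $channel->wait();
}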
Common Misconception
That the DLQ is a last resort for rare edge cases. It is the opposite: the first line of defence. In production some messages will always fail, and without a DLQ configured up front those failures are invisible.
Why It Matters
Without a DLQ, a failing message either blocks the main queue through endless retries or is dropped silently; either way you lose throughput or data with no signal. With a DLQ and depth alerting, consumer bugs surface immediately and the affected messages survive for diagnosis and replay.
Common Mistakes
- Not configuring DLQ before going to production.
- Not alerting on DLQ depth > 0 — failed messages go unnoticed.
- Not building a replay mechanism — DLQ is useless if you can't re-process (a replay sketch follows below).
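A replay tool can be a short script that drains the dead queue and republishes to the main one once the consumer bug is fixed. A minimal sketch, assuming php-amqplib 2.12+ and the orders / orders.dead names from the Code Examples below:

// Replay sketch (illustrative): move dead-lettered messages back to the
// main queue after the consumer bug has been fixed.
use PhpAmqpLib\Message\AMQPMessage;

while ($msg = $channel->basic_get('orders.dead')) {
    $channel->basic_publish(
        new AMQPMessage($msg->getBody(), $msg->get_properties()),
        '',       // default exchange
        'orders'  // routing key = main queue name
    );
    $msg->ack(); // remove from the DLQ only after republishing
}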
Code Examples
use PhpAmqpLib\Wire\AMQPTable;

// RabbitMQ queue without DLQ:
$channel->queue_declare('orders'); // rejected/expired messages disappear

// With a dead letter exchange (assumes $channel is an open php-amqplib channel):
$channel->exchange_declare('orders.dlx', 'fanout', false, true, false); // durable, not auto-delete
$channel->queue_declare('orders.dead', false, true, false, false);      // durable, not auto-delete
$channel->queue_bind('orders.dead', 'orders.dlx');

$channel->queue_declare('orders', false, true, false, false, false, new AMQPTable([
    'x-dead-letter-exchange' => 'orders.dlx',
    'x-message-ttl' => 3600000, // 1h; expired messages are dead-lettered too
    'x-max-length' => 10000,    // overflow also dead-letters the oldest messages
]));
// Alert when orders.dead queue depth > 0
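That alert can be driven by a passive declare, which in php-amqplib returns the queue's current message count. A sketch; notifyOnCall() is a hypothetical alerting hook:

// Depth check: a passive queue_declare returns
// [queue_name, message_count, consumer_count].
[, $depth] = $channel->queue_declare('orders.dead', true);
if ($depth > 0) {
    notifyOnCall("orders.dead has {$depth} messages"); // hypothetical hook
}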