Queue-Based Load Levelling
debt(d7/e7/b7/t5)
Closest to 'only careful code review or runtime testing' (d7). The absence of a queue between mismatched-throughput systems isn't caught by static tools; detection_hints lists Horizon/Datadog/CloudWatch which surface symptoms (queue depth, latency) only at runtime under load.
Closest to 'cross-cutting refactor across the codebase' (e7). Quick_fix says 'put a queue between two systems' — sounds simple, but introducing async processing means moving the work into jobs, handling retries/DLQ, updating callers to not wait, and adapting UX flows; touches many call sites.
Closest to 'strong gravitational pull' (b7). Applies to web and queue-worker contexts; once introduced, every related feature must consider sync-vs-async, idempotency, and queue depth. Shapes how new work is dispatched throughout the system.
Closest to 'notable trap most devs eventually learn' (t5). Misconception flagged: developers see added latency and dismiss the pattern, missing that the benefit is survival under spike rather than speed — a common gotcha but not catastrophically counterintuitive.
Also Known As
TL;DR
Put a queue between a bursty producer and a slower consumer: spikes are absorbed by the queue and drained at a steady, sustainable rate instead of overwhelming the backend.
Explanation
Load levelling decouples the production rate from the consumption rate. A traffic spike lands on the queue, which can absorb millions of messages, rather than hitting the database or a downstream service directly. The queue acts as a buffer: workers consume at a steady rate the backend can handle. The pattern is fundamental to resilient architectures; image processing, email sending, payment processing, and report generation queues all implement load levelling. It is typically combined with backpressure and dead letter queues for robust failure handling.
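A minimal sketch of that failure handling, assuming a Laravel-style queue (the class name, retry count, and backoff values are illustrative): the job caps its retries, backs off between attempts, and defines what happens once the final attempt fails, at which point Laravel parks the job in the failed_jobs table, its equivalent of a dead letter queue.
// Sketch: retry and dead-letter handling on a queued job (values are assumptions).
class GenerateInvoiceJob implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable;

    public int $tries = 5;        // give up after 5 attempts

    public function backoff(): array
    {
        return [10, 60, 300];     // wait 10s, 60s, then 5 minutes between attempts
    }

    public function failed(\Throwable $exception): void
    {
        // Runs after the final failed attempt; the job record lands in the
        // failed_jobs table, Laravel's default dead letter destination.
        Log::error('Invoice generation failed', ['error' => $exception->getMessage()]);
    }
}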
Common Misconception
That the queue's added latency makes the pattern a bad trade. The benefit is survival under spike, not speed: work completes a little later, but it completes, instead of the backend falling over.
Why It Matters
Common Mistakes
- No queue depth monitoring: a growing queue is the signal that consumers cannot keep up (see the sketch after this list).
- Consumer pool too small for normal load: the queue perpetually grows instead of draining.
- No dead letter queue for failed jobs: failed messages are lost silently.
- Synchronous operations where the queue should be used: the user waits on work that should be asynchronous.
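Queue depth is the metric that catches the first two mistakes early. A minimal sketch of a scheduled depth check, assuming a Laravel console kernel and a queue named 'invoices' (both the queue name and the 1,000-job threshold are assumptions to tune):
// Sketch: scheduled queue-depth alert (queue name and threshold are illustrative).
protected function schedule(Schedule $schedule): void
{
    $schedule->call(function () {
        $depth = Queue::size('invoices');

        if ($depth > 1000) {
            // A queue that keeps growing means consumers cannot keep up.
            Log::warning('Invoice queue backing up', ['depth' => $depth]);
        }
    })->everyMinute();
}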
Avoid When
- Sustained load exceeds what the consumers can process: messages accumulate indefinitely and the queue becomes a permanent backlog rather than a buffer.
- The operation requires an immediate response: queues are asynchronous, and the caller needs the result within the same request.
- Message ordering is critical and the queue does not guarantee it: use a single-consumer queue or an ordered stream instead.
- The payload is too large for the queue: store large payloads externally and queue only a reference (see the sketch after this list).
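The last point is the claim-check variant of the pattern: store the heavy payload out of band and enqueue only the key needed to fetch it. A minimal sketch, assuming a Laravel filesystem disk (the ProcessInvoiceSource job, the s3 disk, and the path are illustrative assumptions):
// Sketch: queue a reference, not the payload (job name, disk, and path are assumptions).
public function upload(Request $request): JsonResponse
{
    // Persist the large payload to object storage first ...
    $path = $request->file('source')->store('invoice-sources', 's3');

    // ... then enqueue only the small reference to it.
    ProcessInvoiceSource::dispatch($path);

    return response()->json(['status' => 'queued']);
}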
When To Use
- Smoothing traffic spikes — absorb burst writes and process them at a sustainable rate.
- Decoupling producers from consumers so either can scale or fail independently.
- Deferring expensive work (email, PDF, video transcoding) out of the HTTP request cycle.
- Retry and dead-letter handling — queues provide built-in retry semantics without application code.
Code Examples
// Direct processing: 10,000 simultaneous requests overwhelm the generator.
class InvoiceController
{
    public function generate(Order $order): Response
    {
        // CPU-intensive and slow, and it runs inside the HTTP request
        $pdf = $this->pdfGenerator->generate($order);

        // 10,000 concurrent requests: PHP-FPM workers exhausted, timeouts
        return response($pdf, 200, ['Content-Type' => 'application/pdf']);
    }
}

// Queue-based load levelling: the request only enqueues the work and returns.
class InvoiceController
{
    public function request(Order $order): JsonResponse
    {
        GenerateInvoiceJob::dispatch($order->id); // queued, returns immediately

        return response()->json(['status' => 'Invoice is being generated, check your email']);
    }
}

// Queue worker: jobs are consumed at a rate the backend can sustain.
class GenerateInvoiceJob implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable;

    public function __construct(private int $orderId) {}

    public function handle(PdfGenerator $pdfGenerator, EmailService $emailService): void
    {
        $order = Order::findOrFail($this->orderId);

        $pdf = $pdfGenerator->generate($order);
        $emailService->sendInvoice($order, $pdf);
        // 10,000 orders: all processed, none dropped, at roughly 100/sec
    }
}
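The 100/sec figure in that comment does not happen by itself; the consumption rate comes from how the worker pool is sized and run. One way to make the rate explicit, assuming a Redis-backed queue (the throttle key and the 100-per-second limit are illustrative), is to throttle inside the job; the workers themselves would be started with something like php artisan queue:work --queue=invoices.
// Sketch: enforcing a steady consumption rate inside the job (Redis queue assumed).
public function handle(PdfGenerator $pdfGenerator, EmailService $emailService): void
{
    Redis::throttle('invoice-generation')
        ->allow(100)->every(1)   // at most 100 jobs per second across all workers
        ->then(function () use ($pdfGenerator, $emailService) {
            $order = Order::findOrFail($this->orderId);
            $emailService->sendInvoice($order, $pdfGenerator->generate($order));
        }, function () {
            $this->release(5);   // over the limit: back on the queue, retry in 5 seconds
        });
}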