
LLM Streaming Responses

AI / ML · PHP 8.0+ · Intermediate
debt(d7/e3/b3/t7)
d7 Detectability Operational debt — how invisible misuse is to your safety net

Closest to 'only careful code review or runtime testing' (d7). No detection tools are specified in detection_hints. Misuse—such as missing ob_flush(), missing X-Accel-Buffering header, or not handling stop reasons—will not be caught by a compiler, linter, or standard SAST tool. These failures only surface at runtime: chunks are silently buffered and the client sees nothing until the full response completes, or responses are silently truncated. A developer must test end-to-end with real network conditions or careful inspection to discover the problem.

e3 Effort Remediation debt — work required to fix once spotted

Closest to 'simple parameterised fix' (e3). The quick_fix confirms the fix is a handful of header calls and a flush pattern after each token. It's more than a single-line patch because it requires coordinating Content-Type header, X-Accel-Buffering header, ob_flush() + flush() calls, and correct SSE event formatting—but it's confined to a single endpoint/component with no cross-cutting refactor needed.

b3 Burden Structural debt — long-term weight of choosing wrong

Closest to 'localised tax' (b3). The streaming pattern applies only to web contexts (applies_to: web) and is scoped to specific AI-response endpoints. It introduces a persistent but contained tax: each streaming endpoint must maintain the flush discipline and header setup. The rest of the codebase is unaffected, making this a localised rather than system-wide burden.

t7 Trap Cognitive debt — how counter-intuitive correct behaviour is

Closest to 'serious trap' (t7). The misconception field identifies the primary trap explicitly: developers assume WebSockets are required and miss that SSE over plain HTTP is sufficient and simpler. The common_mistakes list also reveals secondary traps that contradict reasonable assumptions: PHP and Nginx output buffering silently hold chunks back (the 'obvious' streaming code appears to work, but the client receives nothing until the response completes), and disabling proxy buffering requires a non-obvious vendor-specific header. These traps are serious because the failure mode is silent and the assumed solution (WebSockets) is more complex than necessary.


Also Known As

LLM streaming, streaming tokens, SSE, AI streaming, PHP LLM

TL;DR

Receiving LLM output token-by-token as it is generated rather than waiting for the full response — dramatically improving perceived latency for users and enabling real-time displays of AI-generated content.

Explanation

Without streaming, a user can wait 5–15 seconds staring at a spinner before any text appears. With streaming, the first text appears within ~200ms and keeps flowing. On the server side, LLM APIs deliver the stream as server-sent events over a chunked HTTP response, and the PHP SDK exposes each event as a delta object carrying a small text fragment. The PHP backend then has two options: forward each delta to the browser via SSE (text/event-stream), or buffer everything and return the full response at the end. For real-time display, SSE from PHP to the browser is the standard pattern: stream each delta to the client as it arrives, and accumulate the deltas server-side if you need the complete text for logging. When generation ends, the stop reason (end_turn, max_tokens, tool_use) indicates why it stopped.
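The forward-and-accumulate pattern described above can be sketched without any SDK. This is a minimal illustration, not a definitive implementation: sseFrame() and streamDeltas() are hypothetical helper names, and the hard-coded array of deltas stands in for whatever fragments a real SDK would deliver.

```php
<?php
declare(strict_types=1);

/**
 * Format one SSE frame: a "data:" line followed by a blank line.
 * json_encode escapes newlines in the text, so the payload stays on one line.
 */
function sseFrame(array $payload): string
{
    return 'data: ' . json_encode($payload, JSON_THROW_ON_ERROR) . "\n\n";
}

/**
 * Turn an iterable of text deltas into SSE frames while accumulating the full text.
 * $fullText is filled by reference so the caller can log it once the stream ends.
 */
function streamDeltas(iterable $deltas, string &$fullText): Generator
{
    $fullText = '';
    foreach ($deltas as $delta) {
        $fullText .= $delta;                  // keep the complete response for logging
        yield sseFrame(['delta' => $delta]);  // forward only the fragment to the client
    }
    yield sseFrame(['done' => true]);
}

// Hypothetical deltas standing in for SDK output.
$fullText = '';
$frames = iterator_to_array(streamDeltas(['Hel', 'lo', "\nworld"], $fullText), false);
```

In a real endpoint each frame would be echoed and flushed as it is produced rather than collected into an array; the array is only here to make the result easy to inspect.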

Common Misconception

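Misconception: streaming requires WebSockets. In practice, SSE (Server-Sent Events) over standard HTTP is sufficient and simpler: no WebSocket server is needed, it works through standard proxies once their response buffering is disabled, and the browser EventSource API reconnects automatically. One limitation to plan for: EventSource only issues GET requests.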
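EventSource reconnection relies on two optional fields defined by the SSE format: retry: sets the reconnection delay in milliseconds, and id: is sent back by the browser in the Last-Event-ID request header when it reconnects. The sketch below shows the wire format only; sseEvent() is a hypothetical helper name, and how a server resumes from a given id is left out.

```php
<?php
declare(strict_types=1);

/**
 * Build an SSE frame with optional "id:" and "retry:" fields.
 * A frame ends with a blank line; each line of the payload gets its own "data:" line.
 */
function sseEvent(string $data, ?string $id = null, ?int $retryMs = null): string
{
    $frame = '';
    if ($retryMs !== null) {
        $frame .= "retry: {$retryMs}\n";
    }
    if ($id !== null) {
        $frame .= "id: {$id}\n";
    }
    foreach (preg_split('/\r\n|\r|\n/', $data) as $line) {
        $frame .= "data: {$line}\n";
    }
    return $frame . "\n";
}

$frame = sseEvent('{"delta":"Hi"}', '42', 3000);
```

JSON-encoding the payload, as the examples in this entry do, keeps it on a single data: line, so the multi-line branch only matters for raw text payloads.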

Why It Matters

User perception of AI quality is heavily influenced by response speed. A streaming response that starts in 200ms feels responsive even if total generation takes 10 seconds. Non-streaming responses feel broken or slow regardless of actual quality. For any customer-facing AI feature, streaming is not optional — it is expected.

Common Mistakes

  • Forgetting to flush: PHP output buffering holds chunks back until the script ends. After each SSE event, call ob_flush() (only while an output buffer is active, otherwise it raises a notice) and then flush().
  • Omitting the X-Accel-Buffering: no header: Nginx buffers proxied and FastCGI responses by default, and this response header turns that off for the streaming endpoint.
  • Not handling the stream's stop reason: check it when the stream ends. max_tokens means the output was cut off at the token limit; tool_use means the model stopped to request a tool call rather than finishing its answer.
  • Logging the full response without accumulating from deltas — each SSE event contains only a fragment; accumulate all deltas server-side if you need to log the complete response.
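The stop-reason check from the list above might look like the following sketch. It assumes the caller has already extracted the stop reason from the stream as a string (where it appears in the event objects depends on the SDK); describeStopReason() and the shape of its return value are hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Map a stop reason to something the endpoint can report to the client.
 * Only the three reasons named in this entry are handled explicitly.
 */
function describeStopReason(?string $stopReason): array
{
    return match ($stopReason) {
        'end_turn'   => ['complete' => true,  'note' => 'model finished normally'],
        'max_tokens' => ['complete' => false, 'note' => 'output truncated at the token limit'],
        'tool_use'   => ['complete' => false, 'note' => 'model is requesting a tool call'],
        default      => ['complete' => false, 'note' => 'unknown or missing stop reason'],
    };
}

$final = describeStopReason('max_tokens');
```

One option is to include this result in the final done event, so the client can show a "response was cut off" hint instead of silently displaying truncated text.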

Code Examples

✗ Vulnerable
<?php
// ❌ Buffering full response before sending — user waits 10+ seconds
Route::post('/chat', function (Request $request) {
    $response = Anthropic::messages()->create([
        'model'      => 'claude-sonnet-4-20250514',
        'max_tokens' => 1000,
        'messages'   => [['role' => 'user', 'content' => $request->message]],
    ]);
    return response()->json(['text' => $response->content[0]->text]);
    // User sees nothing for 8 seconds then gets the full text at once
});
✓ Fixed
<?php
// ✅ Streaming SSE response from PHP
Route::post('/chat', function (Request $request) {
    return response()->stream(function () use ($request) {
        $stream = Anthropic::messages()->stream([
            'model'      => 'claude-sonnet-4-20250514',
            'max_tokens' => 1000,
            'messages'   => [['role' => 'user', 'content' => $request->message]],
        ]);

        foreach ($stream as $event) {
            if ($event->type === 'content_block_delta') {
                $delta = $event->delta->text ?? '';
                echo 'data: ' . json_encode(['delta' => $delta]) . "\n\n";
                if (ob_get_level() > 0) {
                    ob_flush(); // Raises a notice if no output buffer is active
                }
                flush(); // Send chunk immediately
            }
        }

        echo 'data: ' . json_encode(['done' => true]) . "\n\n";
        if (ob_get_level() > 0) {
            ob_flush();
        }
        flush();
    }, 200, [
        // Set headers on the response rather than via header() inside the callback
        'Content-Type'      => 'text/event-stream',
        'Cache-Control'     => 'no-cache',
        'X-Accel-Buffering' => 'no', // Disable Nginx buffering
    ]);
});

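// Client: EventSource only issues GET requests, so it cannot call this POST route.
// Either read the stream with fetch() and response.body.getReader(), or expose a GET route for EventSource.
// The frames carry no "event:" field, so EventSource delivers them as default "message" events.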
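
Because every streaming endpoint needs the same headers and the same flush discipline, one option is to factor both into small helpers. The sketch below is framework-independent; sseHeaders() and sendSse() are hypothetical names, not part of Laravel or any SDK.

```php
<?php
declare(strict_types=1);

/** Response headers an SSE endpoint needs, as a name => value array. */
function sseHeaders(): array
{
    return [
        'Content-Type'      => 'text/event-stream',
        'Cache-Control'     => 'no-cache',
        'X-Accel-Buffering' => 'no', // ask Nginx not to buffer this response
    ];
}

/** Emit one SSE frame and push it toward the client immediately. */
function sendSse(array $payload): void
{
    echo 'data: ' . json_encode($payload) . "\n\n";
    if (ob_get_level() > 0) {
        ob_flush(); // guarded: ob_flush() raises a notice when no buffer is active
    }
    flush();
}
```

In a Laravel route, sseHeaders() could be passed as the third argument to response()->stream(), and sendSse() called once per delta inside the callback.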

Added 23 Mar 2026
DEV INTEL Tools & Severity
⚙ Fix effort: Medium
⚡ Quick Fix
Set header('Content-Type: text/event-stream') and header('X-Accel-Buffering: no') in PHP, then echo 'data: ' . json_encode(['delta' => $chunk]) . "\n\n" for each token, calling ob_flush() + flush() after each.
📦 Applies To
PHP 8.0+, web, laravel, symfony

