
LLM Streaming Responses

ai_ml · PHP 8.0+ · Intermediate

Also Known As

LLM streaming · streaming tokens · SSE · AI streaming · PHP LLM

TL;DR

Receiving LLM output token-by-token as it is generated rather than waiting for the full response — dramatically improving perceived latency for users and enabling real-time displays of AI-generated content.

Explanation

Without streaming, a user waits 5–15 seconds staring at a spinner before any text appears. With streaming, text appears within ~200ms and continues flowing.

On the server side, LLM APIs return server-sent events or chunked transfer encoding, and the PHP SDK receives delta objects with partial text. The PHP backend must forward these to the browser either via SSE (text/event-stream) or by buffering and returning the full response. For real-time display, SSE from PHP to browser is the standard pattern.

Each delta contains a small text fragment; accumulate them server-side for logging while streaming each to the client. Stop reasons (end_turn, max_tokens, tool_use) indicate why generation stopped.
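The delta-forwarding pattern above can be sketched in plain PHP. The sseEvent() helper and the sample fragments are hypothetical, but the SSE wire format ("data: <json>" followed by a blank line) is standard:

```php
<?php
// Hypothetical helper: format one delta fragment as an SSE "data:" event.
function sseEvent(array $payload): string
{
    return 'data: ' . json_encode($payload) . "\n\n";
}

// Simulated deltas as an LLM API would emit them, fragment by fragment.
$deltas = ['The qui', 'ck brown', ' fox'];

$full = '';  // accumulated server-side for logging
$wire = '';  // what actually goes over the wire, event by event
foreach ($deltas as $delta) {
    $full .= $delta;
    $wire .= sseEvent(['delta' => $delta]);
}
// $full now holds the complete text; $wire holds one SSE event per fragment.
```

In a real endpoint each sseEvent() result would be echoed and flushed immediately rather than concatenated; the accumulation into $full is what lets you log the complete response afterwards.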

Common Misconception

Streaming requires WebSockets. SSE (Server-Sent Events) over standard HTTP is sufficient and simpler — no WebSocket server needed, works through standard proxies, and the browser EventSource API handles reconnection automatically.
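To underline how little plain-HTTP SSE needs, here is a sketch of the only response headers involved (the header names and values follow the SSE convention; the sseHeaders() helper itself is hypothetical):

```php
<?php
// An SSE endpoint over plain HTTP needs only a few headers — no WebSocket
// upgrade handshake and no separate socket server process.
function sseHeaders(): array
{
    return [
        'Content-Type'      => 'text/event-stream', // marks the response as SSE
        'Cache-Control'     => 'no-cache',          // never cache a live stream
        'X-Accel-Buffering' => 'no',                // Nginx: do not buffer this response
    ];
}

// In a plain-PHP script (no framework) you would emit them directly:
// foreach (sseHeaders() as $name => $value) { header("$name: $value"); }
```

On the browser side the entire client is `new EventSource('/stream')` — parsing, reconnection, and Last-Event-ID replay all come for free.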

Why It Matters

User perception of AI quality is heavily influenced by response speed. A streaming response that starts in 200ms feels responsive even if total generation takes 10 seconds. Non-streaming responses feel broken or slow regardless of actual quality. For any customer-facing AI feature, streaming is not optional — it is expected.

Common Mistakes

  • Forgetting ob_flush() — output buffering in PHP (and Nginx) swallows chunks; you must call both ob_flush() and flush() after each SSE event.
  • Not setting X-Accel-Buffering: no — Nginx buffers proxy responses by default; this header disables it for the streaming endpoint.
  • Not handling the stream's stop reason — always check for tool_use or max_tokens stop reasons to know whether the response was truncated.
  • Logging the full response without accumulating from deltas — each SSE event contains only a fragment; accumulate all deltas server-side if you need to log the complete response.
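Handling the stop reason (the third mistake above) can be as small as a match expression. The stop-reason names follow the Anthropic API; the describeStop() helper itself is a hypothetical sketch:

```php
<?php
// Hypothetical helper: decide how to treat a finished stream by stop reason.
function describeStop(string $stopReason): string
{
    return match ($stopReason) {
        'end_turn'   => 'complete',
        'max_tokens' => 'truncated — raise max_tokens or continue the turn',
        'tool_use'   => 'paused — run the requested tool and return its result',
        default      => 'unknown stop reason: ' . $stopReason,
    };
}
```

A production handler would branch the same way: log truncations, dispatch tool calls, and treat only end_turn as a normal completion.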

Code Examples

✗ Vulnerable
<?php
// ❌ Buffering full response before sending — user waits 10+ seconds
Route::post('/chat', function (Request $request) {
    $response = Anthropic::messages()->create([
        'model'      => 'claude-sonnet-4-20250514',
        'max_tokens' => 1000,
        'messages'   => [['role' => 'user', 'content' => $request->message]],
    ]);
    return response()->json(['text' => $response->content[0]->text]);
    // User sees nothing for 8 seconds then gets the full text at once
});
✓ Fixed
<?php
// ✅ Streaming SSE response from PHP
Route::post('/chat', function (Request $request) {
    return response()->stream(function () use ($request) {
        $stream = Anthropic::messages()->stream([
            'model'      => 'claude-sonnet-4-20250514',
            'max_tokens' => 1000,
            'messages'   => [['role' => 'user', 'content' => $request->message]],
        ]);

        foreach ($stream as $event) {
            if ($event->type === 'content_block_delta') {
                $delta = $event->delta->text ?? '';
                echo 'data: ' . json_encode(['delta' => $delta]) . "\n\n";
                if (ob_get_level() > 0) {
                    ob_flush(); // Only flush a buffer that actually exists
                }
                flush(); // Send the chunk immediately
            }
        }

        echo 'data: ' . json_encode(['done' => true]) . "\n\n";
        if (ob_get_level() > 0) {
            ob_flush();
        }
        flush();
    }, 200, [
        'Content-Type'      => 'text/event-stream',
        'Cache-Control'     => 'no-cache',
        'X-Accel-Buffering' => 'no', // Disable Nginx proxy buffering
    ]);
});

// Client: EventSource only supports GET — either expose the route as GET for
// new EventSource('/chat'), or keep POST and read fetch('/chat', {method: 'POST'})
// via response.body's ReadableStream.

Added 23 Mar 2026
Tools & Severity
⚙ Fix effort: Medium
⚡ Quick Fix
Set header('Content-Type: text/event-stream') and header('X-Accel-Buffering: no') in PHP, then echo 'data: ' . json_encode(['delta' => $chunk]) . "\n\n" for each token, calling ob_flush() (if an output buffer is active) and flush() after each.
📦 Applies To
PHP 8.0+ · web · Laravel · Symfony
