LLM Streaming Responses

ai_ml PHP 8.0+ Intermediate

Also Known As

LLM streaming streaming tokens SSE AI streaming PHP LLM

TL;DR

Receiving LLM output token-by-token as it is generated rather than waiting for the full response — dramatically improving perceived latency for users and enabling real-time displays of AI-generated content.

Explanation

Without streaming, a user can wait 5–15 seconds staring at a spinner before any text appears. With streaming, the first text can appear within ~200ms and keeps flowing. On the server side, LLM APIs return the output as server-sent events or as a chunked HTTP response. The PHP SDK receives delta objects with partial text. The PHP backend then either forwards these to the browser via SSE (text/event-stream) or buffers them and returns the full response. For real-time display, SSE from PHP to browser is the standard pattern. Each delta contains a small text fragment; accumulate them server-side for logging while streaming each one to the client. The stop reason (end_turn, max_tokens, tool_use) indicates why generation stopped.
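
The forward-and-accumulate loop described here can be sketched in plain PHP. This is a minimal sketch, not any SDK's actual interface: formatSseEvent() and relayDeltas() are hypothetical helper names, and the deltas are plain strings standing in for the text fragments an SDK would deliver.

```php
<?php
declare(strict_types=1);

/**
 * Frame one payload as an SSE message: a "data:" line followed by a blank line.
 * json_encode() escapes newlines, so the payload always fits on one data line.
 */
function formatSseEvent(array $payload): string
{
    return 'data: ' . json_encode($payload, JSON_THROW_ON_ERROR) . "\n\n";
}

/**
 * Forward each text fragment through $send and return the accumulated
 * full text for logging. $send is a hypothetical callback; in a real
 * endpoint it would echo the frame and flush.
 */
function relayDeltas(iterable $deltas, callable $send): string
{
    $fullText = '';
    foreach ($deltas as $delta) {
        $fullText .= $delta;                       // keep the complete text server-side
        $send(formatSseEvent(['delta' => $delta])); // stream the fragment to the client
    }
    return $fullText;
}

// Usage with fake fragments in place of a live LLM stream.
$frames = [];
$full = relayDeltas(['Hel', 'lo', "\nworld"], function (string $frame) use (&$frames): void {
    $frames[] = $frame;
});
```

Encoding each fragment as JSON is a design choice rather than a requirement of SSE: it keeps fragments that contain newlines from being split across several data lines.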

Common Misconception

✗ Streaming requires WebSockets. SSE (Server-Sent Events) over standard HTTP is sufficient and simpler — no WebSocket server needed, works through standard proxies, and the browser EventSource API handles reconnection automatically.
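
Automatic reconnection relies on two optional SSE fields: id:, which the browser echoes back in a Last-Event-ID request header when it reconnects, and retry:, which sets the reconnection delay in milliseconds. A minimal sketch in plain PHP follows, assuming sequentially numbered events; formatSseEventWithId() and resumeFrom() are hypothetical helper names.

```php
<?php
declare(strict_types=1);

/** Frame an SSE message with an event id and an optional retry hint. */
function formatSseEventWithId(int $id, array $payload, ?int $retryMs = null): string
{
    $frame = '';
    if ($retryMs !== null) {
        $frame .= "retry: {$retryMs}\n"; // reconnection delay in milliseconds
    }
    $frame .= "id: {$id}\n";
    $frame .= 'data: ' . json_encode($payload, JSON_THROW_ON_ERROR) . "\n\n";
    return $frame;
}

/** Work out which event number to resume from, given the request's server array. */
function resumeFrom(array $server): int
{
    // PHP exposes the Last-Event-ID request header as HTTP_LAST_EVENT_ID.
    $last = $server['HTTP_LAST_EVENT_ID'] ?? null;
    if (is_string($last) && preg_match('/^\d+$/', $last) === 1) {
        return (int) $last + 1;
    }
    return 0; // first connection, or an id this sketch does not understand
}

$frame = formatSseEventWithId(3, ['delta' => 'hi'], 2000);
$next  = resumeFrom(['HTTP_LAST_EVENT_ID' => '3']);
```

Resuming an LLM response this way only works if the server has kept the fragments it already generated; that storage is outside the scope of this sketch.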

Why It Matters

User perception of AI quality is heavily influenced by response speed. A streaming response that starts in 200ms feels responsive even if total generation takes 10 seconds. Non-streaming responses feel broken or slow regardless of actual quality. For any customer-facing AI feature, streaming is not optional — it is expected.

Common Mistakes

Forgetting ob_flush() — output buffering in PHP (and Nginx) swallows chunks; you must call both ob_flush() and flush() after each SSE event.
Not setting X-Accel-Buffering: no — Nginx buffers proxy responses by default; this header disables it for the streaming endpoint.
Not handling the stream's stop reason — always check it: max_tokens means the response was cut off at the token limit, while tool_use means the model is requesting a tool call rather than giving a final answer.
Logging the full response without accumulating from deltas — each SSE event contains only a fragment; accumulate all deltas server-side if you need to log the complete response.
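
The points in this list can be combined into one consumer loop. The sketch below rests on stated assumptions: events are plain arrays shaped like Anthropic's streaming events (content_block_delta carrying delta.text, message_delta carrying delta.stop_reason), whereas a real SDK may expose objects instead, and flushSse() and consumeStream() are hypothetical names.

```php
<?php
declare(strict_types=1);

/** Flush PHP's output buffer only if one is active, then the web server buffer. */
function flushSse(): void
{
    if (ob_get_level() > 0) {
        ob_flush(); // would raise a notice if no buffer were active
    }
    flush();
}

/**
 * Emit each text fragment via $emit, accumulate the full text,
 * and capture the stop reason reported by the stream.
 */
function consumeStream(iterable $events, callable $emit): array
{
    $text = '';
    $stopReason = null;
    foreach ($events as $event) {
        $type = $event['type'] ?? '';
        if ($type === 'content_block_delta') {
            $fragment = $event['delta']['text'] ?? '';
            if ($fragment === '') {
                continue; // non-text delta, nothing to display
            }
            $text .= $fragment;
            $emit(['delta' => $fragment]);
        } elseif ($type === 'message_delta') {
            $stopReason = $event['delta']['stop_reason'] ?? $stopReason;
        }
    }
    $emit(['done' => true, 'stop_reason' => $stopReason]);
    return [
        'text'        => $text,
        'stop_reason' => $stopReason,
        'truncated'   => $stopReason === 'max_tokens',
    ];
}

// Usage with hand-written events in place of a live stream.
$events = [
    ['type' => 'content_block_delta', 'delta' => ['type' => 'text_delta', 'text' => 'Once upon']],
    ['type' => 'content_block_delta', 'delta' => ['type' => 'text_delta', 'text' => ' a time']],
    ['type' => 'message_delta', 'delta' => ['stop_reason' => 'max_tokens']],
];
$sent = [];
$result = consumeStream($events, function (array $payload) use (&$sent): void {
    $sent[] = $payload;
    // In an endpoint: echo 'data: ' . json_encode($payload) . "\n\n"; flushSse();
});
```

Sending the stop reason in the final event lets the client show a "response truncated" notice without a second request.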

Code Examples

✗ Vulnerable

<?php
// ❌ Buffering full response before sending — user waits 10+ seconds
Route::post('/chat', function (Request $request) {
    $response = Anthropic::messages()->create([
        'model'      => 'claude-sonnet-4-20250514',
        'max_tokens' => 1000,
        'messages'   => [['role' => 'user', 'content' => $request->message]],
    ]);
    return response()->json(['text' => $response->content[0]->text]);
    // User sees nothing for 8 seconds then gets the full text at once
});

✓ Fixed

<?php
// ✅ Streaming SSE response from PHP
Route::post('/chat', function (Request $request) {
    return response()->stream(function () use ($request) {
        $stream = Anthropic::messages()->stream([
            'model'      => 'claude-sonnet-4-20250514',
            'max_tokens' => 1000,
            'messages'   => [['role' => 'user', 'content' => $request->message]],
        ]);

        foreach ($stream as $event) {
            if ($event->type === 'content_block_delta') {
                $delta = $event->delta->text ?? '';
                echo 'data: ' . json_encode(['delta' => $delta]) . "\n\n";
                if (ob_get_level() > 0) {
                    ob_flush(); // Flush PHP's output buffer only when one is active
                }
                flush(); // Send chunk immediately
            }
        }

        echo 'data: ' . json_encode(['done' => true]) . "\n\n";
        if (ob_get_level() > 0) {
            ob_flush();
        }
        flush();
    }, 200, [
        // Streaming headers are set once, on the response itself
        'Content-Type'      => 'text/event-stream',
        'Cache-Control'     => 'no-cache',
        'X-Accel-Buffering' => 'no', // Disable Nginx buffering
    ]);
});

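// Client: EventSource only issues GET requests, so for this POST route read the
// stream with fetch() and response.body.getReader(), then parse each "data:" line.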
