LLM Streaming Responses
Also Known As
LLM streaming
streaming tokens
SSE AI
streaming PHP LLM
TL;DR
Receiving LLM output token-by-token as it is generated rather than waiting for the full response — dramatically improving perceived latency for users and enabling real-time displays of AI-generated content.
Explanation
Without streaming, a user waits 5–15 seconds staring at a spinner before any text appears. With streaming, text appears within ~200ms and continues flowing.

On the server side, LLM APIs stream output as server-sent events or via chunked transfer encoding, and the PHP SDK yields delta objects containing partial text. The PHP backend must forward these to the browser, either in real time via SSE (text/event-stream) or by buffering and returning the full response. For real-time display, SSE from PHP to the browser is the standard pattern.

Each delta contains a small text fragment; stream each one to the client as it arrives, and accumulate them server-side if you need the complete response for logging. Stop reasons (end_turn, max_tokens, tool_use) indicate why generation stopped.
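The accumulate-while-streaming idea can be sketched in isolation. This is a minimal sketch using simplified array-shaped events as stand-ins for the SDK's delta objects (the event type names mirror the API's content_block_delta / message_delta events):

```php
<?php
// Accumulate streamed text deltas into the full response for logging.
// The array events below are simplified stand-ins for SDK delta objects.
function accumulate_deltas(iterable $events): array
{
    $full = '';
    $stopReason = null;
    foreach ($events as $event) {
        if (($event['type'] ?? '') === 'content_block_delta') {
            $full .= $event['delta']['text'] ?? ''; // Each event holds only a fragment
        } elseif (($event['type'] ?? '') === 'message_delta') {
            $stopReason = $event['delta']['stop_reason'] ?? null; // Why generation stopped
        }
    }
    return ['text' => $full, 'stop_reason' => $stopReason];
}

$events = [
    ['type' => 'content_block_delta', 'delta' => ['text' => 'Hel']],
    ['type' => 'content_block_delta', 'delta' => ['text' => 'lo!']],
    ['type' => 'message_delta', 'delta' => ['stop_reason' => 'end_turn']],
];
$result = accumulate_deltas($events);
```

Only the accumulated text should be logged; logging individual events leaves you with fragments.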
Common Misconception
✗ Streaming requires WebSockets. SSE (Server-Sent Events) over standard HTTP is sufficient and simpler — no WebSocket server needed, works through standard proxies, and the browser EventSource API handles reconnection automatically.
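To make the point concrete, here is a minimal sketch of an SSE endpoint in plain PHP: an ordinary HTTP response with a text/event-stream content type and "data: ..." frames, no WebSocket server anywhere. The sample chunks are illustrative.

```php
<?php
// A complete SSE endpoint: plain HTTP, no WebSocket server required.
header('Content-Type: text/event-stream'); // SSE is just a long-lived HTTP response
header('Cache-Control: no-cache');
header('X-Accel-Buffering: no');           // Stop Nginx from buffering the stream

$frames = [];
foreach (['Hello', ' from', ' SSE'] as $chunk) {
    $frame = 'data: ' . json_encode(['delta' => $chunk]) . "\n\n";
    $frames[] = $frame;
    echo $frame;            // Each "data: ...\n\n" block is one SSE event
    if (ob_get_level() > 0) {
        ob_flush();         // Flush PHP's output buffer only if one is active
    }
    flush();                // Hand the bytes to the web server immediately
}
```

In the browser, new EventSource(url) receives each frame as a message event and reconnects automatically if the connection drops.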
Why It Matters
User perception of AI quality is heavily influenced by response speed. A streaming response that starts in 200ms feels responsive even if total generation takes 10 seconds. Non-streaming responses feel broken or slow regardless of actual quality. For any customer-facing AI feature, streaming is not optional — it is expected.
Common Mistakes
- Forgetting ob_flush() — output buffering in PHP (and Nginx) swallows chunks; call ob_flush() followed by flush() after each SSE event, and guard ob_flush() with ob_get_level() > 0 because it raises a notice when no buffer is active.
- Not setting X-Accel-Buffering: no — Nginx buffers proxy responses by default; this header disables it for the streaming endpoint.
- Not handling the stream's stop reason — always check for tool_use or max_tokens stop reasons to know whether the response was truncated.
- Logging the full response without accumulating from deltas — each SSE event contains only a fragment; accumulate all deltas server-side if you need to log the complete response.
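The first two mistakes above can be handled in one small helper. A sketch (the function name send_sse_event is my own, not from any library):

```php
<?php
// Emit one SSE frame and flush it past PHP's output buffering.
// ob_flush() raises a notice when no buffer is active, so guard it.
function send_sse_event(array $payload): string
{
    $frame = 'data: ' . json_encode($payload) . "\n\n";
    echo $frame;
    if (ob_get_level() > 0) {
        ob_flush(); // Flush PHP's user-level output buffer
    }
    flush();        // Ask the SAPI/web server to send the bytes now
    return $frame;  // Returned so callers can log or test the exact frame
}
```

Pair this with the X-Accel-Buffering: no response header; flushing inside PHP cannot defeat buffering done by an Nginx proxy sitting in front of it.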
Code Examples
✗ Vulnerable
<?php
// ❌ Buffering the full response before sending — user waits 10+ seconds
Route::post('/chat', function (Request $request) {
    $response = Anthropic::messages()->create([
        'model' => 'claude-sonnet-4-20250514',
        'max_tokens' => 1000,
        'messages' => [['role' => 'user', 'content' => $request->message]],
    ]);
    // User sees nothing for 8+ seconds, then gets the full text at once
    return response()->json(['text' => $response->content[0]->text]);
});
✓ Fixed
<?php
// ✅ Streaming SSE response from PHP
Route::post('/chat', function (Request $request) {
    return response()->stream(function () use ($request) {
        $stream = Anthropic::messages()->stream([
            'model' => 'claude-sonnet-4-20250514',
            'max_tokens' => 1000,
            'messages' => [['role' => 'user', 'content' => $request->message]],
        ]);
        $full = '';         // Accumulate deltas for server-side logging
        $stopReason = null; // end_turn, max_tokens, or tool_use
        foreach ($stream as $event) {
            if ($event->type === 'content_block_delta') {
                $delta = $event->delta->text ?? '';
                $full .= $delta;
                echo 'data: ' . json_encode(['delta' => $delta]) . "\n\n";
                if (ob_get_level() > 0) {
                    ob_flush(); // ob_flush() raises a notice without an active buffer
                }
                flush(); // Send chunk immediately
            } elseif ($event->type === 'message_delta') {
                $stopReason = $event->delta->stop_reason ?? null;
            }
        }
        echo 'data: ' . json_encode(['done' => true, 'stop_reason' => $stopReason]) . "\n\n";
        if (ob_get_level() > 0) {
            ob_flush();
        }
        flush();
        Log::info('chat.complete', ['text' => $full, 'stop_reason' => $stopReason]);
    }, 200, [
        'Content-Type' => 'text/event-stream', // Set headers here, not via header()
        'Cache-Control' => 'no-cache',
        'X-Accel-Buffering' => 'no',           // Disable Nginx proxy buffering
    ]);
});
// Client: EventSource only supports GET, so for this POST route use
// fetch('/chat', { method: 'POST', ... }) and read response.body incrementally
Tools & Severity
⚙ Fix effort: Medium
⚡ Quick Fix
Set header('Content-Type: text/event-stream') and header('X-Accel-Buffering: no') in PHP, then echo 'data: ' . json_encode(['delta' => $chunk]) . "\n\n" for each delta, calling ob_flush() (when a buffer is active) + flush() after each.
📦 Applies To
PHP 8.0+
web
laravel
symfony