Structured Output from LLMs (JSON Mode)
Also Known As
TL;DR
Explanation
Raw LLM text output is unreliable for programmatic consumption: the model may add prose around the JSON, use single quotes, or include trailing commas. Modern LLM APIs address this in different ways. Anthropic Claude supports tool use, where the model fills in tool parameters described by a JSON schema. OpenAI has response_format: {type: 'json_object'} (JSON mode) and structured outputs with JSON schema enforcement. For PHP applications, the most reliable approach is either tool/function calling (the model fills in typed parameters) or a JSON schema example in the system prompt, combined with json_decode() using JSON_THROW_ON_ERROR and a retry on failure. Always validate the returned structure against your expected schema: even JSON mode does not guarantee that your specific fields are present.
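The parse-and-retry step described above can be sketched as follows. This is a minimal illustration, not a definitive implementation: decodeWithRetry() and the $askModel callable are hypothetical names standing in for your own API client, and the stubbed replies are made up so that the example runs without network access.

```php
<?php
declare(strict_types=1);

/**
 * Decode a model reply as JSON, asking again when decoding fails.
 *
 * $askModel is a hypothetical callable standing in for your API client:
 * it receives the 1-based attempt number and returns the raw reply text.
 */
function decodeWithRetry(callable $askModel, int $maxAttempts = 2): array
{
    $lastError = null;
    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        $raw = $askModel($attempt);
        try {
            // JSON_THROW_ON_ERROR turns a silent null into a JsonException.
            $data = json_decode($raw, true, 512, JSON_THROW_ON_ERROR);
            if (is_array($data)) {
                return $data;
            }
            // Valid JSON, but a bare scalar such as "ok" or 42.
            $lastError = new UnexpectedValueException('Reply was JSON but not an object or array.');
        } catch (JsonException $e) {
            $lastError = $e;
        }
    }
    throw new RuntimeException("No usable JSON after {$maxAttempts} attempts.", 0, $lastError);
}

// Stubbed model: the first reply wraps the JSON in prose, the second is clean.
$replies = [
    'Here is the JSON: {"name": "Ada"}',
    '{"name": "Ada", "email": "ada@example.com"}',
];
$data = decodeWithRetry(fn (int $attempt): string => $replies[$attempt - 1]);
```

In a real caller, the attempt number is a convenient hook for switching to a stricter prompt on the second try.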
Common Misconception
Why It Matters
Common Mistakes
- Not validating the returned structure — json_decode() succeeding does not mean your expected keys are present; always check isset() or use a DTO.
- Using JSON mode for creative or open-ended tasks — JSON mode constrains the model significantly; use it only for extraction and classification tasks.
- Not implementing retry logic — LLMs occasionally produce invalid JSON even in JSON mode; catch JsonException and retry once with a stronger prompt.
- Nesting too deep — deeply nested JSON schemas increase hallucination risk; flatten your schema to 1–2 levels when possible.
Code Examples
<?php
// ❌ Asking for JSON in prose: fragile, often fails
$response = $client->messages->create([
    'model' => 'claude-sonnet-4-20250514',
    'max_tokens' => 1024, // required by the Messages API
    'messages' => [[
        'role' => 'user',
        'content' => 'Extract name and email from this text: ' . $text
            . ' Return as JSON.',
    ]],
]);
// The model might return: Here's the JSON: {"name": ...}
// The surrounding prose makes json_decode() fail.
$data = json_decode($response->content[0]->text, true);
// $data is null whenever the response is not pure JSON
<?php
// ✅ Tool use: the model fills in parameters described by a JSON schema
$response = $client->messages->create([
    'model' => 'claude-sonnet-4-20250514',
    'max_tokens' => 1024, // required by the Messages API
    'tools' => [[
        'name' => 'extract_contact',
        'description' => 'Extract contact information from text',
        'input_schema' => [
            'type' => 'object',
            'properties' => [
                'name' => ['type' => 'string'],
                'email' => ['type' => 'string', 'format' => 'email'],
            ],
            'required' => ['name', 'email'],
        ],
    ]],
    // Force the model to call this specific tool
    'tool_choice' => ['type' => 'tool', 'name' => 'extract_contact'],
    'messages' => [['role' => 'user', 'content' => $text]],
]);
// collect() is Laravel's collection helper
$toolUse = collect($response->content)->firstWhere('type', 'tool_use');
$data = $toolUse->input; // Already decoded, no json_decode() needed; still check required keys
// ✅ System prompt approach as fallback
$system = 'Respond ONLY with valid JSON matching this schema: '
    . '{"name": string, "email": string}. No prose, no markdown.';
// Then validate after: if (!isset($data['name'], $data['email'])) retry();
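The tool-use example above relies on collect(), which is Laravel's collection helper. Outside Laravel, the same lookup can be done with a plain loop. The sketch below assumes the response content has been decoded into associative arrays shaped like the Messages API JSON (blocks with type, name, and input keys); firstToolInput() is a hypothetical helper name, and the sample data is made up.

```php
<?php
declare(strict_types=1);

/**
 * Return the input of the first tool_use block for $toolName, or null if absent.
 * Assumes $content is the decoded "content" array of a Messages API response.
 */
function firstToolInput(array $content, string $toolName): ?array
{
    foreach ($content as $block) {
        if (($block['type'] ?? null) === 'tool_use'
            && ($block['name'] ?? null) === $toolName
            && is_array($block['input'] ?? null)) {
            return $block['input'];
        }
    }
    return null;
}

// Sample content shaped like a tool-use reply (all values are illustrative).
$content = [
    ['type' => 'text', 'text' => 'Extracting the contact.'],
    [
        'type' => 'tool_use',
        'id' => 'toolu_example',
        'name' => 'extract_contact',
        'input' => ['name' => 'Ada', 'email' => 'ada@example.com'],
    ],
];
$input = firstToolInput($content, 'extract_contact');
```

A null result means the model did not call the tool, which is a case to handle explicitly rather than dereference.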