Temperature & Sampling in LLMs
Also Known As
TL;DR
Explanation
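LLMs generate text by computing a probability distribution over possible next tokens, then sampling from that distribution. Temperature is a scaling factor applied to the model's raw scores (logits) before they are turned into probabilities: each logit is divided by the temperature. Temperature 1.0 leaves the distribution unchanged, values below 1.0 sharpen it (the highest-probability tokens become even more likely), and values above 1.0 flatten it (low-probability tokens become more likely). Top-K limits sampling to the K most probable tokens. Top-P sampling (nucleus sampling) samples from the smallest set of tokens whose cumulative probability reaches P, so the number of candidates adapts to how confident the model is instead of being fixed as in Top-K. Both are truncation steps that can be combined with temperature; they do not replace it.

In practice: temperature 0 (greedy decoding, which always picks the most likely token) for tasks that need consistency such as code generation and structured extraction, temperature 0.3–0.7 for conversational responses, and temperature 0.7–1.0 for creative writing. Some hosted APIs do not guarantee identical outputs across runs even at temperature 0.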
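The mechanics described above can be sketched in a few lines of PHP. This is a toy illustration over a three-token vocabulary, not how any particular provider implements sampling; the function names (`softmaxWithTemperature`, `topK`, `topP`, `renormalise`, `sample`) and the example logits are made up for this sketch.

```php
<?php
// Toy sketch: real models apply these steps to logits over the full vocabulary.

/** Divide logits by the temperature, then apply softmax. */
function softmaxWithTemperature(array $logits, float $temperature): array
{
    if ($temperature <= 0.0) {
        // Treat temperature 0 as greedy decoding: all probability on the top token.
        $probs = array_fill_keys(array_keys($logits), 0.0);
        $probs[array_search(max($logits), $logits, true)] = 1.0;
        return $probs;
    }
    $max = max($logits); // subtracting the max keeps exp() numerically stable
    $exp = [];
    foreach ($logits as $token => $logit) {
        $exp[$token] = exp(($logit - $max) / $temperature);
    }
    return renormalise($exp);
}

/** Scale values so they sum to 1 (keys are preserved). */
function renormalise(array $values): array
{
    $sum = array_sum($values);
    return array_map(fn($v) => $v / $sum, $values);
}

/** Top-K: keep only the K most probable tokens. */
function topK(array $probs, int $k): array
{
    arsort($probs); // highest probability first, keys preserved
    return renormalise(array_slice($probs, 0, $k, true));
}

/** Top-P: keep the smallest set of tokens whose cumulative probability reaches P. */
function topP(array $probs, float $p): array
{
    arsort($probs);
    $kept = [];
    $cumulative = 0.0;
    foreach ($probs as $token => $prob) {
        $kept[$token] = $prob;
        $cumulative += $prob;
        if ($cumulative >= $p) {
            break;
        }
    }
    return renormalise($kept);
}

/** Draw one token according to its probability. */
function sample(array $probs): string
{
    $r = mt_rand() / mt_getrandmax();
    $cumulative = 0.0;
    foreach ($probs as $token => $prob) {
        $cumulative += $prob;
        if ($r <= $cumulative) {
            return (string) $token;
        }
    }
    return (string) array_key_last($probs); // guard against rounding error
}

// Hypothetical logits for three candidate next tokens.
$logits = ['the' => 2.0, 'a' => 1.0, 'zebra' => -1.0];
$sharp  = softmaxWithTemperature($logits, 0.5); // 'the' dominates more strongly
$flat   = softmaxWithTemperature($logits, 2.0); // 'zebra' gets a larger share
echo sample(topP($flat, 0.9)), PHP_EOL;
```

Comparing `$sharp` and `$flat` shows the sharpening and flattening effect directly, and passing either through `topK` or `topP` shows how truncation removes the unlikely tail before a token is drawn.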
Common Misconception
Why It Matters
Common Mistakes
- Using the same temperature for all tasks — set it per use-case based on whether the task needs consistency or creativity.
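- Setting temperature very high to get 'more creative' outputs and then being surprised by factual errors: a flatter distribution makes unlikely tokens, including wrong ones, more likely to be sampled.
- Treating temperature and top-p as independent knobs. They interact: both widen the pool of tokens that can be sampled, so setting both high at the same time produces very random outputs. Adjust one at a time.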
- Not logging temperature settings alongside LLM outputs — makes debugging inconsistent outputs much harder.
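As a rough sketch of the last point, the sampling settings can be written to a log next to each output. The helper names `buildLlmLogRecord` and `appendLlmLog`, the log path, and the placeholder output string are all hypothetical; the parameter keys mirror the request arrays used in this entry's examples, with `top_p` and `top_k` assumed to be set only when a request uses them.

```php
<?php
/**
 * Build a log record that keeps the sampling settings next to the output.
 * $params is the same array that was sent to the LLM client.
 */
function buildLlmLogRecord(array $params, string $output): array
{
    return [
        'timestamp'     => date('c'),
        'model'         => $params['model'] ?? null,
        'temperature'   => $params['temperature'] ?? null, // null: provider default was used
        'top_p'         => $params['top_p'] ?? null,
        'top_k'         => $params['top_k'] ?? null,
        'max_tokens'    => $params['max_tokens'] ?? null,
        // Hash the prompt so identical requests can be grouped without storing the text twice.
        'prompt_sha256' => hash('sha256', json_encode($params['messages'] ?? [])),
        'output'        => $output,
    ];
}

/** Append one record as a JSON line. */
function appendLlmLog(string $path, array $record): void
{
    file_put_contents($path, json_encode($record) . PHP_EOL, FILE_APPEND | LOCK_EX);
}

// Usage with placeholder values.
$params = [
    'model'       => 'claude-sonnet-4-20250514',
    'max_tokens'  => 500,
    'temperature' => 0.0,
    'messages'    => [['role' => 'user', 'content' => 'Extract the order ID ...']],
];
$record = buildLlmLogRecord($params, '(model output goes here)');
appendLlmLog(sys_get_temp_dir() . '/llm_calls.log', $record);
```

With one JSON line per call, inconsistent outputs can be grouped by `prompt_sha256` and compared against the temperature that produced them.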
Code Examples
// ❌ High temperature for deterministic tasks (code gen, data extraction)
$response = $client->messages->create([
    'model' => 'claude-sonnet-4-20250514',
    'max_tokens' => 500,
    'temperature' => 1.0, // maximum randomness
    'messages' => [[
        'role' => 'user',
        'content' => 'Extract the order ID and total from this invoice: ...',
        // Can produce a different (sometimes malformed) JSON structure from call to call
    ]],
]);
// ✅ Low temperature for deterministic extraction tasks
$response = $client->messages->create([
    'model' => 'claude-sonnet-4-20250514',
    'max_tokens' => 500,
    'temperature' => 0.0, // most consistent setting; output may still vary slightly between runs
    'messages' => [[
        'role' => 'user',
        'content' => 'Extract the order ID and total from this invoice '
            . 'and return JSON only: {"order_id": ..., "total": ...}'
            . "\n\n" // double quotes: single-quoted '\n' would be sent as a literal backslash-n
            . $invoiceText,
    ]],
]);
// Rule of thumb:
// 0.0–0.2 — extraction, classification, structured output, code generation
// 0.3–0.6 — Q&A, summarisation, analysis
// 0.7–1.0 — creative writing, brainstorming, variation generation
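One way to apply the rule of thumb consistently is a small per-task lookup, sketched below. The task names, the constant `TEMPERATURE_PRESETS`, the function `temperatureFor`, and the specific values are illustrative picks from within the ranges above, not recommendations from any provider.

```php
<?php
// Hypothetical presets: one value per task type, chosen from the ranges above.
const TEMPERATURE_PRESETS = [
    'extraction'       => 0.0,
    'classification'   => 0.0,
    'code_generation'  => 0.2,
    'qa'               => 0.3,
    'summarisation'    => 0.4,
    'analysis'         => 0.5,
    'creative_writing' => 0.9,
    'brainstorming'    => 1.0,
];

/** Look up the preset for a task; fail loudly instead of silently using a default. */
function temperatureFor(string $task): float
{
    if (!array_key_exists($task, TEMPERATURE_PRESETS)) {
        throw new InvalidArgumentException("No temperature preset for task '{$task}'");
    }
    return TEMPERATURE_PRESETS[$task];
}

// Usage: build request parameters with the preset for the task at hand.
$params = [
    'max_tokens'  => 500,
    'temperature' => temperatureFor('extraction'),
];
```

Throwing on an unknown task is a deliberate choice in this sketch: it forces each new use-case to pick a temperature explicitly instead of inheriting one by accident.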