
LLM Temperature & Sampling Strategies

Category: AI/ML · Level: Intermediate

Also Known As

temperature · top-p sampling · nucleus sampling · top-k sampling · LLM sampling

TL;DR

Parameters that control the randomness and diversity of LLM output — temperature scales token probabilities, while top-p and top-k limit the candidate pool before sampling.

Explanation

When an LLM generates the next token, it produces a probability distribution over its entire vocabulary. Sampling strategies determine how a token is chosen from that distribution. Temperature (T) divides the logits before the softmax, so the probability of token i is proportional to exp(z_i / T):

  • T < 1 sharpens the distribution: the most probable tokens become even more dominant, and output is more predictable and repetitive.
  • T > 1 flattens it: lower-probability tokens become more likely, and output is more creative but less coherent.
  • T = 0 is greedy decoding: the highest-probability token is always picked, making token selection deterministic.

Top-p (nucleus sampling) discards all tokens outside the smallest set whose cumulative probability exceeds p. For example, top-p 0.9 considers only the tokens that together account for 90% of the probability mass, so the candidate pool grows and shrinks with the model's confidence. Top-k instead hard-limits the candidate pool to the k most probable tokens, regardless of their probabilities.

In practice: use T ≈ 0 with top-p 1.0 for factual extraction, code generation, or structured output where correctness matters; use T 0.7–1.0 for creative writing or brainstorming; and avoid T > 1.0 in production, where it frequently produces incoherent output. Temperature and top-p interact: setting both low is doubly restrictive and often unnecessary.
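
To make the mechanics concrete, here is a minimal PHP sketch of temperature scaling and top-p filtering over a toy four-token vocabulary. The helper names, the toy logits, and the plain-array representation are illustrative assumptions, not a real inference API; production engines run these steps server-side over vocabularies of tens of thousands of tokens.

// Toy illustration — helper names and logits are assumptions, not a real API.

/** Temperature-scaled softmax: P(i) is proportional to exp(z_i / T). */
function softmaxWithTemperature(array $logits, float $temperature): array
{
    if ($temperature <= 0.0) {
        // T = 0 is treated as greedy decoding: all mass on the best token.
        $probs = array_fill_keys(array_keys($logits), 0.0);
        $probs[array_search(max($logits), $logits)] = 1.0;
        return $probs;
    }
    $scaled = array_map(fn (float $z): float => $z / $temperature, $logits);
    $max    = max($scaled); // subtract the max for numerical stability
    $exps   = array_map(fn (float $z): float => exp($z - $max), $scaled);
    $sum    = array_sum($exps);
    return array_map(fn (float $e): float => $e / $sum, $exps);
}

/** Nucleus (top-p) filter: keep the smallest set covering probability p. */
function topPFilter(array $probs, float $p): array
{
    arsort($probs); // highest probability first, string keys preserved
    $kept       = [];
    $cumulative = 0.0;
    foreach ($probs as $token => $prob) {
        $kept[$token] = $prob;
        $cumulative  += $prob;
        if ($cumulative >= $p) {
            break; // smallest set whose cumulative probability reaches p
        }
    }
    $mass = array_sum($kept); // renormalise the surviving candidates
    return array_map(fn (float $q): float => $q / $mass, $kept);
}

$logits = ['the' => 4.0, 'a' => 3.2, 'cat' => 1.5, 'zebra' => -1.0];

$sharp = softmaxWithTemperature($logits, 0.5); // peaky: 'the' dominates
$flat  = softmaxWithTemperature($logits, 1.5); // flatter: 'zebra' gains mass
$pool  = topPFilter(softmaxWithTemperature($logits, 1.0), 0.9);
// A top-k filter would instead keep array_slice($probs, 0, $k, true) after arsort().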

Diagram

flowchart LR
    LOGITS[Raw Logits from Model] --> TEMP[Apply Temperature T]
    TEMP -->|T=0 greedy| GREEDY[Pick max token<br/>deterministic]
    TEMP -->|T=0.7 balanced| SOFT[Softmax distribution]
    SOFT --> TOPP[Top-p filter<br/>keep 90% probability mass]
    TOPP --> SAMPLE[Sample token]
    SAMPLE --> OUTPUT[Generated token]
    subgraph Settings
        LOW[T close to 0<br/>factual structured]
        MED[T 0.7-0.9<br/>creative coherent]
        HIGH[T above 1.0<br/>avoid in production]
    end
style GREEDY fill:#0d419d,color:#fff
style HIGH fill:#f85149,color:#fff
style LOW fill:#238636,color:#fff

Common Misconception

The misconception: higher temperature always produces better creative output. In reality, above roughly T = 1.2 models typically produce incoherent or repetitive text; creative diversity is better achieved with a moderate temperature (0.7–1.0) combined with good prompting.

Why It Matters

Wrong sampling settings silently degrade output quality: T=0 on a creative task produces dull, repetitive text; high T on a code generation task produces plausible-looking but broken code.

Common Mistakes

  • Using the API default temperature for every use-case — defaults are a compromise; structured output tasks need T≈0, creative tasks need T≈0.8.
  • Setting top-p and top-k simultaneously without understanding they compound — use one or the other, not both.
  • Using T > 1.0 in production believing it adds creativity — it primarily adds incoherence.
  • Not testing at the target temperature — a prompt that works at T=0 may fail badly at T=0.9 (a quick check is sketched after this list).
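
One pragmatic version of that last check, written against the hypothetical $llm client used in the code examples below: run the production prompt several times at the production temperature and assert that every response still satisfies your output contract. Parsing as JSON here is a stand-in assumption for whatever your real validation is.

// Smoke-test a prompt at the exact temperature you will ship.
$failures = 0;
for ($i = 0; $i < 10; $i++) {
    $response = $llm->complete(
        prompt: $productionPrompt, // hypothetical: the prompt under test
        temperature: 0.9,          // the value production will actually use
        top_p: 0.95
    );
    if (json_decode($response, true) === null) { // stand-in output contract
        $failures++;
    }
}
echo "{$failures}/10 samples violated the output contract\n";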

Avoid When

  • Setting temperature above 1.0 in any production context — it degrades coherence without meaningfully improving creativity.
  • Setting both top-p and top-k simultaneously — they compound in ways that are hard to reason about; pick one.

When To Use

  • Use temperature 0 (or close to it) for structured data extraction, code generation, classification, and any task where output format consistency matters.
  • Use temperature 0.7–0.9 with top-p 0.9–0.95 for creative writing, brainstorming, and summarisation tasks.
  • Always test your prompt at the exact temperature you will use in production — behaviour changes significantly.
  • Document the temperature setting alongside your prompt in version control so future changes are deliberate (one possible layout is sketched below).
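
One possible layout, as a sketch: keep the prompt and its sampling parameters in a single versioned config file so a diff to one is always reviewed next to the other. The file name and array shape are assumptions, not a standard.

// config/prompts.php — hypothetical layout, committed to version control.
return [
    'invoice_extraction' => [
        'prompt'      => 'Extract invoice data as JSON: {"total": number, "currency": string}',
        'temperature' => 0.0,  // deterministic, parseable output
        'top_p'       => 1.0,
    ],
    'product_description' => [
        'prompt'      => 'Write a product description for: %s',
        'temperature' => 0.8,  // diverse but coherent
        'top_p'       => 0.95, // deliberately no top_k alongside top-p
    ],
];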

Code Examples

💡 Note
Temperature 0 makes token selection greedy and output effectively deterministic for structured/factual tasks; temperature 0.8 with top-p 0.95 balances diversity with coherence for creative tasks.
✗ Vulnerable
// Using default temperature for structured data extraction
$response = $llm->complete(
    prompt: 'Extract the invoice total from this text: ' . $invoiceText
    // No temperature set — API default is often 0.7-1.0
    // Risk: model may output varied, non-parseable formats
);
✓ Fixed
// Match temperature to the task

// Structured extraction — deterministic, parseable
$structured = $llm->complete(
    prompt: 'Extract invoice data as JSON: {"total": number, "currency": string}' . "\n\n" . $invoiceText,
    temperature: 0.0,
    top_p: 1.0
);

// Creative generation — diverse but coherent
$creative = $llm->complete(
    prompt: 'Write a product description for: ' . $productName,
    temperature: 0.8,
    top_p: 0.95
    // Do NOT also set top_k — compounding restrictions hurt quality
);

Quick Fix

Set temperature to 0 for any task requiring structured or factual output; use 0.7–0.9 for creative tasks; never ship T > 1.0 to production.
Detection Hints

Look for LLM API calls that set no explicit temperature on structured-output tasks, or a temperature above 1.0 in any context. These patterns are not reliably auto-detectable, so review call sites manually.