LLM Context Window

ai_ml Intermediate

Also Known As

context length, context window, token limit, token budget

TL;DR

The maximum amount of text an LLM can process in one call — everything the model sees at once, including system prompt, conversation history, and retrieved context.

Explanation

A context window is measured in tokens (roughly 3-4 characters each in English). Claude 3.5: 200K tokens (~150K words). GPT-4 Turbo and GPT-4o: 128K (the original GPT-4 shipped with 8K and 32K variants). Smaller models: 4K-32K. Every token in the prompt is processed on every call, so filling a larger window costs more and is slower. Strategies for large contexts: chunking (split documents, process separately), RAG (retrieve only relevant chunks), summarisation (compress conversation history), and sliding windows (keep recent messages, summarise older ones). Performance also degrades for content in the middle of very long contexts (the "lost in the middle" problem).
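
The characters-per-token rule of thumb above can be turned into a quick pre-flight check. This is a minimal sketch, assuming about 4 characters per token; `estimateTokens()` and `fitsInWindow()` are hypothetical helper names, and a real tokenizer or the provider's own token counter will give more accurate numbers.

```php
<?php
declare(strict_types=1);

// Rough heuristic only (assumption): about 4 characters per token.
// strlen() counts bytes, which matches characters for ASCII text.
function estimateTokens(string $text, float $charsPerToken = 4.0): int
{
    return (int) ceil(strlen($text) / $charsPerToken);
}

// True when the prompt plus the space reserved for the reply fits the window.
function fitsInWindow(string $prompt, int $windowTokens, int $reservedForOutput = 1024): bool
{
    return estimateTokens($prompt) + $reservedForOutput <= $windowTokens;
}

$prompt = str_repeat('a', 2000);
echo estimateTokens($prompt), "\n";        // 500
var_dump(fitsInWindow($prompt, 200_000));  // bool(true)
var_dump(fitsInWindow($prompt, 1_000));    // bool(false)
```

Reserving room for the reply matters because generated tokens typically share the same window as the prompt.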

Common Misconception

✗ A larger context window always means better performance — very long contexts cause the lost-in-the-middle problem where the model pays less attention to content in the middle; RAG with focused retrieval often outperforms stuffing everything into context.

Why It Matters

Sending the entire codebase in every API call is expensive and slow — understanding context window limits enables designing efficient RAG pipelines that retrieve only relevant code sections.

Common Mistakes

Sending entire documents when only a few paragraphs are relevant: retrieve the relevant passages with RAG instead.
Not tracking token usage: a large context resent on every call produces unexpected costs.
Truncating from the start of the context, which drops the system prompt: trim the oldest conversation turns in between instead, because the system prompt and the most recent messages matter most.
Ignoring the lost-in-the-middle problem: critical instructions buried in the middle of a long context may be ignored.
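
The truncation mistake above can be avoided by holding the system prompt separately and dropping only the oldest turns. The following is a sketch under stated assumptions: `roughTokens()` and `trimHistory()` are hypothetical helpers, token counts use the ~4 characters per token heuristic, and messages are plain arrays with `role` and `content` keys.

```php
<?php
declare(strict_types=1);

// Assumption: ~4 characters per token; swap in a real tokenizer if available.
function roughTokens(string $text): int
{
    return (int) ceil(strlen($text) / 4);
}

/**
 * Keeps the newest messages that fit in $budget tokens once the
 * system prompt has been accounted for. $messages is ordered oldest first.
 */
function trimHistory(string $systemPrompt, array $messages, int $budget): array
{
    $remaining = $budget - roughTokens($systemPrompt);
    $kept = [];
    // Walk from newest to oldest and stop at the first message that does not fit.
    for ($i = count($messages) - 1; $i >= 0; $i--) {
        $cost = roughTokens($messages[$i]['content']);
        if ($cost > $remaining) {
            break;
        }
        $remaining -= $cost;
        array_unshift($kept, $messages[$i]);
    }
    return $kept;
}

$history = [
    ['role' => 'user', 'content' => str_repeat('a', 400)],      // ~100 tokens
    ['role' => 'assistant', 'content' => str_repeat('b', 200)], // ~50 tokens
    ['role' => 'user', 'content' => str_repeat('c', 100)],      // ~25 tokens
];
$kept = trimHistory('You are a code reviewer.', $history, 100);
echo count($kept), "\n"; // 2 (the oldest message is dropped)
```

The system prompt is never a candidate for removal here. A production version would also need to make sure the first kept message has the role the API expects.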

Code Examples

✗ Vulnerable

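// Sending the entire codebase in every call is expensive and slow.
// file_get_contents() does not expand glob patterns, so walk the tree instead.
$allCode = '';
$files = new RecursiveIteratorIterator(
    new RecursiveDirectoryIterator('/var/www/app/src')
);
foreach ($files as $file) {
    if ($file->isFile() && $file->getExtension() === 'php') {
        $allCode .= file_get_contents($file->getPathname()) . "\n";
    }
}
// Assume ~500 KB of source: roughly 125K tokens at ~4 characters per token.
$response = $claude->messages->create([
    'messages' => [[
        'role' => 'user',
        'content' => $allCode . "\n\nFix the bug in UserController",
    ]],
]);
// Cost: ~125K tokens * $0.003/1K = ~$0.375 per request

✓ Fixed

// RAG: retrieve only the relevant files.
$relevantFiles = $vectorDb->search('UserController bug', limit: 5);
$context = implode("\n", array_column($relevantFiles, 'content'));

$response = $claude->messages->create([
    'messages' => [[
        'role' => 'user',
        'content' => $context . "\n\nFix the bug in UserController",
    ]],
]);
// Cost: ~5K tokens * $0.003/1K = $0.015 per request (about 25x cheaper)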
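
Retrieval like this presupposes that the source material was split into chunks before indexing. A minimal sketch of that chunking step follows, assuming fixed-size character chunks with a small overlap; `chunkText()` and its default sizes are illustrative choices, not part of any particular vector database.

```php
<?php
declare(strict_types=1);

/**
 * Splits $text into chunks of at most $chunkSize bytes, where each chunk
 * repeats the last $overlap bytes of the previous one so that content cut
 * at a boundary still appears whole in at least one chunk.
 */
function chunkText(string $text, int $chunkSize = 2000, int $overlap = 200): array
{
    if ($chunkSize <= 0 || $overlap < 0 || $overlap >= $chunkSize) {
        throw new InvalidArgumentException('Require 0 <= overlap < chunkSize.');
    }
    $chunks = [];
    $step = $chunkSize - $overlap;
    $length = strlen($text);
    for ($start = 0; $start < $length; $start += $step) {
        $chunks[] = substr($text, $start, $chunkSize);
        if ($start + $chunkSize >= $length) {
            break; // the chunk just added already reaches the end of the text
        }
    }
    return $chunks;
}

$chunks = chunkText(str_repeat('0123456789', 500)); // 5,000 characters
echo count($chunks), "\n"; // 3
```

Splitting on fixed byte offsets is the simplest option. For source code, splitting on function or class boundaries usually gives more useful retrieval units.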

References

↗ https://www.anthropic.com/research/long-context-windows

Tags

ai llm performance

Added 16 Mar 2026

Edited 22 Mar 2026

Curated in Warsaw under one editorial standard. 1,445 terms, single voice.

⚡ DEV INTEL Tools & Severity

🟡 Medium ⚙ Fix effort: Medium

⚡ Quick Fix

Keep only the last N conversation turns in the context — implement a sliding window or summarisation strategy to stay within token limits without losing important context
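
The quick fix above might look like the following sketch. It assumes messages are arrays with `role` and `content` keys, ordered oldest first; `slidingWindow()` is a hypothetical helper, and the `$summarise` callback stands in for whatever produces the summary (for example, a call to a cheaper model).

```php
<?php
declare(strict_types=1);

/**
 * Keeps the last $keepLast messages verbatim and replaces everything
 * older with a single summary message produced by $summarise.
 */
function slidingWindow(array $messages, int $keepLast, callable $summarise): array
{
    if ($keepLast < 1) {
        throw new InvalidArgumentException('keepLast must be at least 1.');
    }
    if (count($messages) <= $keepLast) {
        return $messages; // nothing to compress yet
    }
    $older = array_slice($messages, 0, count($messages) - $keepLast);
    $recent = array_slice($messages, -$keepLast);
    $summary = [
        'role' => 'user',
        'content' => 'Summary of earlier conversation: ' . $summarise($older),
    ];
    return array_merge([$summary], $recent);
}

// Stand-in summariser for demonstration; a real one would condense the text.
$summarise = fn (array $older): string => count($older) . ' earlier messages';

$messages = [];
for ($i = 1; $i <= 6; $i++) {
    $messages[] = ['role' => $i % 2 ? 'user' : 'assistant', 'content' => "turn $i"];
}
$window = slidingWindow($messages, 2, $summarise);
echo count($window), "\n";          // 3
echo $window[0]['content'], "\n";   // Summary of earlier conversation: 4 earlier messages
```

Sending the summary as a `user` message is one option among several. It could equally be appended to the system prompt, depending on how the API in use treats consecutive messages from the same role.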

📦 Applies To

any web cli

🔗 Prerequisites

Large Language Models (LLMs), Retrieval-Augmented Generation (RAG), AI API Cost Management

🔍 Detection Hints

Entire conversation history sent on every request without truncation; context window overflow errors; token costs growing unboundedly with conversation length
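
Although the entry marks this as not auto-detectable, the "growing unboundedly" symptom can at least be flagged at runtime by logging the input token count of each request. A sketch, assuming the count is read from the API response (the Anthropic Messages API reports it as `usage.input_tokens`); the `TokenUsageLog` class is hypothetical.

```php
<?php
declare(strict_types=1);

final class TokenUsageLog
{
    /** @var int[] input token counts, one per request, oldest first */
    private array $inputTokens = [];

    public function record(int $inputTokens): void
    {
        $this->inputTokens[] = $inputTokens;
    }

    // True when each of the last $n requests sent more input tokens
    // than the request before it, a sign that history is never trimmed.
    public function isGrowing(int $n = 3): bool
    {
        if ($n < 1 || count($this->inputTokens) < $n + 1) {
            return false;
        }
        $recent = array_slice($this->inputTokens, -($n + 1));
        for ($i = 1; $i < count($recent); $i++) {
            if ($recent[$i] <= $recent[$i - 1]) {
                return false;
            }
        }
        return true;
    }
}

$log = new TokenUsageLog();
foreach ([1000, 1800, 2700, 3900] as $tokens) {
    $log->record($tokens);
}
var_dump($log->isGrowing()); // bool(true)
```

A strictly increasing run is a crude signal and will miss growth that plateaus briefly. Alerting on an absolute threshold per request is a reasonable complement.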

Auto-detectable: ✗ No

⚠ Related Problems

AI API Cost Management, Large Language Models (LLMs), performance degradation

🤖 AI Agent

Confidence: Low, False Positives: High, ✗ Manual fix, Fix: Medium, Context: File