LLM Context Window
Also Known As
context length
context window
token limit
token budget
TL;DR
The maximum amount of text an LLM can process in one call — everything the model sees at once, including system prompt, conversation history, and retrieved context.
Explanation
A context window is measured in tokens (roughly 3-4 characters of English text each). Claude 3.5 Sonnet: 200K tokens (~150K words). GPT-4 Turbo and GPT-4o: 128K (the original GPT-4 shipped with 8K and 32K variants). Smaller models: 4K-32K. The full window is processed on every call, so larger contexts cost more and respond more slowly. Strategies for working within the limit:
- Chunking: split documents and process each piece separately.
- RAG: retrieve only the chunks relevant to the current query.
- Summarisation: compress conversation history into a shorter digest.
- Sliding windows: keep recent messages verbatim, summarise older ones.
Performance also degrades in the middle of very long contexts (the "lost in the middle" problem).
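The sliding-window strategy can be sketched as follows. This is an illustrative Python sketch, not a library API: `apply_sliding_window`, the `keep_last` parameter, and the summary placeholder are all assumed names, and a real implementation would summarise the older turns with another model call rather than a placeholder string.

```python
# Sliding-window sketch: keep the system prompt and the last N turns,
# and fold all older turns into a single compact summary message.

def apply_sliding_window(messages, keep_last=6, summarize=None):
    """messages: list of {'role': ..., 'content': ...}; messages[0] is the system prompt."""
    system, history = messages[0], messages[1:]
    if len(history) <= keep_last:
        return list(messages)  # already within the window
    older, recent = history[:-keep_last], history[-keep_last:]
    # Compress the older turns; a real app would call a summariser here.
    text = summarize(older) if summarize else f"[{len(older)} earlier messages summarised]"
    return [system, {'role': 'user', 'content': text}] + recent

msgs = [{'role': 'system', 'content': 'You are helpful.'}] + [
    {'role': 'user', 'content': f'message {i}'} for i in range(20)
]
trimmed = apply_sliding_window(msgs, keep_last=6)
# trimmed has 8 entries: system prompt + summary + the last 6 turns
```

The summary message is sent as a normal turn, so the model still "remembers" the gist of the dropped history without paying for it in full on every call.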
Common Misconception
✗ A larger context window always means better performance — very long contexts cause the lost-in-the-middle problem where the model pays less attention to content in the middle; RAG with focused retrieval often outperforms stuffing everything into context.
Why It Matters
Sending the entire codebase in every API call is expensive and slow — understanding context window limits enables designing efficient RAG pipelines that retrieve only relevant code sections.
Common Mistakes
- Sending entire documents when only paragraphs are relevant — use RAG instead.
- Not tracking token usage — unexpected costs from large context on every call.
- Truncating from the start instead of the middle — recent messages and system prompt matter most.
- Ignoring the lost-in-the-middle problem — critical instructions buried in the middle of a long context may be ignored.
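A minimal sketch of the last two points, tracking a rough token budget and truncating from the middle rather than the start (illustrative Python; the helper names and the ~4 chars/token estimate are assumptions, and a production app would use the provider's real tokenizer):

```python
# Estimate usage before each call and, if over budget, evict messages from
# the MIDDLE: the system prompt (start) and recent turns (end) matter most.

def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)  # rough ~4 chars/token heuristic, not a tokenizer

def fits_budget(messages, budget_tokens: int) -> bool:
    return sum(estimate_tokens(m['content']) for m in messages) <= budget_tokens

def truncate_middle(messages, budget_tokens: int):
    """Drop middle messages until the estimated total fits the budget."""
    msgs = list(messages)
    while not fits_budget(msgs, budget_tokens) and len(msgs) > 2:
        msgs.pop(len(msgs) // 2)  # never index 0 (system) or the final message
    return msgs
```

Logging the estimate on every call also catches the "costs growing unboundedly with conversation length" failure mode early.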
Code Examples
✗ Vulnerable
// Sending the entire codebase in every call: expensive and slow.
// Note: file_get_contents() does not expand glob patterns, so collect files explicitly.
$allCode = '';
foreach (glob('/var/www/app/src/*.php') as $file) { // recursive traversal omitted for brevity
    $allCode .= file_get_contents($file); // ~500KB in total
}
$response = $claude->messages->create([
    'model' => 'claude-3-5-sonnet-latest',
    'max_tokens' => 1024,
    'messages' => [[
        'role' => 'user',
        'content' => $allCode . "\nFix the bug in UserController",
    ]],
]);
// Cost: ~500KB ≈ 125K tokens at ~4 chars/token; at $0.003/1K input tokens ≈ $0.38 per request
✓ Fixed
// RAG: retrieve only the relevant files.
$relevantFiles = $vectorDb->search('UserController bug', limit: 5);
$context = implode("\n", array_column($relevantFiles, 'content'));
$response = $claude->messages->create([
    'model' => 'claude-3-5-sonnet-latest',
    'max_tokens' => 1024,
    'messages' => [[
        'role' => 'user',
        'content' => $context . "\nFix the bug in UserController",
    ]],
]);
// Cost: ~5K tokens * $0.003/1K ≈ $0.015 per request
Tags
Added
16 Mar 2026
Edited
22 Mar 2026
Related categories
⚡
DEV INTEL
Tools & Severity
🟡 Medium
⚙ Fix effort: Medium
⚡ Quick Fix
Keep only the last N conversation turns in the context — implement a sliding window or summarisation strategy to stay within token limits without losing important context
📦 Applies To
any
web
cli
🔗 Prerequisites
🔍 Detection Hints
Entire conversation history sent on every request without truncation; context window overflow errors; token costs growing unboundedly with conversation length
Auto-detectable:
✗ No
⚠ Related Problems
🤖 AI Agent
Confidence: Low
False Positives: High
✗ Manual fix
Fix: Medium
Context: File