
LLM Context Window

Category: ai_ml · Level: Intermediate

Also Known As

context length, context window, token limit, token budget

TL;DR

The maximum amount of text an LLM can process in one call — everything the model sees at once, including system prompt, conversation history, and retrieved context.

Explanation

A context window is measured in tokens (roughly 3-4 characters each). Claude 3.5 accepts 200K tokens (~150K words), GPT-4 Turbo 128K, and smaller models 4K-32K. The full window is processed on every call, so larger contexts cost more and respond more slowly. Common strategies for working within the limit:

  • Chunking: split large documents and process the pieces separately.
  • RAG: retrieve only the chunks relevant to the current query.
  • Summarisation: compress conversation history into a shorter digest.
  • Sliding windows: keep recent messages verbatim and summarise older ones.

Performance also degrades in the middle of very long contexts (the lost-in-the-middle problem), so position matters as well as length.
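A rough character-based estimate is usually enough to plan a token budget. The sketch below assumes ~4 characters per token, which varies by model and language; for exact counts use the model's own tokenizer or the API's usage report:

// Rough token estimate: assumes ~4 characters per token (English text).
// Real tokenizers differ by model, so treat this as a planning heuristic.
function estimateTokens(string $text): int {
    return (int) ceil(strlen($text) / 4);
}

// conversation.txt is a stand-in for however you store the running history.
$history = file_get_contents('conversation.txt');
$budget  = 200_000; // e.g. Claude 3.5's 200K-token window

if (estimateTokens($history) > $budget * 0.8) {
    // Leave headroom for the reply: time to summarise or truncate.
    echo "Context at 80% of budget, compress history.\n";
}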

Common Misconception

That a larger context window always means better performance. In practice, very long contexts trigger the lost-in-the-middle problem, where the model pays less attention to content placed in the middle of the prompt; RAG with focused retrieval often outperforms stuffing everything into context.

Why It Matters

Sending an entire codebase in every API call is expensive and slow. Understanding context-window limits lets you design efficient RAG pipelines that retrieve only the relevant code sections.

Common Mistakes

  • Sending entire documents when only paragraphs are relevant — use RAG instead.
  • Not tracking token usage — unexpected costs from large context on every call.
  • Truncating from the start instead of the middle — recent messages and the system prompt matter most (see the sketch after this list).
  • Ignoring the lost-in-the-middle problem — critical instructions buried in the middle of a long context may be ignored.
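
A minimal middle-truncation sketch, assuming messages are plain ['role' => ..., 'content' => ...] arrays with the system prompt handled separately (as it is in the Messages API):

// Keep the first $head and last $tail messages; drop the middle, where
// attention is weakest. Note: if your API enforces strict user/assistant
// alternation, fold the omission marker into the next user message instead.
function truncateMiddle(array $messages, int $head = 2, int $tail = 10): array {
    if (count($messages) <= $head + $tail) {
        return $messages;
    }
    return array_merge(
        array_slice($messages, 0, $head),
        [['role' => 'user', 'content' => '[earlier conversation omitted]']],
        array_slice($messages, -$tail)
    );
}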

Code Examples

✗ Vulnerable
// Sending entire codebase in every call — expensive:
// (file_get_contents() does not expand glob patterns, so walk the tree)
$allCode = '';
$files = new RecursiveIteratorIterator(
    new RecursiveDirectoryIterator('/var/www/app/src')
);
foreach ($files as $file) {
    if ($file->isFile() && $file->getExtension() === 'php') {
        $allCode .= file_get_contents($file->getPathname()); // ~500KB total
    }
}

$response = $claude->messages->create([
    'model'      => 'claude-3-5-sonnet-latest', // model and max_tokens are required
    'max_tokens' => 1024,
    'messages'   => [[
        'role'    => 'user',
        'content' => $allCode . "\n\nFix the bug in UserController",
    ]],
]);
// Cost: 500KB ≈ 125K tokens at ~4 chars/token; 125K * $0.003/1K ≈ $0.38 per request
✓ Fixed
// RAG — retrieve only relevant files:
$relevantFiles = $vectorDb->search('UserController bug', limit: 5);
$context = implode("\n", array_column($relevantFiles, 'content'));

$response = $claude->messages->create([
    'model'      => 'claude-3-5-sonnet-latest',
    'max_tokens' => 1024,
    'messages'   => [[
        'role'    => 'user',
        'content' => $context . "\n\nFix the bug in UserController",
    ]],
]);
// Cost: ~5K tokens * $0.003/1K ≈ $0.015 per request (~25x cheaper)
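
To track spend precisely rather than estimating, the Messages API returns a usage block with actual input/output token counts. The property names below assume the client mirrors the raw JSON response, so adjust for your SDK:

// Read real token counts back from the response (usage.input_tokens /
// usage.output_tokens in the raw API JSON; property names may differ by SDK).
$inputTokens  = $response->usage->input_tokens;
$outputTokens = $response->usage->output_tokens;

// Claude 3.5 Sonnet pricing: $3/M input, $15/M output.
$costUsd = $inputTokens * 0.003 / 1000 + $outputTokens * 0.015 / 1000;
error_log(sprintf('Call used %d in / %d out tokens (~$%.4f)',
    $inputTokens, $outputTokens, $costUsd));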

Added 16 Mar 2026
Edited 22 Mar 2026
DEV INTEL Tools & Severity
🟡 Severity: Medium ⚙ Fix effort: Medium
⚡ Quick Fix
Keep only the last N conversation turns in the context — implement a sliding window or summarisation strategy to stay within token limits without losing important context
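
One possible shape for that strategy, assuming the same message arrays as in the sketches above; summarise() is a placeholder for your own compression step (e.g. a cheap model call), not a real API:

// Sliding window: keep the last $n turns verbatim and compress everything
// older into a single summary message. summarise() is illustrative only.
function slidingWindow(array $messages, int $n = 10): array {
    if (count($messages) <= $n) {
        return $messages;
    }
    $older  = array_slice($messages, 0, -$n);
    $recent = array_slice($messages, -$n);
    return array_merge(
        [['role' => 'user', 'content' => 'Summary of earlier conversation: ' . summarise($older)]],
        $recent
    );
}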
📦 Applies To
any, web, cli
🔍 Detection Hints
Entire conversation history sent on every request without truncation; context window overflow errors; token costs growing unboundedly with conversation length
Auto-detectable: ✗ No
🤖 AI Agent
Confidence: Low · False positives: High · ✗ Manual fix · Fix effort: Medium · Context: File
