
LLM Context Window

AI / ML · Intermediate
debt(d8/e5/b7/t7)
d8 Detectability Operational debt — how invisible misuse is to your safety net

Closest to 'silent in production until users hit it' (d9), scored d8. The Detection Hints mark this as not auto-detectable, and the problematic pattern (the entire conversation history sent on every request without truncation) triggers no linter or static analysis tool. Token costs grow without bound, and the lost-in-the-middle drop in output quality is invisible at the code level: it surfaces only as unexpected API bills or subtly wrong model outputs in production. It sits just below d9 because context overflow errors at least surface as runtime exceptions, which gives some signal.

e5 Effort Remediation debt — work required to fix once spotted

Closest to 'touches multiple files / significant refactor in one component' (e5). The Quick Fix calls for a sliding window or summarisation strategy, which goes well beyond a one-line patch: it means designing and integrating a token-tracking mechanism, a truncation or summarisation pipeline for the conversation, and potentially a RAG retrieval layer. That work spans prompt management, API call logic, and possibly conversation-state storage.

b7 Burden Structural debt — long-term weight of choosing wrong

Closest to 'strong gravitational pull' (b7). The concern applies broadly, to any web or CLI context. Every feature that calls an LLM must be designed around context window constraints: conversation history management, RAG pipeline design, token budgeting, and cost control all follow from this choice. Any engineer adding a new LLM-powered feature has to reason about context limits, which makes this a persistent, cross-cutting architectural concern that shapes how the whole AI integration layer is structured.

t7 Trap Cognitive debt — how counter-intuitive correct behaviour is

Closest to 'serious trap: contradicts how a similar concept works elsewhere' (t7). The Common Misconception section names the canonical wrong belief, that 'a larger context window always means better performance'. Developers naturally assume more context means better results (by analogy with more data yielding better models in ML), but the lost-in-the-middle problem means the opposite can be true. Likewise, the correct way to trim a conversation, from the middle rather than the start (which loses the system prompt), runs against intuition. Both are serious, non-obvious inversions of expected behaviour.


Also Known As

context length, context window, token limit, token budget

TL;DR

The maximum amount of text an LLM can process in one call — everything the model sees at once, including system prompt, conversation history, and retrieved context.

Explanation

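A context window is measured in tokens (roughly 3-4 characters of English text each) and, for most models, covers both the input and the generated output. Claude 3.5 models: 200K tokens (~150K words). GPT-4 Turbo and GPT-4o: 128K (the original GPT-4 offered 8K and 32K). Smaller models: 4K-32K. Chat APIs are stateless, so every call reprocesses the entire prompt you send: longer prompts cost more and respond more slowly. Strategies for large contexts:
  • Chunking: split documents and process the pieces separately.
  • RAG: retrieve only the relevant chunks.
  • Summarisation: compress the conversation history.
  • Sliding window: keep the most recent messages and drop or summarise older ones.
Models also recall information from the middle of very long contexts less reliably than information near the start or end (the lost-in-the-middle problem).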
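A minimal chunking sketch, assuming the 4-characters-per-token end of the rule of thumb in place of a real tokenizer: it packs paragraphs into chunks that stay under a token budget and hard-splits any paragraph that is too long on its own. The function names estimateTokens and chunkText are illustrative; real code should count tokens with the model provider's own tokenizer or token-counting API.

```php
<?php
// Rough heuristic (assumption): ~4 characters per token for English text.
// strlen() counts bytes, which equals characters only for ASCII text.
function estimateTokens(string $text): int
{
    return (int) ceil(strlen($text) / 4);
}

// Pack paragraphs into chunks of at most $maxTokens (estimated) each.
// A paragraph longer than the budget is hard-split into budget-sized pieces.
function chunkText(string $text, int $maxTokens): array
{
    $maxChars = $maxTokens * 4;
    $chunks = [];
    $current = '';
    foreach (preg_split('/\n\s*\n/', $text) as $paragraph) {
        if (trim($paragraph) === '') {
            continue; // skip empty paragraphs
        }
        foreach (str_split($paragraph, $maxChars) as $piece) {
            $candidate = $current === '' ? $piece : $current . "\n\n" . $piece;
            if (strlen($candidate) <= $maxChars) {
                $current = $candidate;   // still fits: keep packing
            } else {
                $chunks[] = $current;    // full: close this chunk
                $current = $piece;       // and start the next one
            }
        }
    }
    if ($current !== '') {
        $chunks[] = $current;
    }
    return $chunks;
}
```

Each chunk can then be summarised or embedded on its own, which is where chunking feeds into RAG.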

Common Misconception

A larger context window always means better performance — very long contexts cause the lost-in-the-middle problem where the model pays less attention to content in the middle; RAG with focused retrieval often outperforms stuffing everything into context.
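One way to work with the lost-in-the-middle effect rather than against it, sketched here under assumptions: rank retrieved chunks by relevance, place the strongest ones at the start and end of the context, let the weakest fall in the middle, and put the actual question last. The chunk shape (a 'text' string plus a numeric 'score'), the function names, and the alternating placement are illustrative choices, not a standard API.

```php
<?php
// Reorder chunks so the most relevant sit at the edges of the context and the
// least relevant land in the middle, where recall tends to be weakest.
// Each chunk is ['text' => string, 'score' => float]; higher score = more relevant.
function reorderForLongContext(array $chunks): array
{
    // Most relevant first (usort also reindexes keys to 0..n-1).
    usort($chunks, fn(array $a, array $b): int => $b['score'] <=> $a['score']);

    $front = [];
    $back = [];
    foreach ($chunks as $i => $chunk) {
        if ($i % 2 === 0) {
            $front[] = $chunk;            // 1st, 3rd, 5th best fill from the start
        } else {
            array_unshift($back, $chunk); // 2nd, 4th best fill from the end
        }
    }
    return array_merge($front, $back);
}

// Reordered context first, the question last.
function buildPrompt(array $chunks, string $question): string
{
    $context = implode("\n\n", array_column(reorderForLongContext($chunks), 'text'));
    return $context . "\n\n" . $question;
}
```

With five chunks ranked r1 (best) to r5, the context order becomes r1, r3, r5, r4, r2.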

Why It Matters

Sending the entire codebase in every API call is expensive and slow — understanding context window limits enables designing efficient RAG pipelines that retrieve only relevant code sections.
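To show the retrieval step without a vector database, the sketch below scores code sections by keyword overlap with the query and keeps the best-scoring ones that fit a token budget. Keyword overlap is a deliberately simple stand-in for the embedding similarity a real RAG pipeline would use; the function name retrieveRelevant, the scoring rule, and the ~4 characters-per-token estimate are assumptions.

```php
<?php
// Stand-in for vector search (assumption): score each section by how many distinct
// query words it contains, then keep the best sections that fit the token budget.
// $sections maps a file name to its source text.
function retrieveRelevant(array $sections, string $query, int $tokenBudget): array
{
    $queryWords = array_unique(str_word_count(strtolower($query), 1));

    $scored = [];
    foreach ($sections as $name => $text) {
        $words = array_unique(str_word_count(strtolower($text), 1));
        $score = count(array_intersect($queryWords, $words));
        if ($score > 0) {
            $scored[$name] = $score; // ignore sections with no overlap at all
        }
    }
    arsort($scored); // highest overlap first, keys preserved

    $selected = [];
    $used = 0;
    foreach ($scored as $name => $score) {
        $tokens = (int) ceil(strlen($sections[$name]) / 4); // ~4 chars per token
        if ($used + $tokens > $tokenBudget) {
            continue; // skip sections that would overflow the budget
        }
        $selected[$name] = $sections[$name];
        $used += $tokens;
    }
    return $selected;
}
```

Swapping the scoring loop for an embedding search changes only how sections are ranked; the budgeted selection stays the same.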

Common Mistakes

  • Sending entire documents when only paragraphs are relevant — use RAG instead.
  • Not tracking token usage — unexpected costs from large context on every call.
  • Truncating from the start instead of the middle — recent messages and system prompt matter most.
  • Ignoring the lost-in-the-middle problem — critical instructions buried in the middle of a long context may be ignored.
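The truncation advice above can be sketched as a sliding window that always keeps the system prompt and walks backwards from the newest message, dropping the oldest turns once a token budget is reached. The ~4 characters-per-token estimate, the message shape (role plus string content), the function name slidingWindow, and the rule of starting the kept history on a user turn are assumptions for illustration.

```php
<?php
// Sliding window: keep the system prompt plus the newest messages that fit the
// budget; the oldest turns (the "middle" of the context) are dropped first.
function slidingWindow(string $system, array $messages, int $tokenBudget): array
{
    $estimate = fn(string $s): int => (int) ceil(strlen($s) / 4); // ~4 chars/token

    $used = $estimate($system); // the system prompt is always kept
    $kept = [];
    // Walk backwards from the newest message.
    for ($i = count($messages) - 1; $i >= 0; $i--) {
        $cost = $estimate($messages[$i]['content']);
        if ($used + $cost > $tokenBudget) {
            break; // this message and everything older is dropped
        }
        array_unshift($kept, $messages[$i]);
        $used += $cost;
    }
    // Assumption: the API expects the history to start with a user turn.
    while ($kept !== [] && $kept[0]['role'] !== 'user') {
        array_shift($kept);
    }
    return ['system' => $system, 'messages' => $kept];
}
```

Because the system prompt is counted first and never dropped, the window can only ever shrink the conversation, not the instructions.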

Code Examples

✗ Vulnerable
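// Sending the entire codebase in every call: expensive, slow, and it may not fit the window.
// file_get_contents() does not expand globs, so collect the .php files explicitly:
$allCode = '';
$files = new RecursiveIteratorIterator(
    new RecursiveDirectoryIterator('/var/www/app/src', FilesystemIterator::SKIP_DOTS)
);
foreach ($files as $file) {
    if ($file->getExtension() === 'php') {
        $allCode .= file_get_contents($file->getPathname()) . "\n";
    }
}
// $allCode is now ~500 KB of source
$response = $claude->messages->create([   // $claude: an Anthropic API client (setup omitted)
    'model' => 'claude-3-5-sonnet-20241022',
    'max_tokens' => 1024,
    'messages' => [[
        'role' => 'user',
        'content' => $allCode . "\n\nFix the bug in UserController",
    ]],
]);
// ~500 KB at 3-4 chars/token = ~125K-170K tokens * $0.003/1K = ~$0.38-$0.50 per request (input only)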
✓ Fixed
// RAG: retrieve only the relevant files ($vectorDb: a pre-built index of the codebase):
$relevantFiles = $vectorDb->search('UserController bug', limit: 5);
$context = implode("\n\n", array_column($relevantFiles, 'content'));

$response = $claude->messages->create([
    'model' => 'claude-3-5-sonnet-20241022',
    'max_tokens' => 1024,
    'messages' => [[
        'role' => 'user',
        'content' => $context . "\n\nFix the bug in UserController",
    ]],
]);
// Cost: ~5K tokens * $0.003/1K = ~$0.015 per request (input only), roughly 25-33x cheaper
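Per-request cost figures like these come from back-of-envelope arithmetic that is easy to keep next to the code. The sketch assumes ~4 characters per token and an input price of $3 per million tokens ($0.003 per 1K); output tokens are billed separately and ignored here, actual prices vary by model, and the function name estimateInputCost is illustrative.

```php
<?php
// Back-of-envelope input cost. Assumptions: ~4 characters per token and
// $3 per million input tokens; output tokens are billed separately.
function estimateInputCost(int $chars, float $usdPerMillionTokens = 3.0): float
{
    $tokens = (int) ceil($chars / 4);
    return $tokens * $usdPerMillionTokens / 1_000_000;
}

$wholeCodebase = estimateInputCost(500_000); // ~500 KB of source -> ~125K tokens
$ragContext    = estimateInputCost(20_000);  // ~20 KB retrieved  -> ~5K tokens

printf('whole codebase: $%.3f, RAG: $%.3f, ratio: %.0fx' . PHP_EOL,
    $wholeCodebase, $ragContext, $wholeCodebase / $ragContext);
```

At the 3-characters-per-token end of the range the whole-codebase figure rises to about $0.50, so the ratio lands somewhere between 25x and 33x.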

Added 16 Mar 2026
Edited 30 May 2026
DEV INTEL Tools & Severity
🟡 Severity: Medium ⚙ Fix effort: Medium
⚡ Quick Fix
Keep only the last N conversation turns in the context — implement a sliding window or summarisation strategy to stay within token limits without losing important context
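The summarisation half of that quick fix can be sketched as follows: once the history passes a threshold, older turns are replaced by one summary message while the most recent turns are kept verbatim. The summariser is passed in as a callable because producing the summary would normally be another (cheaper) LLM call; that callable, the function name compactHistory, and the default thresholds are hypothetical.

```php
<?php
// Once the history grows past $maxMessages, replace all but the last $keepRecent
// messages with a single summary message. $summarise is a hypothetical callable
// (for example a wrapper around a cheaper model) that turns old messages into text.
// Assumes 1 <= $keepRecent < $maxMessages.
function compactHistory(
    array $messages,
    callable $summarise,
    int $maxMessages = 20,
    int $keepRecent = 6
): array {
    if (count($messages) <= $maxMessages) {
        return $messages; // small enough: send as-is
    }
    $older  = array_slice($messages, 0, -$keepRecent); // everything but the tail
    $recent = array_slice($messages, -$keepRecent);    // newest turns, kept verbatim

    $summary = [
        'role'    => 'user',
        'content' => 'Summary of the earlier conversation: ' . $summarise($older),
    ];
    return array_merge([$summary], $recent);
}
```

Compared with a plain sliding window, this keeps a compressed trace of early decisions at the price of one extra model call whenever the threshold is crossed.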
📦 Applies To
any, web, cli
🔍 Detection Hints
Entire conversation history sent on every request without truncation; context window overflow errors; token costs growing unboundedly with conversation length
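A cheap runtime guard against these symptoms is to estimate the prompt size before each call and fail fast (or compact the history) when it would overflow, remembering that for most models the requested output tokens share the same window. The sketch assumes ~4 characters per token, string message content, and a 200K-token default window; the function checkContextBudget and the ContextBudgetExceeded exception are hypothetical names, and the provider's own token counting should back any real enforcement.

```php
<?php
// Hypothetical guard: throw before sending a request that would overflow the window.
final class ContextBudgetExceeded extends RuntimeException {}

function checkContextBudget(
    string $system,
    array $messages,
    int $maxOutputTokens,
    int $contextWindow = 200_000
): int {
    $chars = strlen($system);
    foreach ($messages as $message) {
        $chars += strlen($message['content']); // assumes string content
    }
    $inputTokens = (int) ceil($chars / 4); // rough estimate, ~4 chars per token

    // For most models, generated output tokens count against the same window.
    if ($inputTokens + $maxOutputTokens > $contextWindow) {
        throw new ContextBudgetExceeded(sprintf(
            'Estimated %d input + %d output tokens exceeds the %d-token window',
            $inputTokens, $maxOutputTokens, $contextWindow
        ));
    }
    return $inputTokens; // log this per request to spot unbounded growth
}
```

Logging the returned estimate on every request also makes the "token costs growing with conversation length" pattern visible long before the exception fires.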
Auto-detectable: ✗ No
🤖 AI Agent
Confidence: Low · False positives: High · ✗ Manual fix · Fix: Medium · Context: File
