{
    "slug": "ai_context_management",
    "term": "AI Context Management",
    "category": "ai_ml",
    "difficulty": "intermediate",
    "short": "The practice of selecting, ordering, and trimming what goes into an LLM's context window to maximise relevance while staying under token limits.",
    "long": "AI context management is the discipline of deciding what information an LLM sees on each request. Because a model only knows what fits in its context window, the quality of an answer depends as much on what you put in front of it as on the prompt wording. Context typically combines several sources: a system prompt, conversation history, retrieved documents (RAG), tool outputs, and the user's current message. Each of these competes for a finite token budget, so management means prioritising, summarising, truncating, and ordering these pieces deliberately rather than dumping everything in.\n\nKey techniques include sliding-window history (keep the last N turns), summarisation (compress old turns into a running summary), retrieval (pull only the chunks relevant to the current query), and structured slotting (reserve fixed budgets for system instructions, retrieved context, and history). Token counting is central: you must measure how many tokens each part consumes and reserve headroom for the model's response. Position matters too - models attend more reliably to content at the start and end of context (the 'lost in the middle' effect), so critical instructions and the user query should not be buried.\n\nIn PHP applications, context management usually happens in an application service layer that assembles the request payload before calling the model API. You count tokens (often via an approximation or a tokenizer library), enforce a budget, drop or summarise the oldest history, inject retrieved passages, and log what was actually sent for debugging. Poor context management shows up as truncated answers, ignored instructions, runaway costs, and hallucinations when the model invents details that were silently dropped. Good context management is the difference between a chatbot that remembers the conversation and stays cheap, and one that forgets, contradicts itself, or burns the token budget on irrelevant boilerplate.",
    "aliases": [
        "context window management",
        "prompt assembly",
        "context engineering",
        "context budgeting"
    ],
    "tags": [
        "ai_ml",
        "context-management",
        "llm",
        "token-budget",
        "rag",
        "prompt-engineering"
    ],
    "misconception": "Bigger context windows mean you no longer need to manage context - just send everything. In practice, stuffing context degrades relevance (lost-in-the-middle), raises cost and latency, and increases hallucination risk; deliberate selection still beats brute-force inclusion.",
    "why_it_matters": "Context determines answer quality, cost, and latency on every request, so disciplined assembly directly affects user experience and bill size. Unmanaged context silently truncates history or documents, producing wrong answers that are hard to debug because the failure is invisible in the prompt text.",
    "common_mistakes": [
        "Appending unbounded conversation history until the window overflows and oldest messages are silently dropped.",
        "Not counting tokens before sending, so requests fail or truncate unpredictably under load.",
        "Placing critical instructions in the middle of a large context where the model attends to them least.",
        "Including full documents instead of retrieving and inserting only the relevant chunks.",
        "Reserving no headroom for the response, causing the model to cut off mid-answer."
    ],
    "when_to_use": [
        "Multi-turn chat where history grows unbounded across a session.",
        "RAG pipelines that inject retrieved documents alongside history and instructions.",
        "Any system where token cost, latency, or truncation failures are observable in production."
    ],
    "avoid_when": [
        "Single-shot prompts with small fixed inputs where the whole payload trivially fits the window.",
        "Prototypes where the conversation length is bounded and cost is negligible."
    ],
    "related": [
        "llm_context_window",
        "rag_retrieval",
        "prompt_caching",
        "tokenization_llm",
        "ai_cost_management"
    ],
    "prerequisites": [
        "llm_context_window",
        "tokenization_llm",
        "large_language_models"
    ],
    "refs": [
        "https://docs.anthropic.com/en/docs/build-with-claude/context-windows",
        "https://arxiv.org/abs/2307.03172",
        "https://www.anthropic.com/news/prompt-caching"
    ],
    "bad_code": "<?php\n// Unbounded history - eventually overflows the window and silently truncates\nclass Chat {\n    private array $history = [];\n\n    public function ask(string $message, ClaudeClient $client): string {\n        $this->history[] = ['role' => 'user', 'content' => $message];\n        // Sends entire history every time, no token counting, no budget\n        $response = $client->complete([\n            'system'   => $this->bigSystemPrompt(),\n            'messages' => $this->history,\n        ]);\n        $this->history[] = ['role' => 'assistant', 'content' => $response];\n        return $response;\n    }\n}",
    "good_code": "<?php\n// Budgeted context: reserve response headroom, trim oldest turns\nclass Chat {\n    private array $history = [];\n    private int $maxContextTokens = 100_000;\n    private int $responseReserve = 4_000;\n\n    public function ask(string $message, ClaudeClient $client, Tokenizer $tok): string {\n        $this->history[] = ['role' => 'user', 'content' => $message];\n\n        $system = $this->systemPrompt();\n        $budget = $this->maxContextTokens - $this->responseReserve - $tok->count($system);\n\n        // Keep most recent turns that fit the budget\n        $messages = [];\n        $used = 0;\n        foreach (array_reverse($this->history) as $turn) {\n            $cost = $tok->count($turn['content']);\n            if ($used + $cost > $budget) break;\n            $used += $cost;\n            array_unshift($messages, $turn);\n        }\n\n        $response = $client->complete([\n            'system'     => $system,\n            'messages'   => $messages,\n            'max_tokens' => $this->responseReserve,\n        ]);\n        $this->history[] = ['role' => 'assistant', 'content' => $response];\n        return $response;\n    }\n}",
    "quick_fix": "Add a token budget that reserves headroom for the response, then trim or summarise oldest history before each call instead of appending unbounded.",
    "severity": "medium",
    "effort": "medium",
    "created": "2026-06-08",
    "updated": "2026-06-08",
    "citation": {
        "canonical_url": "https://codeclaritylab.com/glossary/ai_context_management",
        "html_url": "https://codeclaritylab.com/glossary/ai_context_management",
        "json_url": "https://codeclaritylab.com/glossary/ai_context_management.json",
        "source": "CodeClarityLab Glossary",
        "author": "P.F.",
        "author_url": "https://pfmedia.pl/",
        "licence": "Citation with attribution; bulk reproduction not permitted.",
        "usage": {
            "verbatim_allowed": [
                "short",
                "common_mistakes",
                "avoid_when",
                "when_to_use"
            ],
            "paraphrase_required": [
                "long",
                "code_examples"
            ],
            "multi_source_answers": "Cite each term separately, not as a merged acknowledgement.",
            "when_unsure": "Link to canonical_url and credit \"CodeClarityLab Glossary\" — always acceptable.",
            "attribution_examples": {
                "inline_mention": "According to CodeClarityLab: <quote>",
                "markdown_link": "[AI Context Management](https://codeclaritylab.com/glossary/ai_context_management) (CodeClarityLab)",
                "footer_credit": "Source: CodeClarityLab Glossary — https://codeclaritylab.com/glossary/ai_context_management"
            }
        }
    }
}