Part V is the reliability domain — context management, escalation, error propagation, provenance. It opens with the most basic constraint behind all of them: the context window is finite, and a long conversation gets worse before it gets full. This chapter is the cert-exam angle; the mechanics of why long context degrades are proven in depth in the design book, to which it points. It is an architectural pattern — the accumulation-and-compaction shape is stable, while the window sizes and message types are the moving surface.
Context is a finite, accumulating resource
Everything in a session shares one budget. “Context window is cumulative within a session. System prompt, tool definitions, CLAUDE.md, conversation history, tool inputs/outputs all accumulate.” [Official] How the agent loop works · AnthropicT1-official original And the budget is concrete: current windows are 1M tokens on Opus 4.8 and Sonnet 4.6 and 200k on Haiku 4.5 — though Opus 4.8’s tokenizer can consume up to 35% more tokens for the same text, so the same conversation costs more of the budget on one model than another. [Official] Models overview · AnthropicT1-official original
Degradation comes before overflow
The failure that matters is not hitting the limit — it is the quiet decline well before it. As a window fills, a model attends less reliably to material buried in the middle of a long context (the “lost-in-the-middle” effect), and any progressive summarization of earlier turns discards detail that may later turn out to matter. These degradation mechanisms — context rot, lost-in-the-middle, summarization loss — are the subject of the Agentic Systems Design book’s chapter on context rot, where they are established against the research; this chapter’s job is to make you recognize them on the exam.
Compaction: the automatic defense, and its cost
When a session approaches the limit, the loop defends itself: “Automatic compaction triggers near the context limit.” [Official] How the agent loop works · AnthropicT1-official original The defense is lossy by construction: “Compaction replaces older messages with a summary, so specific instructions from early in the conversation may not be preserved. Persistent rules belong in CLAUDE.md (loaded via settingSources) rather than in the initial prompt, because CLAUDE.md content is re-injected on every request.” [Official] How the agent loop works · AnthropicT1-official original Compaction buys room by trading away fidelity to the early conversation — exactly the region most at risk from lost-in-the-middle in the first place.
Where the depth lives
This chapter is the exam-angle surface; the design book owns the mechanism. The degradation research, the measurement of context rot, and the assembly strategies that fight it live in the Agentic Systems Design book — its chapter on context rot for the failure modes and its chapter on context assembly for the deliberate construction of what goes in the window. The exam-relevant skill is diagnostic: given a long-session scenario, name whether it is accumulation pressure, lost-in-the-middle, or post-compaction loss, and reach for the matching mitigation.
Practice
Exercise solutions
B. CLAUDE.md content is re-injected on every request, so a rule placed there is present in the context after compaction just as before it — exactly what a session-long constraint needs. The timing (failure right after a compaction summary) is the tell that the original instruction was summarized away. A works for a few turns but fights the symptom by hand and will fail again at the next compaction. C delays the limit but does not address degradation — a larger window still loses the middle, and a long enough session compacts anyway. D governs output length, not whether an early instruction is retained, so it is unrelated to the failure.
What accumulates: the system prompt, tool definitions, CLAUDE.md, the full conversation history (every user and assistant turn), and all tool inputs and outputs — everything shares one cumulative budget within the session. “Still fits” is not “well-attended” because the token limit is a capacity bound while attention is a quality that declines as the window fills: a model attends less reliably to material buried in the middle of a long context, so a conversation comfortably under the limit can still have effectively lost an instruction given fifty turns ago. Fitting is necessary but not sufficient for the model to be using all of it well.
When a session nears the context limit, automatic compaction triggers and replaces older messages with a summary to buy room. An instruction placed only in the opening prompt is at risk because compaction summarizes the oldest turns first, and a one-line rule from turn one rarely survives the summary intact — so the constraint silently stops being honored (the failure often shows up right after a summary appears). A durable rule belongs where it is re-injected on every request: CLAUDE.md (loaded via settingSources), whose content is re-added to context each request and so is present after compaction exactly as before it — its survival no longer depends on a summarizer’s discretion.
Exam essentials
- Cumulative budget — system prompt, tool defs, CLAUDE.md, conversation history, and tool I/O all accumulate in one finite window (1M tokens on Opus 4.8 / Sonnet 4.6, 200k on Haiku 4.5; tokenizer density varies by model).
- Degradation precedes overflow — lost-in-the-middle and summarization loss erode a long context before it hits the limit; “fits” is not “well-attended.” Depth lives in the design book’s context-rot chapter.
- Compaction is lossy — it triggers near the limit and replaces older messages with a summary, so early-conversation specifics may not be preserved.
- Durable rules belong in re-injected context — put session-long constraints in CLAUDE.md (re-injected every request), not the opening prompt, so compaction cannot strand them.