Part V covers the reliability domain: context management, escalation, error propagation, and provenance. It opens with the most basic constraint behind all of them: the context window is finite, and a long conversation gets worse before it gets full. This chapter takes the cert-exam angle; the mechanics of why long context degrades are established in depth in the design book, which this chapter points to. The pattern is architectural: the accumulation-and-compaction shape is stable, while the window sizes and message types are the moving surface.

Diagnostic: Do I know this already?

Answer these confidently and you can skim ahead to Exam essentials; if any is shaky, read closely — each is developed below.

List what shares the cumulative context budget within a session.
“Fits in the window” and “well-attended” — why are these different claims?
What does automatic compaction do, and which part of the conversation does it tend to drop first?
A rule given only in the opening prompt stops being followed right after a compaction summary. Where should it have lived, and why?
Given a misbehaving long session, which three failure modes should you discriminate among before reaching for a fix?

Check your answers

The system prompt, tool definitions, CLAUDE.md, conversation history, and tool inputs/outputs all accumulate in one finite budget that never refills within a session.
The token limit is a capacity bound while attention is a quality that declines as the window fills — a conversation comfortably under the limit can still have lost the thread of an instruction given fifty turns ago (lost-in-the-middle).
Near the context limit, compaction replaces older messages with a summary. Because it summarizes the oldest turns first, specific instructions from early in the conversation are most at risk.
In CLAUDE.md, because its content is re-injected on every request and so survives compaction, unlike an opening-prompt rule whose survival depends on a summarizer’s discretion.
Accumulation pressure, lost-in-the-middle, and post-compaction loss — three distinct conditions with three distinct remedies.

Context is a finite, accumulating resource

Everything in a session shares one budget. “Context window is cumulative within a session. System prompt, tool definitions, CLAUDE.md, conversation history, tool inputs/outputs all accumulate.” [Official] How the agent loop works · Anthropic. The budget is also concrete: current windows are 1M tokens on Opus 4.8 and Sonnet 4.6 and 200k on Haiku 4.5, though Opus 4.8’s tokenizer can consume up to 35% more tokens for the same text, so the same conversation costs more of the budget on one model than another. [Official] Models overview · Anthropic.
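The arithmetic of a shared budget can be made concrete with a small ledger. This is a minimal sketch, not a real API: the `ContextBudget` class, the `worst_case_tokens` helper, and every token count below are hypothetical, chosen only for illustration. The 35% figure is the upper bound quoted above, applied here as a worst case.

```python
from dataclasses import dataclass, field


@dataclass
class ContextBudget:
    """Toy ledger for one session's cumulative context usage (hypothetical)."""
    window_tokens: int
    used: dict = field(default_factory=dict)

    def add(self, component: str, tokens: int) -> None:
        # Usage only grows within a session; nothing is refunded.
        self.used[component] = self.used.get(component, 0) + tokens

    @property
    def total(self) -> int:
        return sum(self.used.values())

    @property
    def fraction_used(self) -> float:
        return self.total / self.window_tokens


def worst_case_tokens(tokens: int, overhead_pct: int = 35) -> int:
    """Scale a token count by a tokenizer overhead, using integer math."""
    return tokens * (100 + overhead_pct) // 100


# Illustrative numbers only (assumed, not measured).
budget = ContextBudget(window_tokens=200_000)
budget.add("system_prompt", 3_000)
budget.add("tool_definitions", 8_000)
budget.add("CLAUDE.md", 2_000)
budget.add("conversation_history", 60_000)
budget.add("tool_io", 47_000)

baseline = budget.total                  # 120_000 tokens, 0.6 of the window
inflated = worst_case_tokens(baseline)   # 162_000 tokens at +35%
```

Under these assumed numbers, the same conversation moves from 60% to 81% of a 200k window purely because of tokenizer density, with no extra content added.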

Degradation comes before overflow

The failure that matters is not hitting the limit — it is the quiet decline well before it. As a window fills, a model attends less reliably to material buried in the middle of a long context (the “lost-in-the-middle” effect), and any progressive summarization of earlier turns discards detail that may later turn out to matter. These degradation mechanisms — context rot, lost-in-the-middle, summarization loss — are the subject of the Agentic Systems Design book’s chapter on context rot, where they are established against the research; this chapter’s job is to make you recognize them on the exam.

Compaction: the automatic defense, and its cost

When a session approaches the limit, the loop defends itself: “Automatic compaction triggers near the context limit.” [Official] How the agent loop works · Anthropic. The defense is lossy by construction: “Compaction replaces older messages with a summary, so specific instructions from early in the conversation may not be preserved. Persistent rules belong in CLAUDE.md (loaded via settingSources) rather than in the initial prompt, because CLAUDE.md content is re-injected on every request.” [Official] How the agent loop works · Anthropic. Compaction buys room by trading away fidelity to the early conversation, which is exactly where session-long instructions tend to be given.
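The difference between a rule in the opening prompt and a rule in re-injected context can be sketched with a toy model. Everything here is an assumption made for illustration: `compact` and `build_request` are hypothetical helpers, not the agent loop's actual implementation, and this toy summarizer always drops detail, whereas a real one may or may not preserve a given rule.

```python
def compact(messages: list, keep_recent: int) -> list:
    """Toy compaction: replace all but the newest messages with one summary line."""
    if len(messages) <= keep_recent:
        return list(messages)
    split = len(messages) - keep_recent
    older, recent = messages[:split], messages[split:]
    # Lossy by design: the summary records only how much was replaced.
    summary = f"[summary of {len(older)} earlier messages]"
    return [summary] + recent


def build_request(persistent_rules: list, history: list) -> list:
    """Persistent rules are prepended on every request, independent of history."""
    return list(persistent_rules) + list(history)


rule = "Reply in British English."               # hypothetical session-long rule
turns = [f"turn {i}" for i in range(2, 41)]      # turns 2..40

# Case A: the rule lives only in the opening prompt, so it is part of history.
request_a = build_request([], compact([rule] + turns, keep_recent=5))

# Case B: the rule lives in re-injected context, outside the compacted history.
request_b = build_request([rule], compact(turns, keep_recent=5))

rule_survives_a = rule in request_a   # False: summarized away
rule_survives_b = rule in request_b   # True: re-added on every request
```

The point of the sketch is structural: in Case B the rule never passes through the summarizer at all, so its presence after compaction does not depend on what the summary keeps.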

Where the depth lives

This chapter is the exam-angle surface; the design book owns the mechanism. The degradation research, the measurement of context rot, and the assembly strategies that fight it live in the Agentic Systems Design book — its chapter on context rot for the failure modes and its chapter on context assembly for the deliberate construction of what goes in the window. The exam-relevant skill is diagnostic: given a long-session scenario, name whether it is accumulation pressure, lost-in-the-middle, or post-compaction loss, and reach for the matching mitigation.

Worked example: Diagnosing three long-session failures

The exam-relevant skill is to name the failure mode before reaching for a fix. Three scenarios, three different diagnoses:

Scenario A — the rule dropped after a summary. A constraint you gave at turn 1 stops being honored around turn 40, just as a compaction summary appears. Diagnosis: post-compaction loss — the oldest turns were summarized and the early rule didn’t survive intact. Fix: move it to CLAUDE.md, which is re-injected every request.
Scenario B — a buried fact misremembered, no compaction. The session is comfortably under the token limit and never compacted, yet Claude misremembers a detail established ~60 turns ago in a long stretch of context. Diagnosis: lost-in-the-middle — material buried in the middle of a long window is attended to least reliably. Fix: re-surface the fact near the end of the context (restate it), or assemble the working context deliberately rather than letting it sprawl.
Scenario C — the session simply fills up. Turns keep growing until the window is near full and the loop compacts (or you hit the limit). Diagnosis: accumulation pressure — nothing is degraded yet; the budget is just spent. Fix: reduce what accumulates (tighter tool outputs, /clear and restart with a focused prompt) rather than treating it as a quality bug.

The discipline: “full,” “buried,” and “summarized away” are three distinct conditions with three distinct remedies. Reaching for a bigger window (the instinct in Scenario C) does nothing for A or B: a larger context still loses its middle and still compacts eventually. Name the mode first.
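The three scenarios can be written down as a small triage function. This is a sketch under stated assumptions: the `SessionSymptoms` fields, the `diagnose` function, and the 0.9 "near full" threshold are all hypothetical, introduced only to make the decision order explicit. The remedies are paraphrased from the scenarios above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class FailureMode(Enum):
    POST_COMPACTION_LOSS = "post-compaction loss"
    LOST_IN_THE_MIDDLE = "lost-in-the-middle"
    ACCUMULATION_PRESSURE = "accumulation pressure"


REMEDY = {
    FailureMode.POST_COMPACTION_LOSS: "move the rule to CLAUDE.md",
    FailureMode.LOST_IN_THE_MIDDLE: "restate the fact near the end of the context",
    FailureMode.ACCUMULATION_PRESSURE: "reduce what accumulates, or /clear and restart",
}


@dataclass
class SessionSymptoms:
    compaction_occurred: bool        # a compaction summary appeared before the failure
    early_content_forgotten: bool    # an early rule or fact stopped being honored
    fraction_of_window_used: float   # 0.0 to 1.0


def diagnose(s: SessionSymptoms, near_full: float = 0.9) -> Optional[FailureMode]:
    """Name the failure mode first; the 0.9 threshold is an assumed cutoff."""
    if s.early_content_forgotten and s.compaction_occurred:
        return FailureMode.POST_COMPACTION_LOSS      # Scenario A
    if s.early_content_forgotten:
        return FailureMode.LOST_IN_THE_MIDDLE        # Scenario B
    if s.compaction_occurred or s.fraction_of_window_used >= near_full:
        return FailureMode.ACCUMULATION_PRESSURE     # Scenario C
    return None  # no long-session failure indicated by these symptoms


scenario_a = diagnose(SessionSymptoms(True, True, 0.80))
scenario_b = diagnose(SessionSymptoms(False, True, 0.40))
scenario_c = diagnose(SessionSymptoms(False, False, 0.95))
```

The ordering matters: forgotten content is checked before fullness, because a session that is both near full and has lost a rule needs the rule fixed, not just more room.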

Practice

Exercise solutions

Solution

B. CLAUDE.md content is re-injected on every request, so a rule placed there is present in the context after compaction just as before it — exactly what a session-long constraint needs. The timing (failure right after a compaction summary) is the tell that the original instruction was summarized away. A works for a few turns but fights the symptom by hand and will fail again at the next compaction. C delays the limit but does not address degradation — a larger window still loses the middle, and a long enough session compacts anyway. D governs output length, not whether an early instruction is retained, so it is unrelated to the failure.

Solution

What accumulates: the system prompt, tool definitions, CLAUDE.md, the full conversation history (every user and assistant turn), and all tool inputs and outputs — everything shares one cumulative budget within the session. “Still fits” is not “well-attended” because the token limit is a capacity bound while attention is a quality that declines as the window fills: a model attends less reliably to material buried in the middle of a long context, so a conversation comfortably under the limit can still have effectively lost an instruction given fifty turns ago. Fitting is necessary but not sufficient for the model to be using all of it well.

Solution

When a session nears the context limit, automatic compaction triggers and replaces older messages with a summary to buy room. An instruction placed only in the opening prompt is at risk because compaction summarizes the oldest turns first, and a one-line rule from turn one rarely survives the summary intact, so the constraint silently stops being honored (the failure often shows up right after a summary appears). A durable rule belongs in CLAUDE.md (loaded via settingSources): its content is re-injected on every request, so it is present after compaction exactly as before it, and its survival no longer depends on a summarizer’s discretion.

Exam essentials

Cumulative budget — system prompt, tool defs, CLAUDE.md, conversation history, and tool I/O all accumulate in one finite window (1M tokens on Opus 4.8 / Sonnet 4.6, 200k on Haiku 4.5; tokenizer density varies by model).
Degradation precedes overflow — lost-in-the-middle and summarization loss erode a long context before it hits the limit; “fits” is not “well-attended.” Depth lives in the design book’s context-rot chapter.
Compaction is lossy — it triggers near the limit and replaces older messages with a summary, so early-conversation specifics may not be preserved.
Durable rules belong in re-injected context — put session-long constraints in CLAUDE.md (re-injected every request), not the opening prompt, so compaction cannot strand them.