Three hours into a session, the agent starts repeating itself. It forgets a rule you stated twenty minutes ago. It suggests an approach you already rejected. The code quality has noticeably dropped since the session started.
The model has not been retrained or swapped out. It is the same model, with the same weights, failing on the same codebase. The only thing that changed is the shape of the conversation. This chapter is about why.
Representation
Context — the sequence of tokens an agent has in view when it answers — is the single most consequential variable in an agentic coding session. It is also the most commonly mismanaged. Practitioners think of context as a workspace, something that holds everything needed for the job. The better mental model is a budget: finite, decaying, costly, and competitive. Every token in context is competing for the model’s attention; every additional token of noise dilutes signal on the tokens that actually matter.
The mechanism is attention. Transformer-based models distribute a fixed attention budget across all tokens in the window. Content near the start and end of context is recalled more reliably than details buried in the middle — this is not a cliff at the ceiling but a gradient that starts early. Critical instructions (your project brief, the current task spec) compete with accumulated tool outputs, failed attempts, and earlier sub-tasks. When noise dominates signal, the model responds from the noise.
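The dilution effect can be illustrated with a toy calculation. This is a minimal sketch, not a model of any real transformer: it assumes a single attention step, one signal token with a fixed score, and distractor tokens that all share a lower fixed score. The function names and the score values are hypothetical, chosen only to show that softmax weights sum to 1, so every added token takes a share from the others.

```python
import math

def softmax(logits):
    """Numerically stable softmax; the returned weights always sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def signal_weight(noise_tokens, signal_logit=2.0, noise_logit=0.0):
    """Weight landing on one signal token among `noise_tokens` distractors.

    The logit values are illustrative assumptions, not measured quantities.
    """
    weights = softmax([signal_logit] + [noise_logit] * noise_tokens)
    return weights[0]

if __name__ == "__main__":
    for n in (10, 100, 1_000, 10_000):
        print(f"{n:>6} noise tokens -> signal weight {signal_weight(n):.4f}")
```

Real models learn their scores and spread attention over many heads and layers, so the actual curve is far less tidy. The sketch only shows why a fixed budget makes added noise costly even when the signal token itself is unchanged.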
A useful quantitative signal: practitioners observe a ~60–70% window-fill threshold where quality starts to drop noticeably. This is a practitioner heuristic calibrated on 200K windows; it is not a hard system cutoff. On 1M-token windows, the same percentage represents five times as many tokens, and absolute token count, not fill percentage, is what drives quality loss. The threshold is therefore softer on larger windows in percentage terms but firmer in absolute terms: 600K tokens of accumulated noise in a 1M window is qualitatively worse than 120K of noise in a 200K window, even though both sit at 60% fill.
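The arithmetic behind the percentage-versus-absolute distinction is short enough to write out. A minimal sketch, with hypothetical helper names, that converts between fill fraction and token count for the two window sizes discussed above:

```python
def tokens_at_fill(window_size, fill_fraction):
    """Absolute tokens in the window at a given fill fraction."""
    return round(window_size * fill_fraction)

def fill_for_same_absolute(reference_window, reference_fill, target_window):
    """Fill fraction of `target_window` holding the same absolute token count
    as `reference_window` at `reference_fill`."""
    return reference_window * reference_fill / target_window

if __name__ == "__main__":
    small, large = 200_000, 1_000_000
    print(tokens_at_fill(small, 0.60))   # tokens at 60% of a 200K window
    print(tokens_at_fill(large, 0.60))   # tokens at 60% of a 1M window
    # Fill level on the 1M window that matches the 200K window's 60% mark.
    print(f"{fill_for_same_absolute(small, 0.60, large):.0%}")
```

If the heuristic really tracks absolute tokens, the fill level to watch on a 1M window is the one the last line prints, not the 60% mark itself. Treat that as a starting point for your own observation rather than a measured threshold.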
Three forces make context decay non-linear:
Noise accumulates faster than signal. A single file read adds ~2,000 tokens. A failed command + retry + correction adds ~2,500 tokens of failure patterns. A debugging session that loops a few times can fill 30% of a window with content that actively misleads the model. Each loop adds tokens without adding progress, so the ratio of noise to signal climbs fastest exactly when the session is stalled.
Some context is compaction-resistant. Extended thinking blocks (the internal reasoning traces some models emit) are immutable after generation, so summarizers cannot touch them. A long stretch of deep reasoning may stay trapped in the window until you clear it entirely.
Attention decay is non-uniform. Tokens in the “middle third” of a long context are forgotten first. A rule stated early in the session, then buried under file reads and tool outputs, is functionally absent even though it is technically still there.
These three forces produce the late-session degradation everyone notices but few proactively prevent.
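The first of the three forces can be turned into a rough estimator. A minimal sketch using the per-event figures quoted above (~2,000 tokens per file read, ~2,500 per failed correction cycle); the 200K default window and the event counts in the usage example are assumptions for illustration, and real sessions will vary widely.

```python
FILE_READ_TOKENS = 2_000      # rough per-file-read cost quoted in the text
FAILED_CYCLE_TOKENS = 2_500   # rough cost of one failed command + retry + correction

def noise_tokens(file_reads, failed_cycles):
    """Estimated tokens added by file reads and failed correction cycles."""
    return file_reads * FILE_READ_TOKENS + failed_cycles * FAILED_CYCLE_TOKENS

def noise_share(file_reads, failed_cycles, window_size=200_000):
    """Estimated fraction of the window occupied by that content."""
    return noise_tokens(file_reads, failed_cycles) / window_size

if __name__ == "__main__":
    # Hypothetical debugging session: 15 file reads and 12 failed cycles.
    print(f"{noise_share(15, 12):.0%} of a 200K window")
```

Not every file read is noise, so this is an upper bound on the estimate rather than a measurement. Its use is to make the accumulation visible before the fill indicator does.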
Operation
Every CLI agent ships tools to manage context explicitly. The vocabulary differs; the primitives are the same: observe, compact, clear, persist.
The common primitives
Across Claude Code, Gemini CLI, and Codex CLI, four primitives recur:
- Observation — show the current window fill so you can decide.
- Compaction — summarize the conversation in place, preserving decisions, discarding noise.
- Clear / reset — discard the conversation entirely and start fresh.
- Persistence — a top-level briefing doc (CLAUDE.md / GEMINI.md / AGENTS.md) that survives compaction and is re-injected on every turn.
The table maps each tool’s surface to these primitives:
| Primitive | Claude Code | Gemini CLI | Codex CLI |
|---|---|---|---|
| Observe fill | /context | settings show; /context (proposed) | status in prompt |
| Compact | /compact (+ <focus> arg) | /compress | /compact |
| Clear | /clear | /chat clear | new session |
| Persist | CLAUDE.md | GEMINI.md | AGENTS.md |
| Reload persistence | auto on compact | /memory refresh | auto on launch |
Core protocols (tool-agnostic)
Some patterns apply regardless of which agent you use. They are consequences of how context works, not features of any tool.
The two-failure rule. After two failed corrections in a row, clear the context and try again with a better initial prompt. The second failure means the context now contains ~2,500 tokens of failure patterns — original error, first correction, apology, retry, second correction, second apology. Persisting a third round teaches the model to repeat the failures. A better opening prompt informed by what went wrong costs ~200 tokens and produces a better result.
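The rule is simple enough to express as a counter. A minimal sketch, assuming you record the outcome of each correction attempt yourself; the class name and the returned strings are hypothetical and do not correspond to any CLI's API.

```python
from dataclasses import dataclass

@dataclass
class CorrectionTracker:
    """Counts consecutive failed corrections and signals when to clear."""
    max_consecutive_failures: int = 2
    consecutive_failures: int = 0

    def record(self, succeeded: bool) -> str:
        """Record one correction attempt; return 'continue' or 'clear'."""
        if succeeded:
            self.consecutive_failures = 0
            return "continue"
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.max_consecutive_failures:
            # The context now carries two rounds of failure patterns:
            # reset the counter and start over with a better opening prompt.
            self.consecutive_failures = 0
            return "clear"
        return "continue"

if __name__ == "__main__":
    tracker = CorrectionTracker()
    for outcome in (False, False):
        print(tracker.record(outcome))
```

A success between two failures resets the count, which matches the "in a row" wording of the rule.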
The compaction protocol. Compaction is not a button you press when the window feels full — it is a three-step procedure.
Before: write down what matters in two or three sentences — the task, the decisions made, what remains. This becomes your compaction focus. Commit or save anything that is “done enough” to disk; compaction cannot lose what is already persisted.
During: be specific about what to preserve and what to discard. A bare compact-without-focus lets the model decide, and its priorities may not match yours.
After: verify by asking a recall question about a decision from earlier in the session. If the model cannot answer, inject the missing context from your notes or briefing doc. Do not trust compaction silently.
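The before and after steps can be supported with two small helpers. A minimal sketch under stated assumptions: the function names are hypothetical, the focus note layout is one arbitrary choice, and the recall check is a crude keyword match on an answer you paste in, not a call into any agent.

```python
def build_focus(task, decisions, remaining):
    """Assemble a short focus note: the task, the decisions made, what remains."""
    lines = [f"Task: {task}", "Decisions to preserve:"]
    lines += [f"- {d}" for d in decisions]
    lines.append("Remaining:")
    lines += [f"- {r}" for r in remaining]
    return "\n".join(lines)

def missing_from_recall(answer, expected_terms):
    """Return the expected terms that the model's recall answer never mentions."""
    lowered = answer.lower()
    return [term for term in expected_terms if term.lower() not in lowered]

if __name__ == "__main__":
    # Hypothetical session content, for illustration only.
    focus = build_focus(
        task="Migrate session storage",
        decisions=["Keep the existing cookie format"],
        remaining=["Update integration tests"],
    )
    print(focus)
    gaps = missing_from_recall("We kept the cookie format.", ["cookie format", "integration tests"])
    print("Re-inject:", gaps)
```

Anything the second helper reports as missing is a candidate for re-injection from your notes or the briefing doc. A keyword match can pass a wrong answer, so read the reply as well.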
Durable artifacts: the primary persistence strategy
Compaction is a supplement to persistence, not a replacement. Anything you want to survive a session boundary must be on disk. Three artifacts carry most of the weight:
The briefing doc (CLAUDE.md / GEMINI.md / AGENTS.md) holds project rules, architecture, conventions. It is re-injected on every turn, so every token in it has leverage. Anthropic recommends individual files under 200 lines, combined under ~500 lines; the same bound applies across the three tools by analogy — the file is a context tax you pay on every single prompt, so every line must earn its place.
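The line budget is easy to check mechanically. A minimal sketch, assuming the 200 and 500 figures above are treated as hard limits (the text gives the combined figure only approximately); the function name and the warning format are hypothetical.

```python
PER_FILE_LIMIT = 200   # "under 200 lines" per file, as quoted above
COMBINED_LIMIT = 500   # "under ~500 lines" combined; treated as exact here

def check_budget(docs):
    """`docs` maps a file name to its text; returns a list of warnings.

    A file or total that reaches a limit is flagged, since the guidance
    says "under".
    """
    warnings = []
    total = 0
    for name, text in docs.items():
        lines = len(text.splitlines())
        total += lines
        if lines >= PER_FILE_LIMIT:
            warnings.append(f"{name}: {lines} lines (limit {PER_FILE_LIMIT})")
    if total >= COMBINED_LIMIT:
        warnings.append(f"combined: {total} lines (limit {COMBINED_LIMIT})")
    return warnings

if __name__ == "__main__":
    sample = {"CLAUDE.md": "rule\n" * 250}
    for warning in check_budget(sample):
        print(warning)
```

To run it against real files, build the mapping with `pathlib.Path.read_text()` for each briefing file your tool loads. Line count is only a proxy for token cost, so a short file of very long lines can still be expensive.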
CURRENT_WORK.md holds transient state: what you are working on right now, what changed, what’s next. Written at the start and end of every session. A two-minute investment that saves ten minutes of re-discovery on return.
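A fixed template makes the two-minute habit cheaper still. A minimal sketch that renders the three pieces of state named above; the section headings and the function name are assumptions, since the text does not prescribe a layout.

```python
from datetime import date

def render_current_work(working_on, changed, next_steps, today=None):
    """Render CURRENT_WORK.md text: what is in progress, what changed, what is next."""
    today = today or date.today().isoformat()
    sections = [
        f"# Current work ({today})",
        "## Working on",
        working_on,
        "## What changed",
        *[f"- {item}" for item in changed],
        "## What's next",
        *[f"- {item}" for item in next_steps],
    ]
    return "\n".join(sections) + "\n"

if __name__ == "__main__":
    # Hypothetical content, for illustration only.
    text = render_current_work(
        working_on="Refactor the upload handler",
        changed=["Split validation into its own module"],
        next_steps=["Add tests for the size limit"],
    )
    print(text)
```

Writing the result to disk is a single `Path("CURRENT_WORK.md").write_text(text)` call. Keeping the file short matters more than the exact headings.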
Git commits are the most durable artifact. Commit early, commit often. Compaction cannot summarize away what is already in the history.
Evolution
Context management is the most convergent area of agentic-coding practice — and simultaneously the area with the starkest active divergence. Tracking where those lines fall is most of what this section does.
Convergence: the briefing-document pattern. Claude Code established CLAUDE.md as a project-root convention; Gemini CLI followed with GEMINI.md, and Codex CLI adopted AGENTS.md. All three work the same way: markdown in the project root, re-injected on every turn, hierarchical loading from global → project → sub-directory. This is no longer a contested design choice — it is the standard shape.
Convergence: compaction as a primitive. Claude’s /compact, Gemini’s /compress, and Codex’s /compact all solve the same problem with broadly the same technique (summarize the history, replace it with the summary, keep the briefing doc intact). Auto-compaction thresholds vary but the mechanism converges.
Divergence: compaction implementation strategy. Three distinct approaches are in play. Claude performs two-phase compaction: it clears stale tool outputs first, only summarizing conversation if the first pass is insufficient. Gemini has shipped a union-find clustering alternative that resolves summaries asynchronously off the blocking path. Codex does a more direct summarize-and-replace. All three land at “shorter history, same briefing doc,” but the fidelity curves differ. Quality comparisons across tools are sensitive to which strategy applies.
Emerging: horizontal scaling. The pattern of running many parallel short sessions — rather than one long session — originated in Claude Code’s community and is spreading. The principle is general (conversations degrade over time, so keep them short and bridge with artifacts), but the tooling is still Claude-first: claude --worktree for git isolation, --continue / --resume / --from-pr for session management. Gemini and Codex have the primitives in pieces but not yet as a coherent workflow. Expect convergence here within 12–18 months; in the meantime, the pattern is portable if you’re willing to manage the plumbing yourself.
Emerging: subagent delegation. Spawning a child agent with its own context to handle a bounded sub-task is a Claude Code feature today. It is a natural next step for agent design — the research context does not pollute the parent session. Gemini has announced direction on this; Codex has not. Pattern not yet convergent.
Quick reference
- Context degrades non-linearly — manage it actively; do not wait for a hard limit.
- Observe before acting. Every CLI has a fill indicator; use it.
- Two-failure rule: after two corrections, clear and restart with a better prompt.
- Compaction is a three-step protocol (write down what matters → compact with focus → verify recall), not a reflex.
- The briefing document is the only context that survives every boundary. Budget it aggressively.
- CURRENT_WORK.md + frequent commits give you cheap session continuity without relying on compaction.
- Horizontal scaling (many short sessions + artifacts) beats deep-context for most work.
- Window size is diverging across tools; window effectiveness is more convergent. Write for effectiveness, not ceiling.