Designing the Whole: Environment + Context as One System
The capstone — an integrative design workflow that composes the book's eight core chapters into one discipline, with decision points and an honest map of what is settled, converged, first-party-only, and openly unsolved.
This chapter is integrative. It introduces no new evidence — it composes the book’s grounded claims into a design workflow and a decision guide. Where it restates a load-bearing fact, it points back to the chapter that established it; the rest is synthesis.
The two layers are one system
The book opened on a thesis: what turns a model into an agent is the engineering of the two layers around it, the environment it acts in and the context it reasons over, and that discipline is the most underappreciated, highest-leverage thing an architect designs. Eight chapters in, the payoff is that these are not eight topics but two ends of one loop. The environment is the durable store of everything the agent could use; the context is the finite slice it does use each turn; and the harness owns the boundary between them, context being “a finite resource with diminishing marginal returns.” [Official source: Effective context engineering for AI agents · Prithvi Rajasekaran, Ethan Dixon, Carly Ryan, and Jeremy Hadfield (2025); T1-official, original]
A design workflow
The chapters fall into a natural order when you design a real agent’s environment and context together.
- Make the environment legible (E1, E5). Maximize signal in and machine-checkable feedback out; at scale, bound what must be loaded (interface contracts, shallow index, scope-to-workspace).
- Budget the always-on layer (E2). The instruction file is paid every turn, so spend it only on broadly applicable context the agent can’t infer from the code. More is not better. [Source: Evaluating AGENTS.md: Are Repository-Level Context Files Helpful for Coding Agents? · Gloaguen, Mündler, Müller, Raychev, Vechev (ETH Zurich) (2026); T3-practitioner, original]
- Push procedures to load on demand (E3). Skills are just-in-time procedural knowledge; keep them out of the window until relevant.
- Set the safety envelope (E4). Express intent in policy (permissions), contain failure in mechanism (sandbox, out-of-band reversibility).
- Engineer the window against rot (C1 → C2). Know the four failure modes; assemble a stable, well-placed, just-in-time window; compact or checkpoint as it fills.
- Persist deliberately (C3). Commit the durable and reviewable; leave the disposable to (typed, decaying) memory — and remember recalled memory is just more context.
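The budgeting step in this workflow comes down to simple arithmetic, which a short sketch can make concrete. The function below compares what one block of text costs over a session when it sits in the always-on instruction file versus when it loads on demand. The function name `cumulative_tokens` and every number in the usage lines are hypothetical illustrations, not measurements from the book or its sources, and the model deliberately ignores prompt caching.

```python
from typing import Optional


def cumulative_tokens(block_tokens: int, turns: int,
                      relevant_turns: Optional[int] = None) -> int:
    """Tokens spent on one block of context over a whole session.

    relevant_turns=None models the always-on layer: the block is paid on
    every turn. An integer models on-demand loading: the block is paid
    only on the turns that actually need it.
    """
    if block_tokens < 0 or turns < 0:
        raise ValueError("block_tokens and turns must be non-negative")
    if relevant_turns is None:
        paid_turns = turns
    else:
        # A block cannot be loaded on more turns than the session has.
        paid_turns = max(0, min(relevant_turns, turns))
    return block_tokens * paid_turns


# Hypothetical session: an 800-token release procedure, 50 turns,
# of which only 3 actually involve a release.
always_on = cumulative_tokens(800, turns=50)
on_demand = cumulative_tokens(800, turns=50, relevant_turns=3)
print(always_on, on_demand)  # 40000 2400
```

Under these assumed numbers the same text costs roughly seventeen times more in the always-on layer. Caching a stable prefix can lower the price of those tokens, but the block still occupies window space on every turn.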
Decision points
The recurring trade-offs, and how the book resolves them:
- Signal vs. budget. Add context to help, or subtract to protect the window? Default to subtract: legibility and examples beat prose, and the one measured result says unnecessary context-file content reduces success. [Source: Evaluating AGENTS.md: Are Repository-Level Context Files Helpful for Coding Agents? · Gloaguen, Mündler, Müller, Raychev, Vechev (ETH Zurich) (2026); T3-practitioner, original] Add only broadly-applicable, can’t-infer context to the always-on layer; everything else loads on demand.
- Stability vs. freshness. A stable prefix is a large cost/latency lever, but content changes. Resolve by placement: stable front, volatile tail.
- Where a fact lives. Always-on (CLAUDE.md), on-demand (skill), or remembered (memory)? Fact that applies broadly → instruction layer; procedure → skill; durable + reviewable → committed doc; fast + private + disposable → memory.
- Placement under rot. Load-bearing content goes at an edge, not the middle; decompose multi-hop reasoning rather than stuff the window. [Source: Lost in the Middle: How Language Models Use Long Contexts · Liu et al. (TACL) (2023); T3-practitioner, original]
- Ergonomics vs. enforcement. Skills and filters shape what the model sees; they are not security. Real restriction lives in the permission/sandbox layer.
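The "where a fact lives" rule above reads naturally as a small routing function. The sketch below is one possible encoding, not the book's own implementation: the `Fact` fields, the `Home` enum, and the `place` function are hypothetical names, and two choices are assumptions the text does not state: the order in which the checks run, and the fall-through that sends anything unmatched to memory.

```python
from dataclasses import dataclass
from enum import Enum


class Home(Enum):
    INSTRUCTION_LAYER = "always-on instruction file"
    SKILL = "on-demand skill"
    COMMITTED_DOC = "committed doc"
    MEMORY = "memory"


@dataclass(frozen=True)
class Fact:
    text: str
    is_procedure: bool = False
    applies_broadly: bool = False
    durable: bool = False
    reviewable: bool = False


def place(fact: Fact) -> Home:
    """Route a fact to the layer the decision point assigns it.

    Check order is an assumption: procedures are tested first so that a
    widely used procedure still loads on demand instead of every turn.
    """
    if fact.is_procedure:
        return Home.SKILL
    if fact.applies_broadly:
        return Home.INSTRUCTION_LAYER
    if fact.durable and fact.reviewable:
        return Home.COMMITTED_DOC
    # Assumed fall-through: fast, private, disposable facts go to memory.
    return Home.MEMORY


examples = [
    Fact("Run the test suite before every commit", applies_broadly=True),
    Fact("Steps to cut a release", is_procedure=True),
    Fact("Why the billing schema was split", durable=True, reviewable=True),
    Fact("User prefers terse diffs this session"),
]
for fact in examples:
    print(f"{fact.text!r} -> {place(fact).value}")
```

Writing the rule down this way forces the precedence question into the open: a fact that is both broad and procedural has to land somewhere, and the sketch chooses the cheaper layer.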
An honest map of the evidence
The book’s claims sit at very different evidence tiers, and designing well means weighting them accordingly.
- Measured (rare). One controlled result anchors the instruction layer: unnecessary context-file content reduces success and adds cost. [Source: Evaluating AGENTS.md: Are Repository-Level Context Files Helpful for Coding Agents? · Gloaguen, Mündler, Müller, Raychev, Vechev (ETH Zurich) (2026); T3-practitioner, original] Treat it as one study, not law.
- Converged (strong). The CLAUDE.md short-and-hand-curated rule, the U-shaped positional curve, navigate-before-read at scale, and the doc-vs-memory boundary each have independent corroboration — the strongest signal a craft discipline offers.
- First-party-only (authoritative, uncorroborated). Skills mechanics are entirely Anthropic-sourced — authoritative on what they are, not yet independent evidence of efficacy.
- Openly unsolved. Memory is scaffolding around a contested space, and whether context rot is permanent or trainable is the live 2026 front. [Source: State of AI Agent Memory 2026: Benchmarks, Architectures & Production Gaps · Mem0 Engineering Team (2026); T3-practitioner, original] Build for what exists; don’t bet the architecture on either resolving.
The boundary of this volume
This book engineers two of the harness’s layers: the environment the agent acts in and the context it reasons over. It stops, deliberately, at control flow. How an agent critiques and retries its own work (reflection, or self-correction) and how multiple agents are coordinated are the subject of the companion D1 orchestration volume, not this one. What this volume owns of reflection is only its footprint: the environment a critic step reads and the context its critique writes back, a cost that the rot, assembly, and memory chapters each flag where it lands.
Quick reference
- One system: environment makes signal available + checkable; context decides what crosses + persists.
- Workflow: legible environment → budget the always-on → push procedures on-demand → set the safety envelope → engineer the window vs rot → persist deliberately.
- The locating question: paid every turn, or only when relevant?
- Default to subtract in the always-on layer (the one measured result).
- Weight the evidence: measured (rare) → converged (strong) → first-party (uncorroborated) → unsolved (scaffold).
Practice
Exercise solutions
A representative pass:
- (1) Legible environment: add an entry-point map + examples (E1).
- (2) Budget: cut the CLAUDE.md to broadly-true facts (E2, grounded in the ETH result).
- (3) On-demand: move the release procedure to a Skill (E3).
- (4) Safety: deny prod writes, sandbox the rest (E4).
- (5) Window: place the task spec at an edge, load files JIT, checkpoint long runs (C1/C2).
- (6) Persist: commit the project constitution, let auto-memory hold session preferences (C3).
The weakest-evidence spots are usually Skill efficacy (first-party-only) and any memory layer (unsolved). Keep both reversible: skills are easy to remove, and durable facts live in the committed doc you control, so a memory failure degrades gracefully.
Example — “loses the thread across sessions”: it’s a context failure that spans chapters. Diagnose with C1 (the window doesn’t carry prior state) and C3 (nothing durable persisted it). Resolve via the decision points: where a fact lives (the durable project state belongs in a committed doc, not ephemeral memory) and engineer the window (checkpoint-and-restart from a progress file rather than relying on a giant carried-over transcript). Fix as a sequence: (env) add a progress/notes file the agent reads on start; (context) checkpoint at session end and restart from the file with a small stable prefix; (persist) commit the durable identity/constraints so they’re reloaded deterministically. The failure wasn’t one chapter’s — it was a path from rot (C1) through assembly (C2) to memory (C3), which is exactly how the book is meant to be used.
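The checkpoint-and-restart fix in this example can be sketched in a few lines. The code below is a minimal illustration under stated assumptions, not a prescribed format: the JSON layout, the file name `PROGRESS.json`, the helper names, and the sample project text are all hypothetical. It shows the three moves in order: write a small progress file at session end, read it on start, and assemble the opening context with the stable prefix first and the volatile notes last.

```python
import json
import tempfile
from pathlib import Path


def save_checkpoint(path: Path, done: list, next_steps: list) -> None:
    """Write the session's carry-over state to a small progress file."""
    state = {"done": done, "next": next_steps}
    path.write_text(json.dumps(state, indent=2), encoding="utf-8")


def load_checkpoint(path: Path) -> dict:
    """Read the progress file, or return an empty state on a first run."""
    if not path.exists():
        return {"done": [], "next": []}
    return json.loads(path.read_text(encoding="utf-8"))


def build_start_context(stable_prefix: str, state: dict) -> str:
    """Assemble the opening context: stable prefix first, volatile tail last."""
    done = "\n".join(f"- {item}" for item in state["done"]) or "- (nothing yet)"
    todo = "\n".join(f"- {item}" for item in state["next"]) or "- (nothing queued)"
    return f"{stable_prefix}\n\nDone so far:\n{done}\n\nNext steps:\n{todo}"


# Hypothetical usage: one session ends, the next one starts from the file.
with tempfile.TemporaryDirectory() as tmp:
    progress = Path(tmp) / "PROGRESS.json"
    save_checkpoint(progress,
                    done=["migrated schema"],
                    next_steps=["backfill rows"])
    context = build_start_context(
        "Project constraints: never write to prod.",
        load_checkpoint(progress),
    )
    print(context)
```

The progress file stays small on purpose: it replaces a carried-over transcript with a few lines of state, and because the prefix never changes between sessions it remains a candidate for caching.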