Part 1 Chapter 11 Last verified 2026-05-29 Fresh

Designing the Whole: Environment + Context as One System

The capstone — an integrative design workflow that composes the book's eight core chapters into one discipline, with decision points and an honest map of what is settled, converged, first-party-only, and openly unsolved.

Volatility: architectural-pattern

Tools compared: claude-code, cross-tool

This chapter is integrative. It introduces no new evidence — it composes the book’s grounded claims into a design workflow and a decision guide. Where it restates a load-bearing fact, it points back to the chapter that established it; the rest is synthesis.

The two layers are one system

The book opened on a thesis: what turns a model into an agent is the engineering of the two layers around it, the environment it acts in and the context it reasons over, and that discipline is the most underappreciated, highest-leverage thing an architect designs. Eight chapters in, the payoff is that these are not eight topics but two ends of one loop. The environment is the durable store of everything the agent could use; the context is the finite slice it does use each turn; and the harness owns the boundary between them, with context being “a finite resource with diminishing marginal returns.” (Effective context engineering for AI agents · Prithvi Rajasekaran, Ethan Dixon, Carly Ryan, and Jeremy Hadfield (2025); T1-official, original)

A design workflow

The chapters fall into a natural order when you design a real agent’s environment and context together.

1. Make the environment legible (E1, E5). Maximize signal in and machine-checkable feedback out; at scale, bound what must be loaded (interface contracts, shallow index, scope-to-workspace).
2. Budget the always-on layer (E2). The instruction file is paid for on every turn, so spend it only on broadly applicable context that cannot be inferred from the code. More is not better. (Evaluating AGENTS.md: Are Repository-Level Context Files Helpful for Coding Agents? · Gloaguen, Mündler, Müller, Raychev, Vechev (ETH Zurich) (2026); T3-practitioner, original)
3. Push procedures to load on demand (E3). Skills are just-in-time procedural knowledge; keep them out of the window until relevant.
4. Set the safety envelope (E4). Express intent in policy (permissions), contain failure in mechanism (sandbox, out-of-band reversibility).
5. Engineer the window against rot (C1 → C2). Know the four failure modes; assemble a stable, well-placed, just-in-time window; compact or checkpoint as it fills.
6. Persist deliberately (C3). Commit the durable and reviewable; leave the disposable to (typed, decaying) memory, and remember that recalled memory is just more context.
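Step 5 of the workflow above (a stable, well-placed window that is compacted or checkpointed as it fills) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a real harness: the `Block` type, the `assemble_window` function, the 4-characters-per-token estimate, and the 0.8 compaction threshold are all hypothetical and chosen only for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    name: str
    text: str
    stable: bool                 # rarely changes between turns
    load_bearing: bool = False   # the task depends on it directly


def estimate_tokens(text: str) -> int:
    # Assumption: a crude stand-in for a real tokenizer (~4 chars per token).
    return max(1, len(text) // 4)


def assemble_window(
    blocks: list[Block], budget: int, compact_at: float = 0.8
) -> tuple[list[Block], int, bool]:
    """Order blocks stable-first and volatile-last, keeping load-bearing
    content at the two edges, and report whether compaction is due."""
    # Front edge: stable blocks, load-bearing ones first.
    stable = sorted((b for b in blocks if b.stable),
                    key=lambda b: not b.load_bearing)
    # Tail edge: volatile blocks, load-bearing ones last.
    volatile = sorted((b for b in blocks if not b.stable),
                      key=lambda b: b.load_bearing)
    ordered = stable + volatile
    used = sum(estimate_tokens(b.text) for b in ordered)
    needs_compaction = used >= compact_at * budget
    return ordered, used, needs_compaction
```

With this ordering the stable blocks form an unchanging prefix, the task-critical volatile block sits at the very end, and less critical material falls in the middle. The returned flag only signals that the window is filling; what to do about it (compact or checkpoint) is left to the caller.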

Decision points

The recurring trade-offs, and how the book resolves them:

Signal vs. budget. Add context to help, or subtract to protect the window? Default to subtract: legibility and examples beat prose, and the one measured result says unnecessary context-file content reduces success. (Evaluating AGENTS.md: Are Repository-Level Context Files Helpful for Coding Agents? · Gloaguen, Mündler, Müller, Raychev, Vechev (ETH Zurich) (2026); T3-practitioner, original) Add only broadly applicable context that cannot be inferred to the always-on layer; everything else loads on demand.
Stability vs. freshness. A stable prefix is a large cost/latency lever, but content changes. Resolve by placement: stable front, volatile tail.
Where a fact lives. Always-on (CLAUDE.md), on-demand (skill), or remembered (memory)? Fact that applies broadly → instruction layer; procedure → skill; durable + reviewable → committed doc; fast + private + disposable → memory.
Placement under rot. Load-bearing content goes at an edge, not the middle; decompose multi-hop reasoning rather than stuff the window. (Lost in the Middle: How Language Models Use Long Contexts · Liu et al. (TACL) (2023); T3-practitioner, original)
Ergonomics vs. enforcement. Skills and filters shape what the model sees; they are not security. Real restriction lives in the permission/sandbox layer.
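The "where a fact lives" rule above can be written as a small routing function. This is a sketch only: the `Home` enum, the `place_fact` name, and its boolean parameters are hypothetical, and the order in which the checks run is an assumption, since the text gives the mapping but not a precedence.

```python
from enum import Enum


class Home(Enum):
    OMIT = "leave it out (inferable from the code)"
    SKILL = "on-demand skill"
    INSTRUCTION_FILE = "always-on instruction layer"
    COMMITTED_DOC = "committed doc"
    MEMORY = "memory"


def place_fact(
    *,
    is_procedure: bool,
    applies_broadly: bool,
    inferable_from_code: bool,
    durable_and_reviewable: bool,
) -> Home:
    """Route one fact to a layer. Check order is an assumption."""
    if inferable_from_code:
        # The always-on layer is for context the agent cannot infer.
        return Home.OMIT
    if is_procedure:
        return Home.SKILL
    if applies_broadly:
        return Home.INSTRUCTION_FILE
    if durable_and_reviewable:
        return Home.COMMITTED_DOC
    # Fast, private, disposable facts fall through to memory.
    return Home.MEMORY
```

For example, a release procedure routes to `Home.SKILL`, while a narrowly scoped but durable design decision routes to `Home.COMMITTED_DOC`. The function encodes the locating question (paid every turn, or only when relevant?) rather than any tool's actual behavior.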

An honest map of the evidence

The book’s claims sit at very different evidence tiers, and designing well means weighting them accordingly.

Measured (rare). One controlled result anchors the instruction layer: unnecessary context-file content reduces success and adds cost. (Evaluating AGENTS.md: Are Repository-Level Context Files Helpful for Coding Agents? · Gloaguen, Mündler, Müller, Raychev, Vechev (ETH Zurich) (2026); T3-practitioner, original) Treat as one study, not law.
Converged (strong). The CLAUDE.md short-and-hand-curated rule, the U-shaped positional curve, navigate-before-read at scale, and the doc-vs-memory boundary each have independent corroboration — the strongest signal a craft discipline offers.
First-party-only (authoritative, uncorroborated). Skills mechanics are entirely Anthropic-sourced — authoritative on what they are, not yet independent evidence of efficacy.
Openly unsolved. Memory is scaffolding around a contested space, and whether context rot is permanent or trainable is the live 2026 front. (State of AI Agent Memory 2026: Benchmarks, Architectures & Production Gaps · Mem0 Engineering Team (2026); T3-practitioner, original) Build for what exists; don’t bet the architecture on either resolving.
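One way to make the weighting above operational is to tag each design choice with its evidence tier and list the ones that should stay easy to undo. The sketch below is illustrative only: the `Tier` enum, the `DESIGN_CHOICES` register, and the `keep_reversible` helper are hypothetical names, and the cut-off at first-party-only is an assumption drawn from the tiers described above.

```python
from enum import IntEnum


class Tier(IntEnum):
    MEASURED = 1          # one controlled study
    CONVERGED = 2         # independent corroboration
    FIRST_PARTY_ONLY = 3  # authoritative but uncorroborated
    UNSOLVED = 4          # contested, still moving


# Hypothetical register of choices, tagged per the tiers above.
DESIGN_CHOICES = {
    "trim the instruction file": Tier.MEASURED,
    "place load-bearing content at an edge": Tier.CONVERGED,
    "move procedures into skills": Tier.FIRST_PARTY_ONLY,
    "rely on a memory layer": Tier.UNSOLVED,
}


def keep_reversible(choices: dict[str, Tier]) -> list[str]:
    """Return the choices resting on the weakest evidence, sorted by name."""
    return sorted(
        name for name, tier in choices.items()
        if tier >= Tier.FIRST_PARTY_ONLY
    )
```

Calling `keep_reversible(DESIGN_CHOICES)` returns the skill and memory entries, which are the two places where the architecture should avoid hard dependencies.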

The boundary of this volume

This book engineers two of the harness’s layers: the environment the agent acts in and the context it reasons over. It stops, deliberately, at control flow. How an agent critiques and retries its own work (reflection, or self-correction) and how multiple agents are coordinated are the subject of the companion D1 orchestration volume, not this one. What this volume owns of reflection is only its footprint: the environment a critic step reads and the context its critique writes back, a cost that the rot, assembly, and memory chapters each flag where it lands.

Quick reference

One system: environment makes signal available + checkable; context decides what crosses + persists.
Workflow: legible environment → budget the always-on → push procedures on-demand → set the safety envelope → engineer the window vs rot → persist deliberately.
The locating question: paid every turn, or only when relevant?
Default to subtract in the always-on layer (the one measured result).
Weight the evidence: measured (rare) → converged (strong) → first-party (uncorroborated) → unsolved (scaffold).

Practice

Exercise solutions

Solution

A representative pass: (1) legible environment — add an entry-point map + examples (E1); (2) budget — cut the CLAUDE.md to broadly-true facts (E2, grounded in the ETH result); (3) on-demand — move the release procedure to a Skill (E3); (4) safety — deny prod writes, sandbox the rest (E4); (5) window — place the task spec at an edge, load files JIT, checkpoint long runs (C1/C2); (6) persist — commit the project constitution, let auto-memory hold session preferences (C3). The weakest-evidence spots are usually the Skill efficacy (first-party-only) and any memory layer (unsolved) — keep both reversible: skills are easy to remove, and durable facts live in the committed doc you control, so a memory failure degrades gracefully.

Solution

Example — “loses the thread across sessions”: it’s a context failure that spans chapters. Diagnose with C1 (the window doesn’t carry prior state) and C3 (nothing durable persisted it). Resolve via the decision points: where a fact lives (the durable project state belongs in a committed doc, not ephemeral memory) and engineer the window (checkpoint-and-restart from a progress file rather than relying on a giant carried-over transcript). Fix as a sequence: (env) add a progress/notes file the agent reads on start; (context) checkpoint at session end and restart from the file with a small stable prefix; (persist) commit the durable identity/constraints so they’re reloaded deterministically. The failure wasn’t one chapter’s — it was a path from rot (C1) through assembly (C2) to memory (C3), which is exactly how the book is meant to be used.
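The checkpoint-and-restart fix in this solution can be sketched with a plain JSON progress file. Everything here is an assumption made for illustration: the file layout (`done`, `next_steps`, `notes`), the function names, and the prompt wording are hypothetical and not taken from any particular tool.

```python
import json
from pathlib import Path

EMPTY_STATE = {"done": [], "next_steps": [], "notes": ""}


def save_checkpoint(path: Path, done: list[str],
                    next_steps: list[str], notes: str = "") -> None:
    """Write the session's progress at session end."""
    state = {"done": done, "next_steps": next_steps, "notes": notes}
    # Write to a temp file, then rename, so a crash mid-write
    # does not leave a half-written progress file behind.
    tmp = path.with_suffix(path.suffix + ".tmp")
    tmp.write_text(json.dumps(state, indent=2), encoding="utf-8")
    tmp.replace(path)


def load_checkpoint(path: Path) -> dict:
    """Read progress on start; a missing file means a fresh project."""
    if not path.exists():
        return dict(EMPTY_STATE)
    return json.loads(path.read_text(encoding="utf-8"))


def build_restart_prompt(stable_prefix: str, state: dict) -> str:
    """Small stable prefix first, then the volatile progress summary."""
    lines = [stable_prefix, "", "Progress so far:"]
    lines += [f"- done: {item}" for item in state["done"]]
    lines += ["", "Next steps:"]
    lines += [f"- {item}" for item in state["next_steps"]]
    if state["notes"]:
        lines += ["", f"Notes: {state['notes']}"]
    return "\n".join(lines)
```

The new session starts from the stable prefix plus a short progress summary instead of a carried-over transcript. Committing the progress file alongside the durable constraints keeps the restart deterministic and reviewable.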