Repo & Doc Design for Agents
The first environment chapter — the repository is the substrate a coding agent operates in. Design it to maximize the signal the agent reads and the machine-checkable feedback it gets back. Five converged-craft moves, with their evidence tiers stated honestly.
The previous chapter argued that the environment and the context are where the discipline lives. This chapter takes the first layer — the environment — at its most concrete: the repository the agent reads and acts in. Every move here is converged craft, not measured effect; the convergence across independent practitioners is the signal, and stating that honestly is part of the chapter’s job.
The repository is the environment
A coding agent does not see your project the way you do. It sees what the harness loads into its context — and most of that is your repository: the files, their names, the docs, the tests. So the repository is not just where the work happens; it is the environment the agent operates in, and its structure is, in effect, the prompt.
The practitioner premise is blunt: the tokens you put in the model’s context “are the ONLY lever you have to affect the quality of your output.” [Advanced Context Engineering for Coding Agents (ACE-FCA) · Dex Horthy (HumanLayer), 2025 · T3-practitioner] If context is the only lever, then how the repository is structured (what an agent reads when it lands cold) is a primary determinant of output quality.
The first move follows directly: give the agent a predictable entry point. The cross-tool AGENTS.md convention exists for exactly this: “a dedicated, predictable place to provide the context and instructions to help AI coding agents work on your project” [AGENTS.md · Agentic AI Foundation (Linux Foundation) · T2-release-notes], in effect a README written for agents rather than humans.
Two halves: shaping the input, shaping the feedback
The five moves in this chapter are not five unrelated tips. They split cleanly into two halves of one discipline.
- Shaping the input — legibility (a readable entry-point map), examples-as-constraints (show, don’t tell), and negative space (subtract first) all govern what the agent reads.
- Shaping the feedback — failure breadcrumbs (durable records of past mistakes) and structural fitness (deterministic sensors) govern what the agent gets back after it acts.
Legibility and structural fitness are one property
The two halves meet in a single idea. A repository is legible to an agent to exactly the degree its structure is machine-checkable. Böckeler’s definition of a harnessable environment is the “structural properties of the environment itself that make it legible, navigable, and tractable to agents” [Harness engineering for coding agent users · Birgitta Böckeler, 2026 · T3-practitioner], and she notes that “clearly definable module boundaries afford architectural constraint rules.” [Harness engineering for coding agent users · Birgitta Böckeler, 2026 · T3-practitioner]
So the structure that makes a repo navigable (a human-facing, legibility reading) is the same structure that makes it enforceable (a machine-facing, fitness reading). The entry-point map is the readable face; the sensor suite is the enforced face; they are one property, not two.
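One way to make the "readable face is the enforced face" idea concrete is to check the entry-point map itself. The sketch below assumes a hypothetical convention in which AGENTS.md writes repository paths in backticks; the function name `missing_map_targets` and that convention are illustrative assumptions, not part of the AGENTS.md convention.

```python
import re
import tempfile
from pathlib import Path

# Assumption: the map writes repo paths in backticks, e.g. `src/` or `docs/adr/`.
BACKTICKED = re.compile(r"`([^`\s]+)`")


def missing_map_targets(repo_root: Path, map_name: str = "AGENTS.md") -> list[str]:
    """Return backticked paths named in the entry-point map that do not exist."""
    map_file = repo_root / map_name
    if not map_file.is_file():
        return [map_name]  # the map itself is missing
    tokens = BACKTICKED.findall(map_file.read_text(encoding="utf-8"))
    # Heuristic: treat a token as a path only if it contains a slash or a dot,
    # so bare command names such as `pytest` are ignored.
    paths = [t for t in tokens if "/" in t or "." in t]
    return sorted({p for p in paths if not (repo_root / p).exists()})


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "src").mkdir()
        (root / "AGENTS.md").write_text(
            "Code lives in `src/`. Decisions live in `docs/adr/`.\n",
            encoding="utf-8",
        )
        print(missing_map_targets(root))  # prints ['docs/adr/']
```

Run as a test or pre-commit step, a check like this keeps the map honest: a path the agent is told to read either exists or fails the build.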
Show, don’t tell — and subtract first
Two input-shaping moves turn out to be the same instruction. Anthropic’s official guidance is to “reference specific files, mention constraints, and point to example patterns” [Best practices for Claude Code · Anthropic · T1-official], illustrated with a worked prompt: “HotDogWidget.php is a good example. follow the pattern to implement a new calendar widget.” [Best practices for Claude Code · Anthropic · T1-official] A reference implementation constrains output more reliably than a paragraph of prose rules.
The complementary move is negative space: deliberately curating what the agent reads instead of over-documenting. Context engineering is “deliberately structuring how you feed context to the AI” [Advanced Context Engineering for Coding Agents (ACE-FCA) · Dex Horthy (HumanLayer), 2025 · T3-practitioner], to the point of “designing your ENTIRE WORKFLOW around context management.” [Advanced Context Engineering for Coding Agents (ACE-FCA) · Dex Horthy (HumanLayer), 2025 · T3-practitioner]
For a context-bounded agent these are one instruction: a worked example is simultaneously more constraining and cheaper in tokens than a prose rule — so the cleanest way to subtract prose is to point at an example.
The ratchet: every failure becomes an affordance
The feedback half is where the discipline compounds. The practice Hashimoto names is to treat each agent mistake as something to fix permanently: “anytime you find an agent makes a mistake, you take the time to engineer a solution such that the agent never makes that mistake again.” [My AI Adoption Journey · Mitchell Hashimoto, 2026 · T3-practitioner] The concrete artifact is an instructions file where “each line… is based on a bad agent behavior,” which he reports “almost completely resolved them all.” [My AI Adoption Journey · Mitchell Hashimoto, 2026 · T3-practitioner]
Decision records do the same for the why: ADRs give “enough structure to ensure key points are addressed, but in natural language” [Using Architecture Decision Records (ADRs) with AI coding assistants · Chris Swan, 2025 · T3-practitioner], so an agent recovers the rationale behind a choice instead of re-deriving or contradicting it.
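The breadcrumb habit can be sketched as a small helper that appends one rule line per observed mistake and refuses to add duplicates. This is a minimal illustration, not Hashimoto's tooling: the function name `record_breadcrumb`, the "- " line format, and the choice of file are all assumptions.

```python
import tempfile
from pathlib import Path


def record_breadcrumb(instructions_file: Path, rule: str) -> bool:
    """Append one rule line for an observed mistake.

    Returns False (and writes nothing) if the same rule is already recorded.
    """
    line = f"- {rule.strip()}"
    text = instructions_file.read_text(encoding="utf-8") if instructions_file.exists() else ""
    if line in text.splitlines():
        return False
    # Start on a fresh line if the file does not already end with a newline.
    separator = "" if text == "" or text.endswith("\n") else "\n"
    with instructions_file.open("a", encoding="utf-8") as fh:
        fh.write(f"{separator}{line}\n")
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "AGENTS.md"  # hypothetical instructions file
        print(record_breadcrumb(target, "Never edit generated files by hand"))  # prints True
        print(record_breadcrumb(target, "Never edit generated files by hand"))  # prints False
```

The duplicate check matters more than it looks: a file that grows one line per distinct failure stays a record of lessons, while one that grows per occurrence turns into noise the agent has to read.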
And structural sensors close the loop automatically: deterministic checks “cheap and fast enough to run on every change, alongside the agent” [Harness engineering for coding agent users · Birgitta Böckeler, 2026 · T3-practitioner], that “catch the structural stuff reliably: duplicate code, cyclomatic complexity, missing test coverage, architectural drift.” [Harness engineering for coding agent users · Birgitta Böckeler, 2026 · T3-practitioner] Böckeler observed an agent that “violated the rules a handful of times… and then self-corrected” from sensor feedback. [Maintainability sensors for coding agents · Birgitta Böckeler, 2026 · T3-practitioner]
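As an illustration of a deterministic sensor, the sketch below checks one architectural boundary with Python's standard `ast` module. The rule itself (a `domain` package that must not import from `web`), the `FORBIDDEN_IMPORTS` table, and the function name are hypothetical; real projects would more likely reach for an existing linter or architecture-test tool.

```python
import ast

# Hypothetical rule table: package name -> top-level packages it must not import.
FORBIDDEN_IMPORTS = {"domain": {"web"}}


def boundary_violations(package: str, source: str) -> list[str]:
    """Return one message per absolute import that crosses a forbidden boundary."""
    banned = FORBIDDEN_IMPORTS.get(package, set())
    found: list[tuple[int, str]] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            imported = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            imported = [node.module]  # relative imports (level > 0) are skipped
        else:
            continue
        for name in imported:
            if name.split(".")[0] in banned:
                message = f"line {node.lineno}: {package} must not import {name}"
                found.append((node.lineno, message))
    # Sort by line number so the output is stable from run to run.
    return [message for _, message in sorted(found)]


if __name__ == "__main__":
    sample = "import os\nfrom web.handlers import render\n"
    for message in boundary_violations("domain", sample):
        print(message)  # prints "line 2: domain must not import web.handlers"
```

The messages name the line and the rule so that an agent reading the output has enough to self-correct without a human restating the boundary.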
What is still settling
Three honest limits travel with this chapter.
- No effect sizes exist. Every move here is converged craft — observed practice agreed on by independent practitioners — not a controlled study. There is no measured “examples cut errors by X%.” Treat the direction as well-supported; do not generalize any number.
- The strongest recovery evidence is n=1. Hashimoto’s “resolved them all” and Böckeler’s self-correction observation are first-person field reports, author-is-subject — directionally supportive, no statistical weight.
- More context files are not automatically better. There is one measured result adjacent to this material, and it cuts the other way: repository context files can reduce task success and add cost. That study, and what it implies for the instruction layer, is the subject of the next chapter; it is flagged here so that “add more docs” is not read as the lesson.
Patterns
The five moves, in the reference template. Each is a converged-craft practice; apply the ones your repo lacks.
Entry-point map (AGENTS.md)
- Sketch: one predictable, agent-addressable file at the repo root.
- When to use: always; it is the agent’s cold-start map. [AGENTS.md · Agentic AI Foundation (Linux Foundation) · T2-release-notes]
- Mechanics: place AGENTS.md at root; point to where things live, not every detail.
- Remember: it is a map, not the whole manual; keep depth in linked files.
Examples as constraints
- Sketch: point at a reference implementation instead of writing a prose rule.
- When to use: whenever a convention has an existing instance. [Best practices for Claude Code · Anthropic · T1-official]
- Mechanics: “follow the pattern in X”; cite the file, name the constraint.
- Remember: a worked example is more constraining and cheaper in tokens than prose.
Negative space
- Sketch: deliberately prune what the agent reads.
- When to use: when docs have grown faster than they’re curated. [Advanced Context Engineering for Coding Agents (ACE-FCA) · Dex Horthy (HumanLayer), 2025 · T3-practitioner]
- Mechanics: design the workflow around what to omit; subtract before adding.
- Remember: this is a design choice about omission, separate from context-rot evidence (later chapter).
Failure breadcrumbs
- Sketch: turn each observed mistake into a durable record.
- When to use: any recurring agent error. [My AI Adoption Journey · Mitchell Hashimoto, 2026 · T3-practitioner]
- Mechanics: one instructions-file line per prevented behavior; ADRs for decisions. [Using Architecture Decision Records (ADRs) with AI coding assistants · Chris Swan, 2025 · T3-practitioner]
- Remember: a repo affordance the agent reads, not runtime telemetry.
Structural sensors
- Sketch: deterministic checks that run on every change.
- When to use: wherever structure is machine-checkable (types, boundaries, coverage). [Harness engineering for coding agent users · Birgitta Böckeler, 2026 · T3-practitioner]
- Mechanics: wire tests/linters/architectural rules into the loop; let the agent self-correct from them. [Maintainability sensors for coding agents · Birgitta Böckeler, 2026 · T3-practitioner]
- Remember: sensors catch structural issues reliably, not correctness or over-engineering.
Quick reference
- The repo is the environment — its structure is the prompt the agent reads.
- One principle: maximize signal in, maximize machine-checkable feedback out.
- Input half: entry-point map · examples-as-constraints · negative space.
- Feedback half: failure breadcrumbs · structural sensors.
- Legibility = fitness: the structure that makes a repo navigable is what makes it enforceable.
- Evidence: converged craft, no effect sizes; the strongest recovery evidence is n=1.
Practice
Exercise solutions
A typical answer: entry-point map present (an AGENTS.md or CLAUDE.md exists), examples-as-constraints partial (some conventions documented, few pointed-to), negative space absent (docs accreted, never pruned), failure breadcrumbs absent (mistakes re-explained each session), structural sensors partial (tests exist but aren’t wired as agent-facing feedback). The common weak half is feedback — teams document for the agent but don’t give it machine-checkable signal after it acts. Strengthening the feedback half (sensors + breadcrumbs) is usually the higher-leverage move precisely because it is the neglected one, and because it compounds: each addition makes the next session’s environment better.