Most real agent work is several steps, not one. The question this chapter answers is not what the steps are but who enforces the order — your application code, or the model itself — and how you keep a bad step from poisoning the next one. That choice is an Evaluate-level judgment: the exam gives you a workflow and asks where the control flow belongs, how the handoff is specified, and where the gate goes.

Do I know this already? (Diagnostic)

Answer these confidently and you can skim ahead to Exam essentials; if any is shaky, read closely — each is developed below.

Name the two places a multi-step workflow’s control flow can live, and one property each buys.
Why is every step boundary a place fidelity can leak?
In the Writer/Reviewer pattern, why must the reviewer not inherit the writer’s context?
What three things does a handoff contract specify, and what file can carry it across a boundary?
A programmatic pipeline produces a malformed step-2 output. What does a validation gate do, and what are its two kinds of check?

Check your answers

In your code (programmatic) — which buys determinism and auditability — or in the model (prompt-based) — which buys flexibility and adaptivity.
Each boundary is a handoff — step N’s output becomes step N+1’s input — and a re-narrated handoff erodes detail at every retelling: the telephone game.
Because the absence of inheritance is the feature — a fresh context isn’t biased toward code it just wrote and cannot rationalize choices it never made.
An objective, an output format, and clear task boundaries — carried across the boundary as a written artifact the next step reads, such as a spec file or a test file.
It rejects and retries — re-prompting the failing step with the specific errors instead of passing the bad output downstream — using a schema/structural check (right shape) and a semantic check (content actually right).

Two places to enforce a workflow

A multi-step workflow’s control flow lives in exactly one of two places. Either your code drives the sequence (run a step, take its output, decide the next call) or the model drives it, having been told the steps in a prompt. The Agent SDK frames the split directly: “With the Client SDK, you implement a tool loop. With the Agent SDK, Claude handles it.” ([Official] Agent SDK overview · Anthropic)

These are not rival products: Anthropic notes the same workflow “translate[s] directly” between the CLI and the SDK. ([Official] Agent SDK overview · Anthropic) The architect’s decision is which layer holds the control flow, and it turns on how much the workflow needs determinism versus flexibility.
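The two loci can be sketched side by side. This is a minimal illustration, not any SDK's API: `ModelFn`, `echo_model`, the step names, and both `run_*` functions are hypothetical, and the model call is stubbed so the example runs offline.

```python
from typing import Callable

# Hypothetical stand-in for a model call: takes a prompt, returns text.
ModelFn = Callable[[str], str]

STEPS = ["research", "draft", "review"]  # illustrative step names


def run_programmatic(topic: str, model: ModelFn) -> list[str]:
    """Your code owns the order: one model call per step, output fed forward."""
    outputs = []
    previous = topic
    for step in STEPS:
        previous = model(f"Perform the '{step}' step on this input:\n{previous}")
        outputs.append(previous)  # every step's output is recorded for auditing
    return outputs


def run_prompt_based(topic: str, model: ModelFn) -> str:
    """The model owns the order: one call that describes the whole procedure."""
    procedure = ", then ".join(STEPS)
    return model(f"Work on '{topic}'. Do these steps in order: {procedure}.")


def echo_model(prompt: str) -> str:
    """Offline stub so the sketch runs without any API."""
    return f"[model output for {len(prompt)}-char prompt]"


print(len(run_programmatic("rate limiting", echo_model)))  # one output per step
print(run_prompt_based("rate limiting", echo_model))
```

In the programmatic version the sequence is visible in code and each intermediate output can be inspected or gated; in the prompt-based version the sequence exists only as an instruction the model is trusted to follow.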

Every step boundary is a handoff

Whichever layer enforces the steps, each transition between them is a handoff (the output of step N becomes the input of step N+1), and a handoff is where information is lost. Chapter D1.2 named the worst case: dividing a tightly-coupled task by role (planner → implementer → tester → reviewer) “creates constant coordination overhead and context loss at handoffs — the telephone game,” spending more tokens coordinating than executing. ([Official] Building multi-agent systems: When and how to use them · Anthropic, 2026) Prompt-based handoffs across a long sequential chain are the most fidelity-fragile arrangement: each step re-narrates the last, and detail erodes at every retelling.

The Writer/Reviewer handoff that works

Not every handoff leaks; the canonical multi-step quality workflow depends on one. In the Writer/Reviewer pattern, one session implements and a second reviews: “A fresh context improves code review since Claude won’t be biased toward code it just wrote.” ([Official] Best practices for Claude Code · Anthropic) Session A writes the rate limiter; Session B reviews the file for edge cases, race conditions, and consistency; Session A then addresses the feedback. The same shape works for tests: “have one Claude write tests, then another write code to pass them.” ([Official] Best practices for Claude Code · Anthropic)
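What "fresh context" means mechanically can be shown with a toy model of a session as the list of messages it has seen. The `Session` class and `start_review` helper are hypothetical illustrations, not part of Claude Code or any SDK; the point is only that the reviewer is seeded with the artifact and nothing from the writer's history.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    """A hypothetical conversation context: just the messages it has seen."""
    name: str
    messages: list[str] = field(default_factory=list)

    def send(self, message: str) -> None:
        self.messages.append(message)


def start_review(artifact: str) -> Session:
    """Open a new reviewer context seeded with the artifact only."""
    reviewer = Session("reviewer")
    reviewer.send(
        "Review this file for edge cases, race conditions, and consistency:\n"
        + artifact
    )
    return reviewer


# Session A: accumulates its own rationale while writing.
writer = Session("writer")
writer.send("Implement a rate limiter.")
writer.send("I chose a token bucket because it felt simplest.")
artifact = "class RateLimiter: ..."  # placeholder for the file Session A produced

# Session B: sees the file, not the reasoning that led to it.
reviewer = start_review(artifact)
print(len(writer.messages), len(reviewer.messages))
```

Because the reviewer never receives the writer's justification, it has nothing to defend and has to judge the file on its own terms.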

The handoff contract and its artifact

When work does cross a boundary, what crosses must be specified, not assumed. Anthropic’s research system makes each handoff an explicit contract: “Each subagent needs an objective, an output format, guidance on the tools and sources to use, and clear task boundaries.” ([Official] How we built our multi-agent research system · Anthropic, 2025) That is the same discipline as D1.3’s rule that everything the subagent needs goes in the prompt, applied to every step of a multi-step flow.
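One way to keep a contract from being assumed is to make it a data structure that refuses to render when a part is missing. The sketch below is an assumption for illustration: the `HandoffContract` class and its field names are hypothetical, chosen to mirror the four items in the quoted sentence.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class HandoffContract:
    """Hypothetical container for what a handoff should state explicitly."""
    objective: str
    output_format: str
    tool_guidance: str
    boundaries: str

    def missing_fields(self) -> list[str]:
        """Names of any parts left blank."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]

    def to_prompt(self) -> str:
        """Render the contract as the next step's prompt, or fail loudly."""
        missing = self.missing_fields()
        if missing:
            raise ValueError(f"handoff contract is incomplete: {missing}")
        return (
            f"Objective: {self.objective}\n"
            f"Output format: {self.output_format}\n"
            f"Tools and sources: {self.tool_guidance}\n"
            f"Task boundaries: {self.boundaries}"
        )


contract = HandoffContract(
    objective="Verify every factual claim in the draft.",
    output_format="One line per claim: claim number, verdict, source.",
    tool_guidance="Use only the research notes provided.",
    boundaries="Do not rewrite the draft; report findings only.",
)
print(contract.to_prompt())
```

Failing at construction time in your own code is cheaper than discovering downstream that a step was never told its output format.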

The most robust way to carry that contract across a boundary is as a written artifact — a file the next step reads, not prose it re-narrates. Two concrete forms appear in the best-practices guidance:

A spec file. After an interview/planning phase, “start a fresh session to execute it … and you have a written spec to reference.” ([Official] Best practices for Claude Code · Anthropic) The spec, not the conversation, is what crosses to the implementation step.
A test file. In the test/code split, the tests are the contract: one Claude writes them, another writes code to pass them. The implementer’s target is the file, not a description of it.
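The spec-file form can be sketched in a few lines. The file name `SPEC.md`, the helper names, and the spec text are all hypothetical; what matters is that the implementation step's prompt is built from the file on disk rather than from a summary of the planning conversation.

```python
import tempfile
from pathlib import Path


def write_spec(workdir: Path, spec_text: str) -> Path:
    """Planning step: persist the agreed spec as a file."""
    spec_path = workdir / "SPEC.md"  # hypothetical file name
    spec_path.write_text(spec_text, encoding="utf-8")
    return spec_path


def build_implementation_prompt(spec_path: Path) -> str:
    """Implementation step: read the artifact verbatim, no re-narration."""
    spec = spec_path.read_text(encoding="utf-8")
    return f"Implement exactly what this spec describes.\n\n{spec}"


with tempfile.TemporaryDirectory() as tmp:
    path = write_spec(Path(tmp), "# Rate limiter\n- 100 requests per minute per key\n")
    prompt = build_implementation_prompt(path)

print(prompt)
```

The same shape applies to the test-file form: the implementer reads the test file itself, so the target cannot drift between steps.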

The validation gate: reject before propagating

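A precise contract is also what lets a programmatic pipeline put a gate between steps: a check the step-N output must pass before step N+1 is allowed to consume it. The gate does two kinds of check, and the distinction is the one Domain 4 builds on (D4.4):

A schema (structural) check. Is the output the right shape: does it parse, are the required fields present, are the types valid?
A semantic check. Is the content actually right? A well-formed output can still carry fabricated or contradictory data.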

On failure, the gate does not pass the bad output downstream — it rejects and retries: re-prompt the failing step with the specific errors, and only advance when the output passes. That is the difference between a programmatic pipeline and a prompt-based one: the gate is enforced in your code, where a malformed step cannot quietly become the next step’s input.
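A reject-and-retry gate can be sketched as a small wrapper around a step. Everything here is illustrative: `run_with_gate`, `GateError`, and the `Check` signature are hypothetical names, and the cap on attempts is an assumption (the source defers retry bounds to D4.4).

```python
from typing import Callable

# A check returns a list of error messages; an empty list means "pass".
Check = Callable[[str], list[str]]


class GateError(RuntimeError):
    """Raised when a step keeps failing the gate."""


def run_with_gate(
    step: Callable[[str], str],
    prompt: str,
    checks: list[Check],
    max_attempts: int = 3,  # assumed bound so the loop cannot run forever
) -> str:
    """Run a step, validate its output, and re-prompt with specific errors."""
    current_prompt = prompt
    errors: list[str] = []
    for _ in range(max_attempts):
        output = step(current_prompt)
        errors = [msg for check in checks for msg in check(output)]
        if not errors:
            return output  # only a passing output is released downstream
        feedback = "\n".join(f"- {msg}" for msg in errors)
        current_prompt = (
            f"{prompt}\n\nYour previous output was rejected:\n{feedback}"
        )
    raise GateError(f"still failing after {max_attempts} attempts: {errors}")
```

Note that the retry prompt carries the specific errors, not a generic "try again", and that exhausting the attempts raises instead of returning the last bad output.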

A gated content pipeline (Worked example)

A programmatic flow: research → draft → [gate] → publish.

Research runs; its notes are written to research.md (the artifact crossing to draft).
Draft produces an article keyed to a contract: { title, sections[≥3], every claim cites a research.md line }.
The gate (your code, not the model) runs two checks on the draft:
- Schema: does it have a title and ≥3 sections, and does every claim carry a citation marker? (parse check)
- Semantic: does each cited line actually exist in research.md? (a fabricated citation passes the schema but fails here)
On failure — say a claim cites a non-existent line — the gate rejects the draft and re-prompts the draft step: “Claim 4 cites research.md:88, which does not exist. Re-cite from real lines.” It loops until the draft passes, then lets publish consume it.

Without the gate, the fabricated citation flows straight into publish — a silent failure caught only by a reader. The gate is the programmatic analogue of D1.1’s rule that a failed step is a result to handle, not something to wave through.
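The two checks in this worked example might look like the following. It is a sketch under stated assumptions: the draft is represented as a dict with `title`, `sections`, and `claims` keys, and each claim's `cite` field holds a 1-based line number in research.md. Those representations and the function names are hypothetical.

```python
def schema_errors(draft: dict) -> list[str]:
    """Structural check: title, at least 3 sections, a citation on every claim."""
    errors = []
    title = draft.get("title")
    if not isinstance(title, str) or not title.strip():
        errors.append("Draft has no title.")
    sections = draft.get("sections")
    if not isinstance(sections, list) or len(sections) < 3:
        errors.append("Draft needs at least 3 sections.")
    for i, claim in enumerate(draft.get("claims", []), start=1):
        if not isinstance(claim.get("cite"), int):
            errors.append(f"Claim {i} has no citation marker.")
    return errors


def semantic_errors(draft: dict, research_lines: list[str]) -> list[str]:
    """Semantic check: every cited line must actually exist in research.md."""
    errors = []
    for i, claim in enumerate(draft.get("claims", []), start=1):
        cite = claim.get("cite")
        if isinstance(cite, int) and not 1 <= cite <= len(research_lines):
            errors.append(
                f"Claim {i} cites research.md:{cite}, which does not exist. "
                "Re-cite from real lines."
            )
    return errors


def gate(draft: dict, research_text: str) -> list[str]:
    """Run the schema check first; run the semantic check only on a valid shape."""
    errors = schema_errors(draft)
    if not errors:
        errors = semantic_errors(draft, research_text.splitlines())
    return errors


research_text = "Fact one.\nFact two.\nFact three."
draft = {
    "title": "Rate limiting in practice",
    "sections": ["Intro", "Design", "Results"],
    "claims": [
        {"text": "A", "cite": 1},
        {"text": "B", "cite": 2},
        {"text": "C", "cite": 3},
        {"text": "D", "cite": 88},  # fabricated: passes the schema, fails semantics
    ],
}
print(gate(draft, research_text))
```

The returned messages are what the gate would feed back to the draft step on rejection; an empty list is the signal that publish may consume the draft.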

Choosing where the control flow lives

The Evaluate-level call: enforce programmatically when the workflow needs determinism, an audit trail, validation gates between steps, or a fixed and repeatable sequence — the steps are known in advance and you want them to run the same way every time. Stay prompt-based when the path is flexible, the model can sensibly self-direct, and the orchestration code would cost more than it saves.

This pairs with two neighboring decisions: whether to split into multiple agents at all (D1.2) and whether the decomposition is a fixed pipeline or an adaptive one (D1.6). Enforcement locus is how the steps are driven; those chapters cover whether and into what shape. The mechanics of the validation/retry loop itself — schema vs semantic errors, bounded retries — are developed in D4.4.

Practice

Exercise solutions

Solution

Choose (b), but split at one boundary only. The fact-check is failing for the exact reason the Writer/Reviewer pattern addresses: a context biased toward the draft it just produced rationalizes its own claims. Hand the fact-check to a fresh context (a second session or a verification subagent), passing an explicit handoff contract: the draft, the claims to verify, the success criteria, and the output format. That is a programmatic handoff, because your code routes the draft to the reviewer and the verdict back. Keep research → draft together in one context: the two steps are tightly coupled and share state, so a handoff there would only leak fidelity. And do not split all four steps into role-agents. That is the telephone-game pipeline, four lossy handoffs where you needed one. The skill is placing the single split where fresh context buys independence.

Solution

Programmatic enforcement puts the control flow in your code — you sequence the steps, pass each output to the next, and can gate between them; choose it when you need determinism, an audit trail, validation gates, or a fixed repeatable sequence. Prompt-based enforcement puts the control flow in the model — it is told the procedure and self-directs; choose it when the path is flexible, the model can sensibly adapt, and orchestration code would cost more than it saves.

Solution

The schema is doing a structural check — right shape, fields present, types valid — and is missing the semantic check: whether the content is actually correct (valid JSON can still carry fabricated or contradictory data). The gate should not pass a failing output along; it should reject and retry — re-prompt the failing step with the specific error and only advance once the output passes both the structural and semantic checks.

Exam essentials

Two enforcement loci: a multi-step workflow’s control flow lives in your code (programmatic — you sequence steps and gate between them; deterministic/auditable) or the model (prompt-based — told the steps, self-directs; flexible).
Every step boundary is a handoff, and handoffs leak. A sequential chain of role-agents is the most fidelity-fragile arrangement — the telephone game.
Writer/Reviewer is the handoff that works because the reviewer has fresh context — it can’t defend code it never wrote. Don’t let an author review its own work.
Carry the contract in a written artifact — a spec file or a test file the next step reads from disk — so the handoff doesn’t depend on re-narration.
A programmatic validation gate runs a structural check and a semantic check between steps, and on failure rejects and retries rather than propagating the bad output. (The loop in depth: D4.4.)
Choose programmatic for determinism / audit / gates / fixed sequence; prompt-based for flexible self-direction. (Whether to split = D1.2; pipeline vs adaptive = D1.6.)