Most real agent work is several steps, not one. The question this chapter answers is not what the steps are but who enforces the order — your application code, or the model itself — and how you keep a bad step from poisoning the next one. That choice is an Evaluate-level judgment: the exam gives you a workflow and asks where the control flow belongs, how the handoff is specified, and where the gate goes.

Two places to enforce a workflow

A multi-step workflow’s control flow lives in exactly one of two places. Either your code drives the sequence (run a step, take its output, decide the next call) or the model drives it, having been told the steps in a prompt. The Agent SDK frames the split directly: “With the Client SDK, you implement a tool loop. With the Agent SDK, Claude handles it.” [Official] Agent SDK overview · Anthropic

These are not rival products: Anthropic notes the same workflow “translate[s] directly” between the CLI and the SDK. [Official] Agent SDK overview · Anthropic The architect’s decision is which layer holds the control flow, and it turns on how much the workflow needs determinism versus flexibility.
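The two places control flow can live are easier to compare side by side. The following is a minimal sketch, not an SDK example: `model` is a hypothetical callable that takes a prompt string and returns text, standing in for whichever client you actually use, and the three step names (outline, draft, tighten) are invented for illustration.

```python
from typing import Callable

# Hypothetical stand-in for a model call: prompt text in, reply text out.
ModelFn = Callable[[str], str]


def run_programmatic(model: ModelFn, topic: str) -> str:
    """Application code owns the order: one model call per step."""
    outline = model(f"Write an outline about: {topic}")
    draft = model(f"Write a draft from this outline:\n{outline}")
    # Code decides that 'tighten' comes last; the model never sees the plan.
    return model(f"Tighten this draft:\n{draft}")


def run_prompt_based(model: ModelFn, topic: str) -> str:
    """The model owns the order: one call, with the steps described in prose."""
    prompt = (
        f"Topic: {topic}\n"
        "Follow these steps in order: 1) outline, 2) draft, 3) tighten. "
        "Return only the final text."
    )
    return model(prompt)
```

In the first function every step boundary is visible to your code, so each one can be logged or checked. In the second, the boundaries exist only inside the model's own execution.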

Every step boundary is a handoff

Whichever layer enforces the steps, each transition between them is a handoff: the output of step N becomes the input of step N+1, and a handoff is where information is lost. Chapter D1.2 named the worst case: dividing a tightly-coupled task by role (planner → implementer → tester → reviewer) “creates constant coordination overhead and context loss at handoffs” (“the telephone game”), spending more tokens coordinating than executing. [Official] Building multi-agent systems: When and how to use them · Anthropic (2026) Prompt-based handoffs across a long sequential chain are the most fidelity-fragile arrangement: each step re-narrates the last, and detail erodes at every retelling.

The Writer/Reviewer handoff that works

Not every handoff leaks; the canonical multi-step quality workflow depends on one. In the Writer/Reviewer pattern, one session implements and a second reviews: “A fresh context improves code review since Claude won’t be biased toward code it just wrote.” [Official] Best practices for Claude Code · Anthropic Session A writes the rate limiter; Session B reviews the file for edge cases, race conditions, and consistency; Session A then addresses the feedback. The same shape works for tests: “have one Claude write tests, then another write code to pass them.” [Official] Best practices for Claude Code · Anthropic
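The point of the pattern is what the reviewer does not see. A rough sketch, assuming a hypothetical `model` callable that receives a session's full message history and returns a reply (the `Session` class is illustrative and is not a Claude Code or SDK type):

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical model call: the session's message history in, one reply out.
ModelFn = Callable[[List[str]], str]


@dataclass
class Session:
    """A conversation with its own private history (its context)."""
    model: ModelFn
    history: List[str] = field(default_factory=list)

    def send(self, message: str) -> str:
        self.history.append(message)
        reply = self.model(list(self.history))
        self.history.append(reply)
        return reply


def write_then_review(model: ModelFn, task: str) -> Tuple[str, str]:
    writer = Session(model)
    code = writer.send(f"Implement: {task}")

    # Fresh context: the reviewer gets the artifact, not the writer's history.
    reviewer = Session(model)
    feedback = reviewer.send(
        "Review this file for edge cases, race conditions, "
        f"and consistency:\n{code}"
    )

    # Back in the writer's session, which still remembers how the code was built.
    revised = writer.send(f"Address this review feedback:\n{feedback}")
    return revised, feedback
```

Only the produced file crosses the boundary, so the reviewer has no memory of the reasoning that led to it.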

The handoff contract and its artifact

When work does cross a boundary, what crosses must be specified, not assumed. Anthropic’s research system makes each handoff an explicit contract: “Each subagent needs an objective, an output format, guidance on the tools and sources to use, and clear task boundaries.” [Official] How we built our multi-agent research system · Anthropic (2025) That is the same discipline as D1.3’s rule that everything the subagent needs goes in the prompt, applied to every step of a multi-step flow.
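One way to keep those four elements from being silently dropped is to make them required fields. This is an illustrative sketch only: the `HandoffContract` class, its field names, and the rendered prompt layout are assumptions made here, not something taken from Anthropic's system.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class HandoffContract:
    """The four elements named in the quote, as required fields."""
    objective: str
    output_format: str
    tools_and_sources: List[str]
    task_boundaries: str

    def to_prompt(self) -> str:
        # Refuse to hand off if any element is empty rather than assume it.
        missing = [name for name, value in vars(self).items() if not value]
        if missing:
            raise ValueError(f"handoff contract is missing: {', '.join(missing)}")
        sources = "\n".join(f"- {item}" for item in self.tools_and_sources)
        return (
            f"Objective: {self.objective}\n"
            f"Output format: {self.output_format}\n"
            f"Tools and sources:\n{sources}\n"
            f"Task boundaries: {self.task_boundaries}"
        )
```

A usage example with made-up values: `HandoffContract("Verify each claim in the draft", "JSON list of verdicts", ["web search"], "Do not rewrite the draft").to_prompt()` returns the text that the next step receives.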

The most robust way to carry that contract across a boundary is as a written artifact — a file the next step reads, not prose it re-narrates. Two concrete forms appear in the best-practices guidance:

The validation gate: reject before propagating

A precise contract is also what lets a programmatic pipeline put a gate between steps: a check the step-N output must pass before step N+1 is allowed to consume it. The gate does two kinds of check, and the distinction is the one Domain 4 builds on (D4.4): a structural check (right shape, fields present, types valid) and a semantic check (whether the content is actually correct).

On failure, the gate does not pass the bad output downstream — it rejects and retries: re-prompt the failing step with the specific errors, and only advance when the output passes. That is the difference between a programmatic pipeline and a prompt-based one: the gate is enforced in your code, where a malformed step cannot quietly become the next step’s input.
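The reject-and-retry loop can be sketched as follows. Everything specific here is an assumption for illustration: `step` is a hypothetical callable wrapping the model call for step N, the expected output is a JSON object with an `items` list and a `total`, the semantic rule is that the total must equal the sum of the items, and the retry bound of three is arbitrary.

```python
import json
from typing import Callable, List

# Hypothetical wrapper around the model call for one step: prompt in, raw text out.
StepFn = Callable[[str], str]


def structural_errors(raw: str) -> List[str]:
    """Shape check: valid JSON, required fields present, types valid."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top level must be a JSON object"]
    errors = []
    items = data.get("items")
    if not isinstance(items, list) or not all(
        isinstance(i, (int, float)) for i in items
    ):
        errors.append("'items' must be a list of numbers")
    if not isinstance(data.get("total"), (int, float)):
        errors.append("'total' must be a number")
    return errors


def semantic_errors(data: dict) -> List[str]:
    """Content check: well-formed output can still be wrong."""
    expected = sum(data["items"])
    if expected != data["total"]:
        return [f"'total' is {data['total']} but the items sum to {expected}"]
    return []


def gated_step(step: StepFn, prompt: str, max_attempts: int = 3) -> dict:
    """Run a step; only return output that passes both checks."""
    feedback = ""
    errors: List[str] = []
    for _ in range(max_attempts):
        raw = step(prompt + feedback)
        errors = structural_errors(raw)
        if not errors:
            data = json.loads(raw)
            errors = semantic_errors(data)
            if not errors:
                return data  # the only path by which output moves downstream
        # Reject: re-prompt the same step with the specific errors.
        feedback = "\n\nYour previous output was rejected:\n" + "\n".join(
            f"- {e}" for e in errors
        )
    raise RuntimeError(f"step failed the gate {max_attempts} times: {errors}")
```

Raising after a bounded number of attempts is a design choice in this sketch: the next step never runs on unvalidated input, and the failure surfaces to the caller instead of looping forever.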

Choosing where the control flow lives

The Evaluate-level call: enforce programmatically when the workflow needs determinism, an audit trail, validation gates between steps, or a fixed and repeatable sequence — the steps are known in advance and you want them to run the same way every time. Stay prompt-based when the path is flexible, the model can sensibly self-direct, and the orchestration code would cost more than it saves.

This pairs with two neighboring decisions: whether to split into multiple agents at all (D1.2) and whether the decomposition is a fixed pipeline or an adaptive one (D1.6). Enforcement locus is how the steps are driven; those chapters cover whether and into what shape. The mechanics of the validation/retry loop itself — schema vs semantic errors, bounded retries — are developed in D4.4.

Practice

Exercise solutions

Solution

Choose (b), but split at one boundary only. The fact-check is failing for the exact reason the Writer/Reviewer pattern addresses — a context biased toward the draft it just produced rationalizes its own claims. Hand the fact-check to a fresh context (a second session or a verification subagent), passing an explicit handoff contract: the draft, the claims to verify, the success criteria, the output format. That is a programmatic handoff — your code routes the draft to the reviewer and the verdict back. Keep research → draft coupled in one context: they are tightly coupled and share state, so a handoff there would only leak fidelity. And do not split all four steps into role-agents — that is the telephone-game pipeline, four lossy handoffs where you needed one. The skill is placing the single split where fresh context buys independence.

Solution

Programmatic enforcement puts the control flow in your code — you sequence the steps, pass each output to the next, and can gate between them; choose it when you need determinism, an audit trail, validation gates, or a fixed repeatable sequence. Prompt-based enforcement puts the control flow in the model — it is told the procedure and self-directs; choose it when the path is flexible, the model can sensibly adapt, and orchestration code would cost more than it saves.

Solution

The schema is doing a structural check — right shape, fields present, types valid — and is missing the semantic check: whether the content is actually correct (valid JSON can still carry fabricated or contradictory data). The gate should not pass a failing output along; it should reject and retry — re-prompt the failing step with the specific error and only advance once the output passes both the structural and semantic checks.

Exam essentials