Part 2 · Chapter 19 · Last verified 2026-06-14 (Fresh)

Multi-Agent: Coordinating Many

Coordinating many agents as one decision chain — topology, then coordinator, then verifier, then a cost gate. Orchestrator-worker and the centralized-to-decentralized axis; the decompose-delegate-aggregate loop two independent first-party posts describe; the in-orchestration verifier; and the genuinely open, unflattened question of when multi-agent is worth its cost — Anthropic ships it, Cognition argues against it, and they share the parallelizability test.

Volatility: architectural-pattern

Tools compared: claude-code, cross-tool

The sub-agent chapter gave you the unit: a fresh, isolated window. This chapter coordinates many of them. The temptation is to treat “multi-agent” as a capability tier — more agents, more power — but it is better read as one decision chain: choose a topology, implement a coordinator, add a verifier, and gate the whole thing on cost. The last gate is the one that matters most, and it is where the field genuinely disagrees — so this chapter ends not with a verdict but with an honest, dated map of an open question.

One decision chain

Multi-agent design looks like four separate topics — topologies, coordination, verification, cost — but they are four sequential moves in one decision. You pick a topology (how the agents are arranged and who directs them), implement the coordinator (how the lead decomposes and recombines), add a verifier (how worker output is checked), and gate the whole thing on cost (whether the work is parallelizable enough to be worth it). Reading the chapter in order is reading the decision in order.

Orchestrator-worker, and the centralized↔decentralized axis

The canonical shape is orchestrator-worker. Anthropic’s research system “uses a multi-agent architecture with an orchestrator-worker pattern, where a lead agent coordinates the process while delegating to specialized subagents that operate in parallel.” [Official · How we built our multi-agent research system · Anthropic (2025) · T1-official] On a query, “the lead agent analyzes it, develops a strategy, and spawns subagents to explore different aspects simultaneously.” [Official · How we built our multi-agent research system · Anthropic (2025) · T1-official]

Framework vocabulary names the two ends of an axis. The centralized end is the supervisor: “The supervisor controls all communication flow and task delegation, making decisions about which agent to invoke based on the current context and task requirements.” [Official · LangGraph Multi-Agent Supervisor · LangChain (langchain-ai) · T2-release-notes] The decentralized end is the swarm, “where agents dynamically hand off control to one another based on their specializations” [Official · LangGraph Multi-Agent Swarm · LangChain (langchain-ai) · T2-release-notes], with no central coordinator.
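The difference between the two ends of the axis is who chooses the next agent. The following is a minimal sketch of that one distinction, not LangGraph's API: the `researcher` and `writer` agents, the `AGENTS` registry, and both runner functions are hypothetical names invented for illustration.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical agent signature: takes a task, returns (result, handoff_target).
# handoff_target is None when the agent considers the work finished.
Agent = Callable[[str], Tuple[str, Optional[str]]]


def researcher(task: str) -> Tuple[str, Optional[str]]:
    return f"notes({task})", "writer"


def writer(task: str) -> Tuple[str, Optional[str]]:
    return f"draft({task})", None


AGENTS: Dict[str, Agent] = {"researcher": researcher, "writer": writer}


def run_supervisor(task: str, plan: List[str]) -> List[str]:
    """Centralized: the supervisor decides which agent runs next.

    Any handoff hint an agent returns is ignored; control stays with the plan.
    """
    trace: List[str] = []
    for name in plan:
        result, _handoff = AGENTS[name](task)
        trace.append(f"{name}:{result}")
    return trace


def run_swarm(task: str, start: str, max_hops: int = 5) -> List[str]:
    """Decentralized: each agent names its successor; nobody holds a plan.

    max_hops is an assumed safety bound so a handoff cycle cannot run forever.
    """
    trace: List[str] = []
    current: Optional[str] = start
    for _ in range(max_hops):
        if current is None:
            break
        result, next_agent = AGENTS[current](task)
        trace.append(f"{current}:{result}")
        current = next_agent
    return trace
```

In this toy both runners visit the same agents in the same order; what differs is where the routing decision lives, which is what makes the centralized form easier to inspect.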

The coordinator: decompose → delegate → aggregate

Inside the centralized shape, the lead runs one reusable loop. It decomposes the query: “the lead agent decomposes queries into subtasks and describes them to subagents,” [Official · How we built our multi-agent research system · Anthropic (2025) · T1-official] where each brief carries an objective, an output format, tool guidance, and clear boundaries. It delegates those briefs to workers running in parallel. And it aggregates their results.

That this is a pattern and not one team’s idiom is the strongest evidence in the chapter, because two independent first-party posts describe the same loop.

The convergence is what licenses treating the loop as the reusable coordinator pattern rather than a single system’s design choice.
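The decompose, delegate, aggregate loop can be sketched in a few lines. This is an illustration under stated assumptions, not either post's implementation: the `Brief` fields follow the four parts named above, while `decompose`, `stub_worker`, `run_coordinator`, and the placeholder strings inside each brief are hypothetical. A real lead would use a model to decompose and to synthesize; here decomposition is a fixed list and aggregation is plain concatenation.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Brief:
    """What the lead hands to one worker (the four parts named in the text)."""
    objective: str
    output_format: str
    tool_guidance: str
    boundaries: str


def decompose(query: str, aspects: List[str]) -> List[Brief]:
    # Assumption: aspects are supplied by the caller; a real lead would derive them.
    return [
        Brief(
            objective=f"{query}: {aspect}",
            output_format="bullet summary",
            tool_guidance="prefer primary sources",
            boundaries=f"cover only {aspect}",
        )
        for aspect in aspects
    ]


def stub_worker(brief: Brief) -> str:
    # Placeholder for a sub-agent call; returns a canned result.
    return f"[{brief.objective}] done"


def run_coordinator(
    query: str,
    aspects: List[str],
    worker: Callable[[Brief], str] = stub_worker,
) -> str:
    briefs = decompose(query, aspects)                      # decompose
    with ThreadPoolExecutor(max_workers=max(1, len(briefs))) as pool:
        results = list(pool.map(worker, briefs))            # delegate in parallel
    return "\n".join(results)                               # aggregate
```

`pool.map` returns results in the order the briefs were submitted, so the aggregate stays deterministic even though the workers run concurrently.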

The verifier: separating generation from review

A coordinator that only generates is incomplete; the pattern’s natural complement is a verifier: a dedicated reviewer, separate from the workers. In practice, Anthropic “used an LLM judge that evaluated each output against criteria in a rubric” [Official · How we built our multi-agent research system · Anthropic (2025) · T1-official] (factual accuracy, citation accuracy, completeness, source quality, tool efficiency), which is LLM-as-judge applied inside the orchestrated system. At the workflow level this is the evaluator-optimizer: “one LLM call generates a response while another provides evaluation and feedback in a loop.” [Official · Building effective agents · Erik Schluntz and Barry Zhang (2024) · T1-official]

This is the same separate-the-reviewer move the sub-agent chapter’s clean-room verifier made, now applied across the orchestrated system: workers generate, a verifier reviews. How to calibrate that judge — its score scale, rubric reliability — is the Operations volume’s evaluation subject, not this chapter’s; here the verifier is a structural role, not a measured instrument.
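As a structural role, the verifier is just a second function that never generates. The sketch below assumes a 0-to-1 score per criterion and a pass threshold of 0.7; both are illustrative choices, not figures from the cited posts, and `stub_judge` is a hypothetical stand-in for a real LLM judge call. Only the five rubric dimensions come from the text above.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# The five rubric dimensions listed in the text.
RUBRIC = (
    "factual accuracy",
    "citation accuracy",
    "completeness",
    "source quality",
    "tool efficiency",
)

# Hypothetical judge signature: (worker_output, criterion) -> score in [0, 1].
Judge = Callable[[str, str], float]


def stub_judge(output: str, criterion: str) -> float:
    # Placeholder heuristic; a real judge would be a separate model call.
    return 1.0 if output.strip() else 0.0


@dataclass
class Review:
    scores: Dict[str, float]
    passed: bool


def review(output: str, judge: Judge = stub_judge, threshold: float = 0.7) -> Review:
    """Score one worker output against every rubric criterion.

    The output passes only if its weakest criterion clears the threshold
    (an assumed aggregation rule, chosen for simplicity).
    """
    scores = {criterion: judge(output, criterion) for criterion in RUBRIC}
    return Review(scores=scores, passed=min(scores.values()) >= threshold)
```

The judge is passed in as a parameter so that the reviewer stays separate from whatever produced the output, which is the point of the role.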

The cost gate — and a genuinely open question

Now the gate that decides whether any of the above should exist. Multi-agent systems are expensive, and the first-party figure is the one to hold: “agents typically use about 4× more tokens than chat interactions, and multi-agent systems use about 15× more tokens than chats.” [Official · How we built our multi-agent research system · Anthropic (2025) · T1-official]

Anthropic itself draws a boundary on the same page: tasks that “require all agents to share the same context or involve many dependencies between agents are not a good fit for multi-agent systems today.” [Official · How we built our multi-agent research system · Anthropic (2025) · T1-official] So even the camp that ships orchestrator-worker reserves it for fan-out-friendly work.
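The arithmetic behind those multipliers is simple but worth seeing once. A rough sketch, assuming a hypothetical chat baseline of 2,000 tokens (that number is not from the source) and treating the quoted “about 4×” and “about 15×” as approximate multipliers from a single datapoint rather than constants:

```python
# Approximate multipliers relative to a chat interaction, per the quoted figure.
TOKEN_MULTIPLIER = {"chat": 1, "agent": 4, "multi_agent": 15}


def estimated_tokens(chat_baseline_tokens: int, mode: str) -> int:
    """Scale a chat-sized token budget to an agent or multi-agent run."""
    return chat_baseline_tokens * TOKEN_MULTIPLIER[mode]


baseline = 2_000  # hypothetical chat interaction
single_agent = estimated_tokens(baseline, "agent")        # 8_000
multi_agent = estimated_tokens(baseline, "multi_agent")   # 30_000

# Moving from one agent to many is roughly a 3.75x step on this datapoint.
step_from_single_agent = multi_agent / single_agent       # 3.75
```

The useful comparison for the gate is usually the last line: the alternative to a multi-agent system is rarely a chat, it is a single agent, so the extra spend to justify is closer to 15/4 than to 15.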

And here the field does not agree — which is worth presenting honestly rather than resolving.

When is multi-agent worth it? A live, open question

As of mid-2026 this is genuinely unresolved, and the two most-cited positions point opposite ways. The honest move is to lay them side by side, dated, and let the reader weigh them.

	Anthropic	Cognition (Walden Yan)
Stance	Builds and ships orchestrator-worker for a production research system	Argues against multi-agent collaboration for most cases
Recommended default	Orchestrator-worker for fan-out-friendly tasks	A single-threaded linear agent
On the worth-it test	Reserve it for parallelizable work; shared-context/high-dependency tasks are a poor fit	Same boundary, read pessimistically: most real work shares too much context to parallelize cleanly
Provenance (dated)	“How we built our multi-agent research system,” 2025-06-13	“Don’t Build Multi-Agents,” 2025-06; “Multi-Agents: What’s Actually Working,” 2026-04-22

The positions, in each camp’s own words. Cognition argues that multi-agent collaboration is fragile because “the decision-making ends up being too dispersed and context isn’t able to be shared thoroughly enough between the agents,” [Practitioner · Don't Build Multi-Agents · Walden Yan (Cognition) · T3-practitioner] and that “the simplest way to follow the principles is to just use a single-threaded linear agent.” [Practitioner · Don't Build Multi-Agents · Walden Yan (Cognition) · T3-practitioner] Its 2026 follow-up refines rather than reverses that: “parallel agents make implicit choices about style, edge cases, and code patterns … these decisions often conflicted with each other, leading to fragile products.” [Practitioner · Multi-Agents: What's Actually Working · Walden Yan (Cognition) (2026) · T3-practitioner]

Multi-agent as one decision chain. Choose a topology (orchestrator-worker on a centralized supervisor ↔ decentralized swarm axis), implement the coordinator (decompose → delegate → aggregate), add a verifier (an LLM judge separating generation from review), then gate on cost — a single first-party datapoint puts multi-agent at ~15× a chat's tokens, so the work must fan out into genuinely independent subtasks to clear the gate. The gate is where the field disagrees: Anthropic ships orchestrator-worker for fan-out-friendly work; Cognition argues most work is too interdependent and prefers a single-threaded agent.

Worked example: Walking the chain on a real task

A team wants a multi-agent system to “modernize our legacy service.” Walk the chain:

Topology. If anything, centralized (orchestrator-worker / supervisor); a swarm’s dispersed control is harder to verify and to cost.
Coordinator. The lead would decompose “modernize” into subtasks and brief workers. [How we built our multi-agent research system · Anthropic (2025) · T1-official] But notice the briefs (“update the data layer,” “refactor the API,” “migrate the tests”): these are not independent. They share types, contracts, and patterns.
Verifier. You could add an LLM judge over each worker’s diff. [Building effective agents · Erik Schluntz and Barry Zhang (2024) · T1-official]
Cost gate, and this is where it fails. The subtasks are highly interdependent (shared context, mutual dependencies), exactly the “not a good fit” regime Anthropic names. [How we built our multi-agent research system · Anthropic (2025) · T1-official] Cognition’s critique bites here too: parallel agents would make conflicting implicit choices about patterns and edge cases. [Multi-Agents: What's Actually Working · Walden Yan (Cognition) (2026) · T3-practitioner] The work doesn’t pass the parallelizability test, and at ~15× the token cost it isn’t worth it. [How we built our multi-agent research system · Anthropic (2025) · T1-official]

The verdict: a single-threaded agent (or sequential sub-agent delegations for the genuinely-isolable bits, per the previous chapter), not a multi-agent system. The chain did its job by sending you back to one agent.
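The parallelizability test in this walk can be made mechanical, at least as a first pass. The sketch below is one possible heuristic, not a method from either camp: each subtask is tagged with the context it must read or write, and the gate fails if any two subtasks overlap. The function names and the specific tags assigned to each subtask are assumptions for illustration; the text only says the three briefs share types, contracts, and patterns.

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def shared_context(subtasks: Dict[str, Set[str]]) -> Dict[Tuple[str, str], Set[str]]:
    """Return every pair of subtasks that touches the same piece of context."""
    overlaps: Dict[Tuple[str, str], Set[str]] = {}
    for a, b in combinations(sorted(subtasks), 2):
        common = subtasks[a] & subtasks[b]
        if common:
            overlaps[(a, b)] = common
    return overlaps


def passes_parallelizability_gate(subtasks: Dict[str, Set[str]]) -> bool:
    """Pass only when no two subtasks share context (strict, by assumption)."""
    return not shared_context(subtasks)


# The "modernize our legacy service" briefs, with assumed context tags.
modernize = {
    "update the data layer": {"types", "contracts"},
    "refactor the API": {"types", "contracts", "patterns"},
    "migrate the tests": {"contracts", "patterns"},
}

# A hypothetical fan-out task: each subtask reads only its own material.
survey = {f"summarize library {i}": {f"library-{i}-docs"} for i in range(1, 4)}

modernize_ok = passes_parallelizability_gate(modernize)  # False
survey_ok = passes_parallelizability_gate(survey)        # True
```

A zero-overlap rule is deliberately strict; a real gate would likely tolerate read-only sharing. The value of writing it down is that the overlaps become visible before any agents are spawned.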

Patterns

Default to orchestrator-worker. Sketch: one lead coordinates parallel workers; reach for a swarm only with cause. When to use: any multi-agent system that clears the cost gate. [How we built our multi-agent research system · Anthropic (2025) · T1-official] Mechanics: lead analyzes, strategizes, spawns workers; supervisor controls flow and delegation. Remember: centralized is easier to verify and to cost than decentralized.

Run the coordinator loop. Sketch: decompose → delegate (focused briefs) → aggregate. When to use: the lead’s core job. [Building effective agents · Erik Schluntz and Barry Zhang (2024) · T1-official] Mechanics: each brief carries objective, output format, tool guidance, boundaries; the lead synthesizes results. Remember: two independent first-party posts describe this same loop, so it is a pattern, not an idiom.

Add a clean-room verifier. Sketch: a dedicated LLM judge reviews worker output against a rubric. When to use: whenever generation should be checked by something other than the generator. [How we built our multi-agent research system · Anthropic (2025) · T1-official] Mechanics: rubric dimensions (accuracy, completeness, source quality, …); generation and review separated. Remember: judge calibration is the Operations volume’s job; here it’s a structural role.

Gate hard on parallelizability. Sketch: go multi-agent only when the work fans out into independent subtasks. When to use: the go/no-go before building anything. [How we built our multi-agent research system · Anthropic (2025) · T1-official] Mechanics: shared-context/high-dependency work is a poor fit; multi-agent runs ~15× a chat’s tokens (one first-party datapoint). Remember: most tasks fail this gate, and defaulting back to a single agent is the common, correct outcome.

Quick reference

The chain: topology → coordinator → verifier → cost gate.
Topology: orchestrator-worker (= supervisor, centralized) is the default; swarm (decentralized) is the exception. [LangGraph Multi-Agent Supervisor · LangChain (langchain-ai) · T2-release-notes] [LangGraph Multi-Agent Swarm · LangChain (langchain-ai) · T2-release-notes]
Coordinator: decompose → delegate → aggregate, described by two independent first-party posts. [Building effective agents · Erik Schluntz and Barry Zhang (2024) · T1-official]
Verifier: an in-orchestration LLM judge separates generation from review (calibration → Operations volume). [How we built our multi-agent research system · Anthropic (2025) · T1-official]
Cost: ~15× a chat’s tokens. This is one first-party datapoint, not a law; don’t generalize. [How we built our multi-agent research system · Anthropic (2025) · T1-official]
The open question: Anthropic ships orchestrator-worker; Cognition argues for single-threaded; both share the parallelizability test and disagree on how much real work passes it. Unsettled as of 2026; recheck. [Don't Build Multi-Agents · Walden Yan (Cognition) · T3-practitioner]

Practice

Exercise solutions

Solution 1

The chain is topology → coordinator → verifier → cost gate. The first three are mechanics (choosing a shape, implementing the decompose-delegate-aggregate loop, adding a reviewer); the cost gate is the go/no-go — it decides whether the system should exist at all. “The task is large” is the wrong trigger because size is not what makes multi-agent pay: a large but interdependent task shares too much context to fan out, so multiple agents multiply tokens (~15× a chat, on the one first-party datapoint) and risk conflicting implicit choices without buying parallel speed. The right trigger is parallelizability — the work must decompose into genuinely independent subtasks (research-style fan-out), which is exactly the regime both camps’ test points to and which most large coding tasks fail.

Solution 2

The shape of a good answer (the verdict matters less than the honest walk). Take a tempting task — “build a new end-to-end feature across our stack.” Topology: if anything, orchestrator-worker (centralized is easier to verify and cost than a swarm). Coordinator: the lead decomposes into “data layer,” “API,” “UI,” “tests” and briefs a worker each. Verifier: an LLM judge over each worker’s diff. Cost gate — and here it fails: those subtasks are not independent — they share types, contracts, and patterns, so they must share context (the “not a good fit” regime), and parallel workers would make conflicting implicit choices about those shared patterns. The work fails the parallelizability test, and at ~15× a chat’s tokens (one first-party datapoint, not a law to generalize) the spend is not bought back by parallel speed. Verdict: no-go — a single-threaded agent, or sequential sub-agent delegations for the genuinely isolable bits (a self-contained migration script, a doc-generation pass), not a multi-agent system. A task that would pass the gate: “survey ten unrelated libraries and summarize each” — genuinely fan-out, no shared context, the rare go. The exercise’s point is that the honest walk usually ends in no-go, and that the gate — not task size — is what decides.

Solution 3

A fair statement: Anthropic (“How we built our multi-agent research system,” 2025-06-13) builds and ships orchestrator-worker for a production research system, and reserves it for fan-out-friendly work; it explicitly says shared-context/high-dependency tasks are a poor fit. Cognition (Walden Yan: “Don’t Build Multi-Agents,” 2025-06; “Multi-Agents: What’s Actually Working,” 2026-04-22) argues that multi-agent collaboration is fragile because decision-making is too dispersed and context can’t be shared thoroughly enough, and defaults to a single-threaded linear agent. They agree on the underlying test: multi-agent is worth it only when work fans out into independent subtasks. They disagree on how much real work passes that test (Anthropic finds enough in research; Cognition finds most coding too interdependent). A responsible architect treats the choice as reversible and re-checkable because the question is empirically open and moving (the Cognition follow-up is from 2026-04-22); betting the architecture permanently on either camp, rather than designing for the work in front of you and re-checking, would flatten a live disagreement into a false certainty.