Build vs. Buy: Choosing a Harness
The first move on the capability axis — start direct on the API and add a harness abstraction only when it earns its keep. Why a framework's convenience is bought with abstraction that obscures prompts and is harder to debug, why a custom harness is a standing maintenance liability as models improve, what the framework landscape offers per each vendor's own docs, and why the realistic answer is the configure-wrap-extend middle path rather than the build-or-buy binary.
On this page
This is the first move on the volume’s capability axis. The spine framed every capability — a tool, an abstraction, a framework — as a context cost paid before it is used; the build-vs-buy decision is the first place that bites. The thesis is a default: start direct on the API and let the harness earn itself. The middle path between “build it all” and “buy a framework” is configure, wrap, extend — and the rule that organizes the whole decision is sequenced, not one-shot: simplest thing that works first, abstraction added only when a concrete need demands it.
Start direct, start simple
Begin with the recommendation and let it organize everything else. Anthropic’s agent-building guidance states the default flatly: developers should “start by using LLM APIs directly,” because “many patterns can be implemented in a few lines of code.” [Official] Building effective agents · Erik Schluntz and Barry Zhang (2024)T1-official original This is not a beginner’s on-ramp you graduate out of — it is the recommended starting point for production systems, and the evidence behind it is an aggregated cross-team observation: across the many teams Anthropic worked with, “the most successful implementations weren’t using complex frameworks or specialized libraries.” [Official] Building effective agents · Erik Schluntz and Barry Zhang (2024)T1-official original
So the decision does not open with “which framework?” It opens with “do I need one at all?” — and the default answer is no.
This is the same instinct the spine drew as the capability axis, applied to the harness layer instead of the tool set. A framework is a capability you expose to yourself — and like every capability, it is paid for before it is used, in the abstraction it inserts between you and the model. The default is therefore subtraction here too: the smallest harness that covers the workflow beats a feature-complete one.
The tradeoff: abstraction cost vs. convenience
A framework’s appeal is real — it gets you moving faster. The question is what that convenience costs. The cost is abstraction, and the specific damage is to visibility: frameworks can insert layers that obscure the underlying prompts and responses, “making them harder to debug.” [Official] Building effective agents · Erik Schluntz and Barry Zhang (2024)T1-official original When the agent misbehaves, you are now debugging through the framework’s abstractions instead of reading the prompt and response directly.
That reframes the decision. It is not “which framework is best” — it is “is the convenience worth the lost visibility on this workflow?”
The cost is not fixed across frameworks, which is exactly why the decision is a spectrum rather than a switch. A framework can be designed to keep the prompts visible: LangGraph, for instance, describes itself as low-level and states that it “does not abstract prompts or architecture.” [Official] LangGraph overview · LangChainT2-release-notes original That is the framework-vendor’s own answer to the debuggability concern — and it reframes the axis from “framework vs. no framework” to “how much abstraction, and is it visible.” Read it as LangGraph’s stated design stance, not as a measured property or a ranking against other frameworks.
The other side: building is a standing liability
It is tempting to read “don’t reach for a framework” as “build your own harness.” But the build side has its own recurring cost, and it is easy to underprice. A harness is not a one-time artifact: “harnesses encode assumptions that go stale as models improve.” [Official] Scaling Managed Agents: Decoupling the brain from the hands · Anthropic (Lance Martin, Gabe Cemaj, Michael Cohen) (2026)T1-official original The behaviors you tune around — how the model handles a long context, when it over-eagerly calls a tool, how it formats output — shift with each model release, and the assumptions your harness baked in decay with them.
So both ends of the binary carry a cost. A framework charges abstraction cost (lost visibility); a from-scratch build charges ongoing maintenance (staleness as models move). Neither is free, which is why the realistic answer is rarely either extreme.
The framework landscape — read honestly
If you do reach for an existing harness, four options dominate the conversation. The honest way to read this landscape is one framework at a time, against its own documentation — because each framework’s description of itself is authoritative on what it provides and not on how it ranks against the others. There is no independent cross-vendor benchmark here; each line below is a vendor self-description.
- LangGraph describes itself as “a low-level orchestration framework and runtime for building, managing, and deploying long-running, stateful agents.” [Official] LangGraph overview · LangChainT2-release-notes original Its documented capability set centers on durable execution, streaming, human-in-the-loop, and persistence.
- CrewAI describes itself as “the leading open-source framework for orchestrating autonomous AI agents and building complex workflows,” organized around role-based crews. [Official] Introduction (What is CrewAI?) · CrewAI (documentation)T2-release-notes original The word “leading” is CrewAI’s own framing, not an independent ranking.
- The Claude Agent SDK “gives you the same tools, agent loop, and context management that power Claude Code, programmable in Python and TypeScript.” [Official] Agent SDK overview · AnthropicT1-official original Its documented capabilities include built-in tools, hooks, subagents, MCP, permissions, and sessions.
- The OpenAI Agents SDK describes itself as “a lightweight, easy-to-use package with very few abstractions,” built around a small primitive set. [Official] OpenAI Agents SDK · OpenAI (Agents SDK documentation)T2-release-notes original “Very few abstractions” is the vendor’s own positioning.
Because these are framework docs, they move fast — each ships per release, and the self-descriptions drift. Treat the four lines above as a snapshot, not a constant.
The middle path: configure, wrap, extend
Put the two costs together and the binary dissolves. You rarely face a clean choice between writing everything yourself and surrendering to a framework. The realistic answer is in the middle: start thin on the direct API, and where you do adopt, adopt something configurable that you assemble from primitives.
The Claude Agent SDK is the canonical instance of this stance. It is “the agent harness that powers Claude Code,” [Official] Building agents with the Claude Agent SDK · Thariq Shihipar et al. (2025)T1-official original so adopting it means taking a production-proven harness rather than writing one — and at its core “the SDK gives you the primitives to build agents” [Official] Building agents with the Claude Agent SDK · Thariq Shihipar et al. (2025)T1-official original for whatever workflow you are automating, composed and extended through subagents and hooks. That is the configure/wrap/extend position made concrete: a configurable harness you take, shape, and extend, rather than a from-scratch build you own end-to-end or an opaque framework you cannot see into.
This is where the abstraction-cost axis pays off: the SDK route lets you adopt without going opaque, because you are assembling from primitives rather than inheriting a black box. The custom-build route’s maintenance liability is also softened, because the harness’s general plumbing is maintained by its vendor against new models — you only own the thin extension layer your workflow actually needs.
The sequenced rule
Compose the criteria with the default and the middle path and the whole decision reduces to a sequence — not a one-shot choice, but an order you move through only as far as a real need pulls you.
- Start without a framework. Direct API, thin harness, because many patterns are a few lines of code. [Official] Building effective agents · Erik Schluntz and Barry Zhang (2024)T1-official original
- Adopt a configurable harness when a concrete need earns the abstraction — preferring one that keeps prompts visible [Official] LangGraph overview · LangChainT2-release-notes original and that you do not have to re-tune against every model release. [Official] Scaling Managed Agents: Decoupling the brain from the hands · Anthropic (Lance Martin, Gabe Cemaj, Michael Cohen) (2026)T1-official original The Claude Agent SDK is the canonical configure/wrap/extend option. [Official] Building agents with the Claude Agent SDK · Thariq Shihipar et al. (2025)T1-official original
- Build a fully custom harness only when no configurable option fits — and price it as the continuous maintenance it is, because the staleness cost is now yours to carry. [Official] Scaling Managed Agents: Decoupling the brain from the hands · Anthropic (Lance Martin, Gabe Cemaj, Michael Cohen) (2026)T1-official original
A note on scope: this chapter is the decision — when to build, buy, or take the middle path. What a harness actually is (the agent loop, context management, the tool interface as components) is the Model + Harness framing the foundation volume develops; the orchestration of more than one agent — sub-agents and multi-agent topologies — is the later, coordination half of this volume. Here the decision stops at the single-harness choice.
Patterns
Start direct on the API. Sketch: implement the workflow with direct LLM API calls before reaching for any framework. When to use: always, as the opening move — most patterns are a few lines of code. Building effective agents · Erik Schluntz and Barry Zhang (2024)T1-official original Mechanics: write the loop yourself; add a tool call, a retry, a parse — measure whether a real need for abstraction appears. Remember: the default answer to “do I need a framework?” is no; the burden of proof is on the abstraction.
Weigh abstraction cost against convenience. Sketch: before adopting a framework, price the visibility you lose. When to use: any time a framework’s speed is tempting. Building effective agents · Erik Schluntz and Barry Zhang (2024)T1-official original Mechanics: ask how many layers stand between you and the actual prompt/response when debugging; prefer a framework that keeps prompts visible. LangGraph overview · LangChainT2-release-notes original Remember: convenience is bought with abstraction, and abstraction is paid in debuggability.
Price the build as maintenance, not a one-time cost. Sketch: treat a custom harness as a standing liability. When to use: whenever “just build it ourselves” is on the table. Scaling Managed Agents: Decoupling the brain from the hands · Anthropic (Lance Martin, Gabe Cemaj, Michael Cohen) (2026)T1-official original Mechanics: estimate re-tuning cost per model release, not just the up-front build; that recurring cost is the real comparison. Remember: harnesses encode assumptions that go stale as models improve.
Take the configure/wrap/extend middle path. Sketch: adopt a configurable, production-proven harness and extend it from primitives. When to use: when a concrete need has earned abstraction but a from-scratch build is overkill. Building agents with the Claude Agent SDK · Thariq Shihipar et al. (2025)T1-official original Mechanics: take a harness like the Claude Agent SDK; compose subagents and hooks; own only the thin extension your workflow needs. Agent SDK overview · AnthropicT1-official original Remember: you get control without the full maintenance burden, and adoption without going opaque.
Read the landscape per vendor, not as a ranking. Sketch: evaluate each framework against its own docs and your workflow. When to use: the moment you decide an existing harness is warranted. Introduction (What is CrewAI?) · CrewAI (documentation)T2-release-notes original Mechanics: match documented capability sets to your workflow’s needs; treat “leading” and “few abstractions” as self-descriptions. OpenAI Agents SDK · OpenAI (Agents SDK documentation)T2-release-notes original Remember: there is no independent cross-vendor benchmark in the vendors’ own marketing copy.
Quick reference
- Default: start direct on the API — most patterns are a few lines of code; the most successful implementations weren’t using complex frameworks. Building effective agents · Erik Schluntz and Barry Zhang (2024)T1-official original
- The framework cost: convenience is bought with abstraction that obscures prompts and responses, making them harder to debug. Building effective agents · Erik Schluntz and Barry Zhang (2024)T1-official original
- The build cost: a custom harness is a standing maintenance liability — its assumptions go stale as models improve. Scaling Managed Agents: Decoupling the brain from the hands · Anthropic (Lance Martin, Gabe Cemaj, Michael Cohen) (2026)T1-official original
- Landscape (per each vendor’s own docs, not a ranking): LangGraph = low-level orchestration framework + runtime; LangGraph overview · LangChainT2-release-notes original CrewAI = “leading” open-source role-based orchestration (self-described); Introduction (What is CrewAI?) · CrewAI (documentation)T2-release-notes original Claude Agent SDK = the tools/agent loop/context management that power Claude Code; Agent SDK overview · AnthropicT1-official original OpenAI Agents SDK = “very few abstractions” (self-described). OpenAI Agents SDK · OpenAI (Agents SDK documentation)T2-release-notes original
- Middle path: configure / wrap / extend a configurable harness — e.g. the Claude Agent SDK, built from primitives. Building agents with the Claude Agent SDK · Thariq Shihipar et al. (2025)T1-official original
- The rule: sequenced — start without a framework → adopt a configurable one when a need earns it → build from scratch only when nothing fits.
- Honesty: vendor self-descriptions are not a cross-vendor verdict; the framework lines carry a 90-day recheck.
Practice
Exercise solutions
The buy end’s cost is abstraction cost: a framework’s convenience is bought with layers that obscure the underlying prompts and responses, making failures harder to debug. The build end’s cost is ongoing maintenance: a custom harness encodes assumptions about model behavior that go stale as models improve, so it must be continuously re-tuned. The maintenance cost is the one paid continuously rather than up front — it recurs with every model release, whereas the abstraction cost is a fixed property of the framework you adopted. Because neither extreme is free — one charges lost visibility, the other charges perpetual re-tuning — the realistic answer is usually the middle: take a configurable, production-proven harness (so its general plumbing is maintained against new models for you) and extend it from primitives (so you keep visibility and own only the thin layer your workflow actually needs).
Both conclusions launder a vendor’s self-description into a cross-vendor comparison the docs never make. “Leading” is CrewAI-style marketing framing, not an independent ranking, so it licenses no claim that one framework is “more capable” than the other; “very few abstractions” is the OpenAI-style vendor’s own positioning, authoritative on what it says it provides but not a measured statement that it is more debuggable than some other framework. The legitimate way to decide is to read each framework against its own documented capability set and match those capabilities to my specific workflow — does this one’s durable-execution / role-based / primitive model fit what I actually need to build? — rather than to rank the two on adjectives. If I genuinely need a debuggability comparison, I have to source it independently (or test it myself by debugging a real failure in each), not infer it from the phrase “few abstractions.”
“It works well today” is insufficient because a harness encodes assumptions that go stale as models improve — the behaviors the team tuned around (context handling, tool-call eagerness, output formatting) shift with each model release, so a harness that fits the current model can silently misfit the next one. The maintenance cost is therefore continuous and latent: it is real even on the day everything works, because it is a liability that comes due at the next release, not a problem you can see now. At that release they should re-examine exactly the assumptions baked into the harness against the new model’s behavior — re-running their evals, checking whether their workarounds are now unnecessary or now wrong — and budget that re-tuning as a recurring cost, not a surprise. The configure/wrap/extend middle path would have changed their position by moving most of that staleness-prone plumbing onto a vendor-maintained harness (e.g. the Claude Agent SDK), so the general agent loop and context management are re-tuned against new models for them, and the team owns only the thin extension layer their workflow specifically required — shrinking the surface that goes stale to the part that is genuinely theirs.