Glossary

70 terms.

.claude/rules/ (rules)

.claude/rules/ is a modular instruction system of markdown files loaded into context every session with recursive subdirectory discovery — a system parallel to CLAUDE.md, not a subsystem of it: rules without paths frontmatter load at launch at the same priority as .claude/CLAUDE.md, neither nested under it nor overridden by it. Rules come in two scopes with a documented load order: user-level rules live in ~/.claude/rules/ and apply to every project on your machine (for machine-wide personal preferences), while project rules live in the repo’s .claude/rules/; user-level rules are loaded before project rules, giving project rules higher priority when two instructions tension — the same broad-to-specific recency model as the CLAUDE.md hierarchy. The lever that makes rules more than “another CLAUDE.md” is the optional paths glob, which path-scopes a rule so it loads only when Claude reads matching files. Symlinks work in the directory (circular ones are detected gracefully), so shared rules can be linked from a central location.

See also: CLAUDE.md, paths scoping, settings precedence

.mcp.json (mcp.json)

.mcp.json is the file-based way to configure an MCP server — a declaration at the project root that is committed to version control and read by both the CLI and any SDK run whose settingSources includes "project" (it is the alternative to registering a server programmatically via mcp_servers / mcpServers). Because it is committed and shared, secrets must never be written into it literally; they are referenced through env-var expansion instead — ${VAR} (expands, or fails the parse if unset) and ${VAR:-default} (expands, or uses the default) — which works inside command, args, env, url, and headers, so a config can carry "Authorization": "Bearer ${API_KEY}" while the key lives only in the environment. As a config location it corresponds to Project scope, the shared, approval-prompted tier (versus the home-directory ~/.claude.json used by Local and User scope). It is also where transports unavailable to the --transport flag, notably ws (WebSocket), can be declared, alongside claude mcp add-json.

See also: Model Context Protocol (MCP), MCP scopes, MCP transports

@import

@import is the syntax (@path/to/import) by which a CLAUDE.md pulls in other files, stitching a modular instruction set together: the imported files expand and load at launch alongside the referencing file, and relative paths resolve relative to the file containing the import. The import chain has a documented maximum recursion depth of 5. The first time a session encounters an import, Claude Code shows an approval dialog, and declining it disables imports permanently — the dialog does not reappear, so a future import in that environment will silently not expand until the choice is reset. The mechanism is also the cross-tool bridge for AGENTS.md: because Claude Code reads only CLAUDE.md and never loads AGENTS.md on its own, a one-line @AGENTS.md import lets a single instruction set serve Claude Code and other agents at once without duplicating its contents.

See also: CLAUDE.md, settings precedence

adaptive decomposition (adaptive orchestration)

Adaptive decomposition is the structural shape of a decomposed task in which an orchestrator decides at runtime how many subtasks to spawn and what each does, scaling the decomposition to the specific input — as opposed to a sequential pipeline, whose steps are fixed in advance at design time. It is the right shape for open-ended, path-dependent work, where step N+1 depends on what step N discovered and no design-time sequence can capture the process. The canonical example is the effort-scaling ladder: simple fact-finding needs ~1 agent, a direct comparison 2–4 subagents, and complex research 10+. The capability costs roughly 3–10× the tokens of a single agent and buys thoroughness, not speed. Its failure mode is over-decomposition — “spawning 50 subagents for simple queries” — which the effort-scaling heuristic exists to guard against.

See also: coordinator–subagent pattern, Subagent, Escalation ladder

additionalProperties: false (additionalProperties false)

additionalProperties: false is the JSON-Schema keyword that forbids any object key beyond those explicitly listed, and in the structured-output and strict-tool-use subset it is mandatory on every object node, nested ones included. The constrained decoder requires it because an open object — one that allows arbitrary extra keys — has no closed set of valid continuations to compile into a grammar; at each decoding step the model must know exactly which keys are permitted, and this keyword is what closes that set. It is the single most common cause of a 400 on a hand-authored schema, precisely because standard JSON Schema defaults additionalProperties to true, so a schema that “validates fine in your editor” still fails grammar compilation at the API. Its required presence is also why open-ended extraction (where the set of fields is genuinely unknown, i.e. additionalProperties: true) cannot use structured output and must fall back to the classic tool-use pattern, which imposes no such closure.

See also: Structured output, Strict tool use, Constrained decoding

Agent loop (agentic loop, tool-use loop)

An agent is an ordinary language model placed in a loop: it proposes a tool call, your code executes it, the result is fed back as the model’s next input, and the model decides again — “act, observe, decide” repeated until the model answers without calling a tool. The loop’s single branch condition is stop_reason: tool_use continues it (your code returns tool_result blocks on the next turn), end_turn ends it. The model decides what happens next, but the surrounding code decides whether it gets to — so owning the loop, including its termination budget (max_turns / max_budget_usd), is the core orchestration discipline, not authoring any single tool.

See also: Subagent, Context compaction

Agent tool (Task tool)

The Agent tool is the single tool through which every subagent is invoked — the one invocation surface beneath the coordinator–subagent pattern. It was renamed from Task to Agent in Claude Code v2.1.63: current SDK releases emit Agent in tool_use blocks but still report Task in the system:init tools list and in result.permission_denials[].tool_name, so code that filters on the tool name must match both values. The tool can be given something to invoke three ways — a programmatic AgentDefinition, a filesystem agent in .claude/agents/*.md, or the built-in general-purpose agent. For any of them to actually run, "Agent" must be in allowedTools, otherwise the call is never approved. Delegation through it is one level deep: a subagent’s own tools must not include Agent, because subagents cannot spawn subagents.

See also: Subagent, AgentDefinition, allowedTools

AgentDefinition (agent definition)

An AgentDefinition is the contract that describes a subagent created programmatically (passed in the agents option of a query — the recommended path for SDK apps). It has two required halves: a description that states in natural language when to use the agent, and a prompt that is the agent’s system prompt — its role and behavior. Everything else is optional refinement: tools (allowed tool names; omit to inherit all the parent’s tools), model, and maxTurns (the subagent’s own turn budget). The description does double duty, because it is also how Claude decides to invoke the agent automatically, so it must be specific and keyword-rich rather than generic — a vague description is one of the two top reasons a defined subagent never gets matched to a task.

See also: Agent tool, Subagent, allowedTools

allowedTools (allowed_tools)

allowedTools (Python allowed_tools) is the per-agent distribution knob that defines which tools an agent has at all — distinct from tool_choice, which steers a single request: one shapes the toolbox, the other shapes a call. It pre-approves the tools it lists so they run without a prompt, while the paired disallowedTools removes a tool from the request entirely so the model never sees it. For MCP access the documented guidance is to scope with a wildcard such as mcp__github__*, which “grants exactly the MCP server you want and nothing more,” rather than permissionMode: "bypassPermissions", which auto-approves MCP tools but disables every other safety prompt — far broader than necessary. The crucial caveat is that an allowlist is not a sandbox: allowed_tools only pre-approves and never restricts, so paired with bypassPermissions it approves every tool regardless (confine with plan mode or a deny rule instead). Subagents are a related gate — "Agent" must appear in allowedTools or a defined subagent never runs, since the Agent-tool call is otherwise never approved.

See also: tool_choice, permission modes, Subagent

AskUserQuestion

AskUserQuestion is the built-in tool Claude calls to surface a structured clarification when intent is ambiguous, with a deliberately bounded shape: one call carries 1–4 questions, each with a short header (≤12 characters), a multiSelect flag, and 2–4 options that each pair a label with a description. The response maps each question to its chosen label (an array or comma-joined string when multiSelect), and free-text is handled by offering an “Other” option and passing the typed text rather than the literal "Other". The bounded multiple-choice form is the point — it makes the human’s answer fast to give and unambiguous to route back into the agent’s flow. It is the tool Claude raises; the application resolves it through the canUseTool callback, which also gates ordinary tool use. A key limit for design: subagents cannot call AskUserQuestion, so a subagent that hits an ambiguous requirement can only guess or fail — open questions must be resolved by the coordinator before a fully-specified task is delegated.

See also: Escalate, don't guess, Escalation ladder, Subagent

built-in tools (built-in tool roster)

Built-in tools are the fixed roster every agent ships with on the first turn — the SDK provides roughly fourteen across six categories, identical to those that power Claude Code. The six that do the everyday work of reading and changing a codebase are Read, Write, Edit (file operations), Grep, Glob (search), and Bash (execution); beyond this set sit MCP server tools and your own custom tools. Their names are matched exactly — they appear verbatim in allowedTools / allowed_tools rules and as the tool_use.name block in messages — so allowed_tools=["read"] pre-approves nothing (the tool is Read), and a mis-cased name fails silently as if it were never listed. The roster also splits on one property that the runtime cares about: read-only tools (Read, Glob, Grep) may run concurrently while state-modifying tools (Edit, Write, Bash) run sequentially to avoid conflicts; custom tools default to sequential and opt into parallelism via readOnlyHint. Whether any given tool actually fires is then decided by the active permission mode and the allow/deny rules.

See also: permission modes, allowedTools, Model Context Protocol (MCP)

Checkable confidence signal (checkable signal)

A checkable confidence signal is a routing signal a caller can independently verify, preferred over a model’s self-reported confidence because self-reported confidence is a claim, not a measurement — a confidently-wrong output reports “high” too. The reliable signals are the ones that either match or do not, no calibration required: a cross-check mismatch (a calculated_total that disagrees with the document’s stated_total, meaning either the source is bad or a value was fabricated), a self-flagged conflict_detected (the model found contradictory source fields and is asking for adjudication), and failed provenance (a cited span that does not appear in the source document, meaning the citation was fabricated). A structured confidence field is an input too, but only after it has been empirically calibrated. These signals are the schema hooks of the validation pattern doing double duty as routing criteria — each is a place where the system can say “I am not sure” in a form a router can read — and the design rule is to route on what a caller can verify rather than letting the model’s own number gate the human queue by itself.

See also: Confidence calibration, Citations API, Knowledge cutoff

Citations API (citations)

The Citations API is the native mechanism that ties each cited claim in a response to an actual span in a source document, so the model cannot fabricate a citation to text that is not there — turning “trust me” into “check line 14.” You enable it per document with citations: {"enabled": true}, and the rule to memorize is that citations must be on all or none of a request’s documents (no mixing cited and uncited). Its real value is span-binding, not formatting: it is span-bound, not grammar-constrained, so it guarantees nothing about output shape. How a citation points at its source depends on document type — plain text by char_location, PDF by page_location, custom content (your own chunks) by content_block_location — and the returned cited_text is provided for convenience and does not count toward output tokens. A hard limit: Citations and Structured Outputs are mutually exclusive (the API returns a 400), because cited text must interleave with the response prose, which a strict JSON schema forbids; when you need both a schema and per-claim attribution, the fallback is a verified provenance triple.

See also: Knowledge cutoff, Checkable confidence signal, Structured output

CLAUDE.md (memory file)

CLAUDE.md is Claude Code’s persistent-instruction file, assembled at launch from up to four scopes loaded broadest to most specific: Managed policy, User (~/.claude/CLAUDE.md), Project (./CLAUDE.md or ./.claude/CLAUDE.md), and Local (./CLAUDE.local.md, which you gitignore). Discovery walks up the directory tree from your working directory, and a CLAUDE.md nested below cwd loads on demand — when Claude first reads a file in that subdirectory — rather than at launch. Crucially, the discovered files concatenate into context rather than overriding each other (ordered filesystem-root down to cwd, with CLAUDE.local.md appended after CLAUDE.md): there is no “winning” file, so two contradictory instructions simply sit in context at once — a smell to fix at the source, not something a closer file resolves. This is the opposite of the strict settings hierarchy, which is why reasoning about CLAUDE.md the way you reason about settings.json predicts the wrong behavior. The managed-policy file is the one scope that can never be excluded (via claudeMdExcludes), making it the instrument for org-enforced instructions.

See also: settings precedence, @import, .claude/rules/

CLAUDE.md re-injection (re-injection)

CLAUDE.md re-injection is the property that CLAUDE.md content is re-added to the context on every request, which is what makes it the right home for any rule that must hold for a whole session. The contrast is with an instruction given only in the opening prompt: when a long session nears the limit and automatic compaction summarizes the oldest turns first, a one-line rule from turn one rarely survives the summary intact, so the constraint silently stops being honored — the failure typically showing up right after a summary appears. Because re-injected content is present in the context after compaction exactly as before it, a durable rule placed in CLAUDE.md (loaded via settingSources) no longer depends on a summarizer’s discretion to survive. The discipline is simple: put session-long constraints where they are re-injected, not in turns a summary may discard. The same re-injection is also why CLAUDE.md can carry a “Summary instructions” section that steers what compaction preserves — the compactor reads it like any other context.

See also: CLAUDE.md, Context compaction, Context window

Confidence calibration (calibration)

Confidence calibration is the discipline of deciding, output by output, which results proceed automatically, which get verified, and which route to a human — an economic decision, not a quality slogan, that weighs the cost of a wrong auto-accept against the cost of a human glance (“review everything” and “trust everything” are both failures to calibrate). The word carries two senses worth keeping distinct. Routing calibration is choosing which output goes to which tier of the review funnel; measurement calibration is whether a stated confidence value actually tracks accuracy — a model is well-calibrated only if its “90% confident” outputs are right about 90% of the time, and models tend to be over-confident, reporting high certainty on answers that are wrong. Because a raw “high” cannot be trusted out of the box, you have two honest options: route on checkable signals that need no calibration, or empirically calibrate a confidence field by measuring accuracy at each stated level over real labeled data — re-measuring whenever the model, prompt, or input distribution changes, since calibration is not permanent.

See also: Checkable confidence signal, Multi-pass review, Escalate, don't guess

Constrained decoding (grammar-constrained sampling)

Constrained decoding (also grammar-constrained sampling) is the technique underlying both structured output and strict tool use: a JSON Schema is compiled into a grammar, and the model’s token sampling is restricted to only those tokens the grammar permits, so a non-conforming shape cannot be emitted at all. This is what lets the modern features eliminate schema-violation retries — the shape is guaranteed rather than hoped for. The compiled grammar carries a first-request latency and is then cached (per the API, 24 hours from last use), invalidated by a change to the schema structure or tool set but not by renaming name/description fields. Its essential limit, and the reason a validation layer still sits above it: constrained decoding constrains form, never fact. Every emitted token is schema-valid, but it does not guarantee a complete or correct result — a semantic error (a schema-valid response carrying wrong data) is exactly as likely as before, and two incomplete-result failures still pass, a refusal (stop_reason: "refusal") and truncation (stop_reason: "max_tokens", where the structure ran out of token budget mid-write and cannot be fixed by retrying on the same cap).

See also: Structured output, Strict tool use, Semantic error

Context compaction (compaction)

Compaction reclaims room in a finite, cumulative context window by summarizing older turns when a long session nears the limit. It is automatic (triggered near the context limit) and steerable through three knobs: a CLAUDE.md “Summary instructions” section (which steers what content survives), the PreCompact hook, and a manual /compact. Compaction is lossy, so durable rules belong in re-injected CLAUDE.md rather than in turns the summary may discard. It is distinct from /clear: /compact condenses in place to continue the same task, while /clear resets to a fresh conversation for an unrelated one (the decision rule is continuity). A scratchpad written to disk survives both, because each only touches the conversation window.

See also: Subagent, Agent loop

Context rot

Context rot is the umbrella term for the quiet decline in a long context’s quality that sets in before the window ever overflows. It groups the degradation mechanisms that erode a long conversation — lost-in-the-middle (material buried mid-context is attended to least reliably) and summarization loss (progressive summarizing of earlier turns discards detail that may later matter) — under the single principle that a long context gets worse before it gets full. The exam-relevant skill is diagnostic: given a misbehaving long session, name whether it is accumulation pressure, lost-in-the-middle, or post-compaction loss, because each has a distinct remedy and reaching for a bigger window addresses none of them — a larger context still loses its middle and still compacts eventually. The depth — the research and measurement behind these mechanics — lives in the Agentic Systems Design book; here the job is to recognize them and treat the onset of degradation, not the overflow error, as the thing to design against.

See also: Context window, Lost in the middle, Context compaction

Context window (context budget)

The context window is the finite, cumulative token budget a session draws from, and everything shares it: the system prompt, tool definitions, CLAUDE.md, the full conversation history, and every tool input and output all accumulate in one pool that never refills within a session. Current capacities are concrete — 1M tokens on Opus 4.8 and Sonnet 4.6, 200k on Haiku 4.5 — though tokenizer density varies, so the same text can cost up to ~35% more of the budget on one model than another. The key reliability distinction is that “fits in the window” and “well-attended” are different claims: the token limit is a capacity bound, while attention is a quality that declines as the window fills, so a conversation comfortably under the limit can still have lost the thread of an instruction given fifty turns ago. On a large codebase the goal is therefore never to fit everything in but to load the task’s slice and keep the rest out — what you decline to read is as much a design decision as what you read.

See also: Context rot, Lost in the middle, Context compaction

coordinator–subagent pattern (orchestrator-worker, hub-and-spoke)

The coordinator–subagent pattern (also orchestrator-worker, or hub-and-spoke) is the canonical multi-agent shape: a lead agent decomposes a task, dispatches subagents that each run in their own context window and explore parts of it independently, and then synthesizes their returned results. The coordinator owns planning and synthesis; the subagents own focused execution. The motivation is not “more brains” but more windows — a single agent has one finite context window, and extra agents relieve that bottleneck. It is expensive, typically using 3–10× more tokens than a single agent, so reach for it only when one of three conditions holds: context protection (large, mostly-irrelevant intermediate data should stay out of the main window), parallelization (genuinely independent paths), or specialization (tool-set overload or deep domain expertise). Delegation is one level deep: subagents cannot spawn subagents.

See also: Subagent, isolated context, Agent tool

custom_id (custom id)

custom_id is the unique identifier attached to each request in a Message Batch, and it is the only sanctioned key for joining a result back to the request that produced it. It is mandatory rather than optional because batch results “can be returned in any order” and may not match the order of submission — a batch is a set, not a sequence, so there is no positional correspondence to fall back on. Relying on submission order is the characteristic batch failure: it silently mis-joins outputs to inputs (result n attributed to request n when it answers some other request), corrupting data with nothing in the response to flag it. The id must be unique across the batch — reusing one makes two results indistinguishable — and the documented format is 1-64 characters of alphanumerics plus - and _. Treat it as a primary key: unique, meaningful, and the sole correct way to match results to requests.

See also: Message Batches API, Structured output

description (tool) (tool description)

A tool’s description is the natural-language field on its definition that tells the model what the tool does, when to use it (and when not to), what each parameter means, and any caveats — and it is “by far the most important factor in tool performance,” the single highest-leverage surface an architect controls. The model never reads your implementation; it selects tools by their descriptions alone, so the description is the API as far as the agent is concerned. The documented floor is at least 3–4 sentences per description, more if the tool is complex: a get_stock_price that spells out its inputs, its USD return value, and that “it will not provide any other information” routes correctly, whereas “Gets the stock price for a ticker” leaves the model guessing about inputs, outputs, and boundaries. A vague description is a performance bug the model cannot route around, which is why it earns the first and largest share of design effort.

See also: input_schema, tool namespacing, Strict tool use

Error propagation

Error propagation is the way a fault in a chain of agents does not stay local: an upstream ambiguity becomes a downstream wrong decision, and concurrent faults compound into a degradation no single component test reproduces. A chain’s reliability is the product of its handoffs, not its best agent’s reliability — each boundary is both a place an error can enter and a place an existing error passes through unexamined — so adding agents multiplies the surfaces where intent can be dropped. The propagation mechanism is specific: a mid-pipeline agent cannot pause to ask, so it resolves an ambiguity and hands the guess downstream as settled fact, and the next agent has no signal that its input was a guess. Compounding failures are especially hard to catch because they live between components — each part passes its own eval, and the breakdown appears only in the interaction, on traffic slices no single test exercises. The boundary defenses are structured error context that crosses handoffs in machine-readable form, independent validation by an isolated judge, circuit breakers that isolate a misbehaving agent before it cascades, and keeping the escalable decision at the coordinator.

See also: MAST taxonomy, Escalate, don't guess, coordinator–subagent pattern

Escalate, don't guess

“Escalate, don’t guess” is the reliability principle that, when a task is genuinely ambiguous or blocked, a well-built agent surfaces the decision to the party who can make it rather than silently picking an interpretation. A silently-resolved ambiguity is a coin flip on intent that nobody chose to take; escalation converts that flip into a decision made by the one party who actually knows the answer. The economics are lopsided — the cost of asking is one round trip, while the cost of guessing wrong is the whole task built on the wrong branch — and the cost only grows the longer an ambiguity survives, so the strongest form is proactive (front-loading clarifying questions before any work depends on the answers). The principle has sharp teeth in multi-agent pipelines: a mid-pipeline agent usually has no one to ask, so where an interactive agent would pause and clarify, it instead resolves the ambiguity itself and hands the guess downstream as settled fact — which is why open questions should be resolved at the coordinator before delegating a fully-specified task.

See also: AskUserQuestion, Error propagation, Escalation ladder

Escalation ladder (output-control ladder)

The escalation ladder is the three-rung hierarchy for controlling a model’s output shape, climbed only as far as the stakes require because each rung costs more in context, latency, and setup. Rung 1 is explicit instruction — name the format and the criteria in the prompt; the cheapest rung, and it handles the common case. Rung 2 is few-shot examples — demonstrate the desired handling on the ambiguous inputs a written rule cannot fully pin down. Rung 3 is structured outputs or strict tool use — constrain decoding so a non-conforming shape cannot be emitted at all; the strongest guarantee and the highest setup cost. The documented discipline is to ask plainly first, since newer models can reliably match complex schemas when simply told to, and to escalate a given field only when a stronger guarantee is genuinely needed — most fields never leave rung 1, and only a crash-on-violation field earns rung 3, where an out-of-set value becomes unrepresentable rather than merely discouraged.

See also: Explicit criteria, Few-shot prompting, Structured output

Example tags (<example> tags)

Example tags are the XML-style wrapper — a single demonstration in <example> tags, the whole set grouped in <examples> tags — that marks few-shot examples as examples so the model can distinguish them from the instructions and from the live input. “Structured” is one of the three documented example-quality criteria (alongside relevant and diverse) precisely because this delimiting matters: without it, a demonstration can blur into the instruction text and the model may read a sample input as a directive. Inside the set, each pair couples an <input> with its desired <output>, and the conventional move is to place the ambiguous edge case in the middle of the set with the handling you want — for instance, an input with a missing field whose output is null, teaching that “no value” resolves to null rather than an empty string or "unknown". The tags are the syntactic illustration of few-shot prompting; the construction and quality criteria are the stable substance.

See also: Few-shot prompting, Explicit criteria

Explicit criteria (explicit instruction)

Explicit criteria is the principle that output quality is controlled by the specification, not the model: name the success criteria and the output shape — fields, types, lengths, missing-data handling — directly in the prompt rather than leaving them to inference. If two runs of the same prompt disagree, the disagreement was latent in the prompt, a degree of freedom you left unstated that the model resolved differently each time; pinning every degree of freedom you care about stops the drift. Modern models follow instructions more literally and do not infer requests you did not make, so a requirement held only in your head is simply unmet — which makes this a durable, stable principle that newer models make more load-bearing, not less. A corollary is that positive instruction (“respond in flowing prose”) steers more reliably than a prohibition (“don’t use lists”), because a positive instruction aims at the target while a negative one only rules out one failure inside a still-vast permitted region. It is the cheapest and first rung of the output-control escalation ladder.

See also: Escalation ladder, Few-shot prompting, Structured output

explore-plan-implement-commit (four-phase workflow)

Explore-plan-implement-commit is the recommended four-phase rhythm for driving an agentic task: Explore (read files in plan mode to build understanding) → Plan (create a detailed implementation plan) → Implement (switch out of plan mode and verify against the plan) → Commit (write a descriptive message and PR). Explore and Plan are the read-only front half — exactly plan mode’s territory — and Implement and Commit are where edits land. The loop’s whole purpose is to separate understanding from editing: a shared model of the change is built before a single line moves, and the implement phase then checks its work against that plan; collapse the two and Claude optimizes a problem it never confirmed it understood, producing code that solves the wrong problem. You may skip the plan phase only when the diff is one-sentence-describable — a change small and clear enough that there is nothing for a plan to de-risk. The rhythm is a durable discipline rather than a feature, surviving any tool rename or keybinding change.

See also: Plan mode, interview pattern, opusplan

Few-shot prompting (multishot prompting)

Few-shot prompting (also called multishot) steers a model by giving it a small set of worked input→output examples rather than only a written instruction; it is one of the most reliable ways to fix format, tone, and structure, and the only clean way to pin down an ambiguous case. The model does not memorize the examples — it extracts the implicit pattern across them and applies it to the new input, so a demonstration carries information a sentence struggles to (exact field ordering, how a borderline input should resolve). The documented sweet spot is 3-5 examples: with 1-2 the model latches onto an incidental trait instead of the intended pattern, and at 6+ you burn context and risk two examples disagreeing and teaching “either is acceptable.” Examples should be relevant (mirror the real use case), diverse (vary enough to avoid teaching a spurious shared trait — the most-neglected, most-consequential criterion), and structured (wrapped in example tags so the model separates them from instructions). The highest-value move is to place an example on the edge case showing the desired handling; few-shot composes with structured output rather than competing with it, since the schema locks the shape while examples teach the content and edge-case handling.

See also: Example tags, Explicit criteria, Escalation ladder

headless mode (print mode, -p)

Headless mode is Claude Code’s non-interactive invocation — claude -p "<query>" — which runs the full agent loop and exits after responding, with no prompt or session UI; all standard CLI options work with -p. It is the entry point for running Claude Code in CI, where the mechanics change because there is no human at the keyboard to approve a tool or answer a question, so the run must settle its output shape and permission surface up front. For reproducibility you pair it with --bare, which skips auto-discovery of hooks, skills, plugins, MCP servers, auto memory, and CLAUDE.md — without it the run loads whatever the host machine has, so the same command behaves differently on different runners (--bare is slated to become the -p default). Output is selected with --output-format: text (default), json (a payload with result, session_id, and total_cost_usd), or stream-json (newline-delimited events); --json-schema adds a validated structured_output field and is print-mode only, as are --max-turns and --max-budget-usd. The permission floor is --permission-mode dontAsk (denies anything not in permissions.allow or the read-only set) paired with --allowedTools, and the pipeline gates on the run’s process exit code0 passes, non-zero (e.g. hitting --max-turns, or over-cap stdin above 10 MB) fails.

See also: Plan mode, allowedTools, permission modes

hook precedence (hook decision precedence)

Hook precedence is the fixed rule that decides the outcome when several hooks (or permission rules) act on one event: deny > defer > ask > allow. All matching hooks run in parallel, and the most restrictive result wins — so if any hook returns deny, the operation is blocked regardless of what the others return, and to permit a call every hook must agree. The system fails safe: one hook saying “no” is enough to stop something. defer sits just under deny because it is the special decision that ends the query so the host can resume it later from the persisted session — a pause-and-hand-back that is more restrictive than asking or allowing. Because completion order is non-deterministic, a hook must never assume another has already run; each should act independently.

See also: PreToolUse hook, Agent loop

input_schema (inputSchema)

input_schema is the JSON Schema that defines a tool’s arguments, and it is the structural floor every tool interface stands on: it must be a JSON Schema object — a tool that takes no arguments still declares an empty object schema, never null. In the Claude API a tool definition’s three required fields are name, description, and input_schema; in MCP the equivalent inputSchema is likewise required and must be a valid JSON Schema object. The schema is what the optional input_examples are checked against — every example argument object must validate against input_schema, or the whole request returns a 400, so the examples double as a self-check on the schema. Setting strict: true on the definition then makes the model’s inputs conform to that schema exactly, eliminating missing-parameter and type-mismatch errors before they reach your handler.

See also: description (tool), Strict tool use, additionalProperties: false

interview pattern

The interview pattern is a technique for starting a large feature whose design space is still unclear: rather than write a full spec you do not yet have, you invert the usual flow and let Claude drive the questions. You begin with a minimal prompt asking Claude to interview you using the AskUserQuestion tool, and Claude asks about things you might not have considered — technical implementation, UI/UX, edge cases, concerns, and tradeoffs — surfacing the hard decisions before any code is written. The interview ends by writing a complete spec to a file (e.g. SPEC.md), which is both a reviewable deliverable and the context bootstrap for what comes next. Crucially, a fresh session implements from that spec rather than the session that ran the interview: the interview session’s context is full of question-answer thrash and half-formed options, while the implementation session should start on clean context whose only input is the reviewed spec. The pattern is distinct from plan mode — plan mode makes Claude interrogate the codebase, whereas the interview makes Claude interrogate you about intent — so it fits when the unknowns live in your requirements, not in the code.

See also: Plan mode, AskUserQuestion, explore-plan-implement-commit

is_error (is_error flag)

is_error is the snake_case boolean on a Claude Messages API tool_result block that flags a failed tool call — the canonical signal that turns a failure into a message the model can reason over rather than a dead end. When set to true with actionable content (for example, ConnectionError: the weather service API is not available (HTTP 500)), Claude folds the error into its next-turn reasoning and may retry 2–3 times with corrections before apologizing; that retry is documented default behavior, not a parameter you can set, so the legibility of the error text is your only lever over it. The Messages API has exactly one tool-failure signal: a protocol-level problem (a malformed request, or tool_result not first in the content array) surfaces as an HTTP error such as 400, not a second in-band channel. The casing is the tell that distinguishes it from MCP’s isError — keep the spelling consistent with the regime you are in, and never present the MCP two-channel model as if the direct API had a JSON-RPC error class.

See also: tool_result block, isError (MCP), Strict tool use

isError (MCP) (isError)

isError is the camelCase boolean (default false) on an MCP CallToolResult that flags an execution error — and in MCP it is one of two error channels, a distinction that is normative rather than stylistic. Execution errors the model should self-correct on — input validation, API failures, business-logic violations — ride isError: true inside a successful result, addressed to the model so it reads the text and retries; protocol errors the model cannot fix — an unknown tool, a malformed request — ride a JSON-RPC error response such as -32602, addressed to the client/host. The most-tested trap lives on this line: an input-validation failure (e.g. a past departure date) belongs in the isError channel per SEP-1303, with actionable content like Invalid departure date: must be in the future, not in a JSON-RPC -32602, because routing a recoverable error to the host silently denies the model its retry. Route by who can act on the error, not by how severe it feels — and do not conflate this camelCase MCP flag with the Messages API’s snake_case is_error.

See also: is_error, Model Context Protocol (MCP), Semantic error

isolated context (isolated context window)

Isolated context is the property that defines a subagent and separates it from a plain tool call: a subagent runs in its own context window and does not see the parent’s state, where a tool call returns directly into the calling agent’s context. The intermediate tokens a subagent generates never touch the coordinator’s window — and that isolation is the feature, keeping a subtask’s noise out of the agent that must reason over the whole problem. Because results must still cross the context boundary, large outputs use the artifacts pattern: a subagent writes its full output to the filesystem or external storage and passes a lightweight reference back, rather than streaming everything through the coordinator. On the exam, the phrases “isolated context window” and “does not see parent state” are what identify a subagent.

See also: Subagent, coordinator–subagent pattern, Scratchpad

Knowledge cutoff (reliable knowledge cutoff)

A model’s knowledge cutoff is the date past which it has no dependable knowledge — the temporal half of provenance, answering not where a claim came from but when it can be trusted. The crucial subtlety is that the reliable knowledge cutoff is earlier than (or equal to) the training-data cutoff, not later: because data near the training boundary is sparse, the model’s reliable knowledge stops before its training does. Concretely, Opus 4.8 is reliable to January 2026; Sonnet 4.6 trained on data through January 2026 but is reliable only to August 2025; Haiku 4.5 trained through July 2025 but is reliable to February 2025 — so the earlier date is the one that bounds trust. The design consequence is that a time-sensitive fact about anything after the reliable cutoff must come from a dated source supplied at request time (retrieval with a citation), not from the model’s memory, where it risks a confident fabrication. This closes the provenance pair: a trustworthy claim needs both where (a source span) and when (a dated source past the cutoff).

See also: Citations API, Checkable confidence signal

Lost in the middle

Lost-in-the-middle is the degradation effect in which a model attends less reliably to material buried in the middle of a long context than to material near its start or end. It is a quality failure, not an overflow failure: a conversation can sit comfortably under the token limit, never trigger compaction, and yet have the model misremember a detail established sixty turns ago simply because that detail is now stranded in the middle of a large window. This is why “fits in the window” is not the same claim as “well-attended” — the onset of this kind of degradation, not the overflow error, is the thing to design against. The practical fix is to re-surface a critical fact near the end of the context (restate it) or to assemble the working context deliberately rather than letting it sprawl, and to discriminate this failure mode from accumulation pressure and post-compaction loss before reaching for a remedy.

See also: Context window, Context rot, Context compaction

MAST taxonomy (MAST)

The MAST taxonomy is a practitioner classification of multi-agent system failures, drawn from an analysis of over 1,600 execution traces, that sorts the breakdowns into three categories: specification problems (41.77%), the largest, then coordination failures (36.94%) and verification gaps (21.30%). It backs the finding that multi-agent LLM systems fail at rates between 41% and 86.7% in production, because specification ambiguity and unstructured coordination protocols cause agents to misinterpret roles, duplicate work, and skip verification. Specification problems propagate especially badly in a pipeline: a mid-pipeline agent cannot pause to ask, so an under-specified handoff becomes a guess it resolves and passes downstream as settled fact, seeding a wrong decision every later stage treats as valid input. The taxonomy motivates the standard boundary remedies — converting prose specs into machine-validatable schemas, enforcing typed and schema-validated messages between agents, deploying isolated judge agents for independent validation, and adding circuit breakers — one report citing a 7x accuracy improvement (10% to 70%) from structured validation loops.

See also: Error propagation, coordinator–subagent pattern, Multi-pass review

max_turns (turn budget)

max_turns is a termination budget the architect sets on the agent loop — a cap on the number of tool-use round-trips the model may run before the loop is forced to stop. Because a model-driven loop can otherwise run forever, supplying this stop condition (often paired with max_budget_usd, a client-side cost ceiling) is the architect’s safety contract: the model owns what to try next, but the architect owns whether the loop may continue. Since the final text-only response is not a turn, you size max_turns to the tool calls a task needs plus headroom — not to the messages you expect. When the loop hits the limit it ends and sets the result’s subtype to error_max_turns, on which .result is left empty; a recoverable session can then be resumed with a bumped budget rather than restarted from scratch.

See also: turn, Agent loop, session

MCP scopes (MCP scope)

MCP scopes are the three tiers Claude Code stores MCP servers in, each a different file with a different audience: Local (~/.claude.json, under a per-project key — current project only, not shared), Project (.mcp.json at the repo root — shared via version control, with a one-time approval prompt on first use), and User (~/.claude.json — available across all your projects). The CLI flag claude mcp add <name> --scope <local|project|user> picks the scope; omit it and the default is Local. When the same server name appears in more than one scope, Claude Code connects once using the highest-precedence source — Local → Project → User → plugin-provided → claude.ai connectors (the first three match duplicates by name) — which is the intended override path for personal credentials, not a conflict. The right first question is never “which file?” but “who should see this server?”: credential-bearing or experimental servers go Local, team-shared servers go Project, cross-project personal servers go User. Note the notorious collision — MCP “local scope” lives in ~/.claude.json, not in the project’s .claude/settings.local.json general local-settings file.

See also: Model Context Protocol (MCP), .mcp.json, system:init

MCP transports (MCP transport)

An MCP transport is the channel a server communicates over, selected by its type: stdio for local processes, sse (Server-Sent Events, now deprecated — use HTTP), and http (Streamable HTTP, with streamable-http accepted as an alias for http in JSON configs). A fourth type, ws (WebSocket), is configurable only through .mcp.json or claude mcp add-json, not via the --transport flag — whose accepted values are just http / stdio / sse. Distinct from these, and not a .mcp.json type, the SDK can run an MCP server in-process inside your application as a deployment mode (for example a built-in tool server), rather than as an external process or endpoint. Beneath the transport sits the MCP wire protocol, which is mid-revision and should be cited with a date: the 2025-11-25 spec requires an initialize handshake as the first interaction, while the 2026-07-28 release candidate removes that handshake for a stateless model — so verify the wire details against the current spec, even though the configuration surface (scopes, files, env vars) is the more stable part.

See also: Model Context Protocol (MCP), .mcp.json, system:init

Message Batches API (batch processing)

The Message Batches API asynchronously processes large volumes of Messages requests for a flat 50% discount on both input and output, the trade being latency: most batches finish under an hour, but the SLA is 24 hours, after which an incomplete batch expires (results stay retrievable for 29 days). The decision rule is purely latency tolerance — if a human or synchronous system is blocked on the answer, batch is wrong; if the work is an overnight job, a backfill, or an offline evaluation, batch halves the bill, and the discount stacks with prompt caching. A single batch is bounded by 100,000 requests or 256 MB, whichever comes first (an oversized payload returns HTTP 413); streaming is unsupported, and each request is single-shot with no multi-turn tool round-trip, though structured outputs compose cleanly. Its one non-negotiable contract is custom_id matching, because results return in any order. Billing is only for succeeded results (errored, canceled, expired are free), and a crucial subtlety is that succeeded is a batch-level outcome meaning the request ran — the per-message stop_reason must still be checked, since a billed refusal or a truncation arrives as succeeded.

See also: custom_id, Structured output, stop_reason

Model Context Protocol (MCP) (MCP)

The Model Context Protocol (MCP) is the open protocol by which an agent connects to an external server that supplies tools (and other capabilities); a connected server’s tools surface to the model under the namespaced form mcp__<server>__<tool>. A server is configured either programmatically (mcp_servers / mcpServers, optionally locked down with strictMcpConfig: true) or via a committed .mcp.json at the project root, and installed with claude mcp add <name> --scope <local|project|user> --transport <http|stdio|sse>. When the same server name appears in several scopes, Claude connects once using the highest-precedence source (Local → Project → User → plugin → connector). A server that never connected is a silent failure mode — confirm status: connected in the system:init message before relying on its tools.

Multi-pass review (independent reviewer pattern)

Multi-pass review is the practice of checking work with a fresh, independent context rather than asking a model to review its own output in the same session — the weakest review, for two independent reasons. First, attention dilution: performance degrades as the context window fills, so a session that already holds the implementation reviews from a dilated state. Second, implementer bias: a model that just wrote the code is biased toward defending it, while a reviewer with no authorship has nothing to rationalize. The pattern scales across three levels: a verification subagent (a child in its own context window, cheapest, single session), a Writer/Reviewer pair (two genuinely independent sessions, the quality-critical default), and a fleet (many parallel specialists, each owning one issue class — the direct architectural answer to attention dilution). A fleet’s essential safeguard is a verification pass that re-checks each candidate finding against actual code behavior to filter false positives; without it, parallel reviewers’ plausible-but-wrong findings accumulate, so more reviewers means more noise, not more signal. Convergence rules (a short instruction block, damped re-review) keep “more passes” from becoming spam, and are also cost control.

See also: verification subagent, Writer/Reviewer pattern, isolated context

opusplan

opusplan is a Claude Code model alias that pairs plan mode with a model-per-phase split: it uses opus in plan mode for complex reasoning and architecture decisions, then automatically switches to sonnet for execution (code generation and implementation). The switch fires at the plan→execute boundary — the moment you approve a plan, one action flips two things at once: the permission mode leaves plan for the chosen write mode, and the model leaves its Opus plan phase for Sonnet. The intent is to spend the expensive tokens where the leverage is, on the design, and the cheaper, faster model on the mechanical edits. Set it like any alias — /model opusplan during a session or claude --model opusplan at startup. One trap worth memorizing: the automatic 1M-context upgrade applies to the opus alias only, not opusplan, so opusplan’s plan phase runs at the standard 200K window — if a planning step genuinely needs more than 200K of context at once, reach for opus[1m] (or pin a 1M model) for that phase instead.

See also: Plan mode, explore-plan-implement-commit

paths scoping (path-scoped rules)

Paths scoping is the use of the optional paths frontmatter — a list of glob patterns — to make a .claude/rules/ rule (or a skill) load conditionally rather than unconditionally. A path-scoped rule triggers when Claude reads a file matching the pattern, not at launch and not on every tool use: a rule scoped to src/api/**/*.ts costs nothing in context until Claude actually reads an API file, and then applies while that work is in scope — so work that never touches src/api/ never pays for it. The glob format is shared between rules and skills: **/*.ts (all TypeScript files at any depth), src/**/* (everything under src/), *.md (Markdown in the project root only), and brace expansion such as **/*.{ts,tsx} (multiple extensions in one pattern). The discipline it enables: reach for a path-scoped rule when guidance is real but only relevant to part of the tree, keeping file-specific instructions out of context on unrelated work — the one shape that offers this lever, since a CLAUDE.md line or an un-scoped rule would load every session.

See also: .claude/rules/, SKILL.md, CLAUDE.md

permission modes (permission mode)

A permission mode is the setting consulted when the model requests a tool, deciding whether that tool actually fires; there are six — default, acceptEdits, plan, dontAsk, bypassPermissions, and auto (TypeScript-only). Two change the tool surface directly: plan restricts the agent to read-only tools (it explores and proposes without editing source files), and acceptEdits auto-approves file edits and filesystem commands (mkdir, touch, rm, rmdir, mv, cp, sed) but only inside cwd plus additionalDirectories, prompting for paths outside that scope. The mode is only step 3 of a fixed five-step evaluation order — Hooks → Deny rules → Permission mode → Allow rules → canUseTool — and that order is the crux: deny rules and hooks sit above the mode, so a disallowedTools entry like Bash(rm -rf *) blocks even under bypassPermissions, whereas allow rules sit below it, so under bypassPermissions an allowlist is never consulted. Hence the trap: confine an agent with plan mode or a deny rule, never with allowed_tools alone, which only pre-approves and never restricts.

See also: built-in tools, allowedTools, Plan mode

Plan mode

Plan mode restricts Claude to read-only research and a written proposal: it reads files and runs shell commands to explore, then writes a plan, but does not edit your source (permission prompts still apply as in default mode). Enter it by cycling Shift+Tab, prefixing a prompt with /plan, launching with --permission-mode plan, or setting permissions.defaultMode: "plan". Crucially, approving a plan exits plan mode and switches the session into the write mode each approve option names — the read-only guarantee holds only until approval. Choosing plan-first versus going direct is a risk-containment decision keyed to reversal cost and uncertainty: plan first for unfamiliar code or a wide blast radius, go direct for a small diff in code you know. The opusplan alias pairs the mode with a model-per-phase split — Opus plans, Sonnet executes.

PreToolUse hook (PreToolUse)

A PreToolUse hook is the interception mode that gates a tool call: it runs your code before the tool executes and returns a permissionDecision of allow, deny, ask, or defer, optionally with updatedInput to rewrite the call. It is the counterpart to PostToolUse, which normalizes a result after the tool runs — so a rule like “never write a .env file” must be a PreToolUse hook (at PostToolUse the write has already happened). A matcher, a regex string tested against the tool name (e.g. "Write|Edit"), selects which calls the hook sees; it does not filter by argument, so any path or command test happens inside the callback. Because subagents do not inherit the parent’s permissions, a PreToolUse hook is also the clean way to pre-approve a subagent’s tools rather than re-prompting inside every child.

See also: hook precedence, Agent loop, allowedTools

Scratchpad

A scratchpad is a working file on disk that an agent writes to and reads back from, externalizing durable state out of the context window so it survives operations that touch only the conversation. Its defining property is that it outlives both context-freeing commands: compaction only summarizes the window and /clear only wipes the window, so a PLAN.md written to disk is untouched by either — the agent re-reads it after a compaction, or in a freshly-cleared session, exactly as it left it. The durable layer of a long task therefore does not live in the conversation at all. The scratchpad is one of three complementary levers for keeping the main context scoped on a large codebase — alongside compaction, which summarizes bulk away, and subagent delegation, which spends exploration cost in a separate window that returns only a summary. Each does the same job of moving bulk out of the main window; the scratchpad’s specialty is state that must persist across windows and across hosts.

See also: Context compaction, Subagent, Context window

Semantic error (semantic errors)

A semantic error is a response that is valid JSON matching your schema but contains incorrect data — the model named the wrong customer, copied a wrong total, or fabricated a value. It is the error class that survives structured output and strict tool use, because constrained decoding constrains form, never fact: a schema can require customer_name to be a non-empty string, but it cannot know the source said “Jane” while the model wrote “John.” Once you adopt structured outputs, schema and type errors are gone and your entire remaining error budget is semantic, so that is where validation effort must move — and since the API never sees a semantic error, you cannot retry your way out of one the model is never told about. The countermeasure is to encode the check into the schema, adding fields whose only job is verification (a stated_total re-summed against a calculated_total, a conflict_detected flag, a provenance triple whose quoted span you confirm appears in the source) so an un-checkable judgment becomes a mechanical test that application code, inside a validation-retry-feedback loop, can run.

See also: Validation-retry-feedback loop, Constrained decoding, Structured output

session (agent session)

A session is the persisted conversation — the prompt and every tool call, tool result, and response — stored as JSONL on disk at ~/.claude/projects/<encoded-cwd>/<session-id>.jsonl. The boundary that matters most is what it does not include: a session persists the conversation, not the filesystem (snapshotting and reverting the agent’s file changes is file checkpointing’s separate job). Three controls carry or branch a session: continue picks up the most recent session in the current cwd, resume picks up a specific session by ID (and recovers one that hit error_max_turns with a bumped budget), and fork starts a new session ID from a copy of the history, leaving the original untouched. Forking branches the conversation, not the disk; for cross-host work it is often more robust to capture durable artifacts as application state than to ship transcript files around.

See also: Scratchpad, max_turns, Context compaction

settings precedence (settings hierarchy)

Settings precedence is the strict five-level hierarchy by which Claude Code resolves configuration values, where the highest scope wins. Named in full, highest to lowest: Managed (cannot be overridden by anything), CLI arguments (--model, --permission-mode, … — session-only), Local (.claude/settings.local.json, gitignored), Project (.claude/settings.json, committed), and User (~/.claude/settings.json, lowest). When the same setting appears in several scopes, only the highest one takes effect — so --model haiku on the CLI (level 2) beats a project opus (level 4) and a user sonnet (level 5). The two most often forgotten rungs are CLI and Local. This override model is the deliberate opposite of the CLAUDE.md instruction layer, which concatenates rather than overrides; conflating the two is the single most common configuration error. One exception: permission allow/ask/deny rules merge across scopes rather than override.

See also: CLAUDE.md, permission modes, MCP scopes

skill

A skill is a lazy-loaded, directory-bundled capability — a markdown directory .claude/skills/<name>/SKILL.md with optional supporting files (reference.md, scripts/) — that Claude can invoke on its own or that you trigger by typing /name. Unlike CLAUDE.md, which loads every session, skills load on demand: the agent receives only the skill descriptions (roughly 100 tokens each) at startup, and the full body materializes only when the skill is invoked, entering the conversation as a single message that persists for the rest of the session (it is not re-read each turn). The descriptions load into a budget defaulting to 1% of the model’s context window, and on overflow the least-invoked skills’ descriptions drop first, so a rarely-used skill can become invisible. This lazy model is what makes a skill cheap and discoverable at once, and it resolves across four scopes by precedence — enterprise > personal > project, plus plugin skills namespaced as plugin-name:skill-name so they never conflict.

See also: slash command, SKILL.md, CLAUDE.md

SKILL.md

SKILL.md is the markdown file at the root of a skill directory (.claude/skills/<name>/SKILL.md) whose frontmatter declares the skill’s behavior. Among its fields: name (the display name in skill listings, which defaults to the directory name — though it is the directory name, not this field, that sets the /command you type, except for a plugin-root SKILL.md), description (the field Claude reads at startup to decide whether to auto-invoke; description + when_to_use are capped at 1,536 characters by default), argument-hint, allowed-tools (CLI-only), model, effort, context, and paths (glob patterns that limit when the skill activates). Two switches decide who may call it: user-invocable: false hides it from the / menu so only Claude can invoke it, while disable-model-invocation: true does the inverse — only the user can trigger it via /, Claude cannot auto-invoke, and its description is kept out of context entirely (which also blocks subagent preloading). Inside the body, $ARGUMENTS expands to all passed arguments and $ARGUMENTS[N] / $N pick a specific 0-indexed one.

See also: skill, slash command, paths scoping

slash command

A slash command is a stored prompt that controls Claude Code from inside a session: it is recognized only at the start of your message, and any text that follows the command name is passed to it as arguments. Custom commands have merged into skills — a file at .claude/commands/deploy.md and a skill at .claude/skills/deploy/SKILL.md both create /deploy and work the same way, because they share one mechanism (a prompt handed to Claude). Old flat-file .claude/commands/ files keep working, but the skill form is recommended for new work because it adds directory bundling, frontmatter, and the ability for Claude to auto-invoke it when relevant. A command is thus the legacy flat-file shape of the same idea a skill expresses with a directory; everything not backed by this prompt mechanism is instead a built-in command whose behavior is coded directly into the CLI.

See also: skill, SKILL.md, CLAUDE.md

stop_reason (stop reason)

stop_reason is the field on the model’s response that serves as the agent loop’s single branch condition. Its value decides whether the loop continues or ends: tool_use means the response carries tool calls, so your code executes them, returns tool_result blocks, and requests again (continue); end_turn means the model finished naturally with no tool calls, so the text is the deliverable (stop). Two other values stop the loop but need handling rather than delivery: max_tokens means the output was truncated at the token budget mid-response, and refusal means the model declined and the result is a non-answer. The full Messages API set includes further values (such as stop_sequence and pause_turn), so confirm the current list against the API reference before relying on one.

See also: Agent loop, tool_result block, turn

Strict tool use (strict: true)

Strict tool use is the feature, enabled by setting strict: true on a tool definition, that guarantees a model’s tool inputs match the tool’s JSON Schema by constraining token sampling to schema-valid outputs (grammar-constrained sampling). It closes the gap the classic tool-use pattern leaves open: without it the model might return an incompatible type ("2" where you need 2) or omit a required field, breaking the function it calls; with it those become impossible at the token level. It is the sibling of structured output — strict tool use constrains the tool_use.input (how the model calls a tool), while structured output constrains the response text (what the model says) — and both run the same grammar pipeline and compose in one request. To “call one of N candidate tools and validate its inputs,” pair tool_choice: {type: "any"} with strict: true on each tool. A sharp caveat: strict: true is honored only on the native Claude API and is silently dropped through the OpenAI SDK compatibility layer, where the request still succeeds but you get no grammar guarantee. Like structured output, it constrains form, not fact, so a schema-valid input can still be semantically wrong.

See also: Structured output, Constrained decoding, additionalProperties: false

Structured output (structured outputs, output_config.format)

Structured output is the modern feature that constrains a model’s response text to a JSON schema — carried on the request as output_config.format — using grammar-constrained decoding, so the conforming JSON arrives directly in the response with no retries needed for schema violations. It is the top rung of the output-control escalation ladder and the sibling of strict tool use: both compile your JSON Schema into a grammar and constrain token sampling, but structured output targets what the model says while strict tool use targets how it calls a tool. It accepts only a subset of JSON Schema (objects, arrays, scalars, enum, const, anyOf, internal $ref; no numeric or length bounds, external $ref, or recursion), and additionalProperties: false is mandatory on every object node. Critically, the grammar constrains form, never fact: it guarantees a schema-valid shape but cannot prevent a semantic error (valid JSON with wrong data), nor two failures that still slip past — a refusal (stop_reason: "refusal", a billed 200) and truncation (stop_reason: "max_tokens", the object cut off mid-write) — so a caller must still check stop_reason and run domain checks above the schema.

See also: Constrained decoding, Strict tool use, Semantic error

Subagent (sub-agent)

A subagent is a nested agent invoked through the Agent tool (renamed from Task in Claude Code v2.1.63 — tool-name filters must match both names) to run a scoped piece of work in its own fresh, isolated context window. It does not inherit the parent’s conversation or permissions; the only inbound channel is the Agent tool’s prompt string, so everything the subagent needs must be written into that prompt. When it finishes, the parent receives the subagent’s final message as the tool result — and may summarize it rather than carry it through verbatim. Because its exploration cost is paid in a separate window and discarded with it, delegation is a primary lever for keeping the main context scoped.

See also: Agent loop, Context compaction

system:init (system init message)

system:init is the initialization message the SDK emits before the agent runs (a system message with subtype init), and reading it is how you confirm an MCP server actually connected — the antidote to this chapter’s silent failure mode, where a wired-but-unconnected server leaves the agent running without its tools and you find out only from a confusing answer. Each entry in the message’s mcp_servers field reports a status that is one of connected | failed | needs-auth | pending | disabled; the disciplined pattern is to inspect it and refuse to run on any status other than connected, turning “the tools just aren’t there” into an explicit, actionable failure. A needs-auth means an OAuth flow hasn’t completed, while failed usually means a missing env var, an uninstalled package, a bad connection string, or an unreachable host. Remember the default 60-second initialization timeout, a common cause of failed/pending for slow-starting servers — pre-warm or use a lighter package.

See also: Model Context Protocol (MCP), MCP scopes, MCP transports

tool namespacing (service namespacing)

Tool namespacing is the convention of prefixing a tool’s name with the service it belongs to — github_list_prs, slack_send_message, jira_search — so tool selection stays unambiguous as a library grows: a bare search becomes a liability the moment a second search exists, whereas github_search and jira_search never collide. Names also carry hard constraints that differ by regime — a Claude API tool name must match ^[a-zA-Z0-9_-]{1,64}$, while an MCP tool name should be 1–128 characters of ASCII letters, digits, underscore, hyphen, or dot (no spaces) and unique within its server. MCP tools then reach the agent under a fixed namespaced form, mcp__<server>__<tool>: a list_issues tool on a server keyed github surfaces as mcp__github__list_issues. Naming is a distribution decision, not just a readability one, because the wildcard scoping of allowedTools (mcp__github__*) and the server configuration both key off these names.

See also: description (tool), Model Context Protocol (MCP), allowedTools

tool_choice (tool choice)

tool_choice is the per-request control over whether and which tool the model may call, with four documented modes forming a spectrum from free to coerced: auto (Claude decides — the default when tools are provided), any (Claude must use some tool but picks which), {"type": "tool", "name": …} (forces one specific tool), and none (no tools this turn). The most-tested constraint is that the forced modes break reasoning: only auto and none are compatible with extended (or adaptive) thinking, while any and forced tool error the request; forcing also prefills the assistant message, so a forced call emits no natural-language preamble before its tool_use block. To get a guaranteed schema-valid call without naming one tool, combine tool_choice: {"type": "any"} (guarantees a tool fires) with strict: true (guarantees the inputs match the schema) — the right shape for a classifier or extractor. Changing tool_choice between turns invalidates the cached message blocks under prompt caching (tool definitions and the system prompt stay cached), so keep it stable across cached turns.

See also: Strict tool use, allowedTools, input_schema

tool_result block (tool result)

A tool_result block is the structure your code returns to the model after executing a tool call, carrying the call’s outcome back into the loop as the model’s next input. It is sent on the next user turn with the fields {type, tool_use_id, content, is_error?}, where the tool_use_id is the join key matching each result to the tool_use block that requested it. A failed tool does not raise — it reports: you return a normal tool_result with is_error: true and the error text as content, so the model can read the failure and adapt rather than having the loop severed. When a single response contains more than one tool_use block, every corresponding tool_result must be returned together in that same next user message, each keyed by its tool_use_id; deferring one makes the next request malformed.

See also: Agent loop, stop_reason, is_error

turn (agent turn, tool-use round-trip)

A turn is one tool-use round-trip inside the agent loop: the model produces output that includes tool calls, your code (or the SDK) executes those tools, and the results feed back to the model — turns continue until the model produces output with no tool calls. The consequence that trips candidates is that a text-only final response is not a turn: a four-message session of three tool-use round-trips plus one final text answer counts as three turns, so a budget of max_turns = 2 would stop before that final step. Size a turn budget to the tool calls a task needs, not to the messages you expect to see; the free final text answer never counts against max_turns.

See also: Agent loop, max_turns, stop_reason

Validation-retry-feedback loop (validate-feed-back-retry loop)

The validation-retry-feedback loop is the three-layer pattern that catches what a schema cannot, since constrained decoding eliminates schema errors but never semantic errors (valid JSON, wrong data). Layer 1 is API constrained decoding — the schema layer, where syntax, type, required, and enum errors disappear for free. Layer 2 is application-code semantic checks — the domain layer, running the cross-checks the schema can’t, such as re-summing line items against a stated total or verifying a quoted provenance span actually appears in the source. Layer 3 is a bounded feedback loop that re-prompts with the specific failures (“calculated_total does not equal the sum; re-extract correcting this”) for a limited number of attempts, then escalates to human review on exhaustion. Each retry is a full inference (≈4× cost for three retries), so keep the schema fixed across attempts to avoid re-paying grammar-cache compilation and vary only the feedback message. The Agent SDK exposes this as a result you inspect, not an exception: branch on subtypesuccess carries the payload in message.structured_output, while error_max_structured_output_retries means the budget ran out. The one trap retries cannot fix is truncation (stop_reason: "max_tokens"): re-prompting on the same budget truncates at the same place, so detect it and raise the cap instead.

See also: Semantic error, Structured output, Constrained decoding

verification subagent (verifier subagent)

The verification subagent is the one multi-agent shape that “consistently succeeds across domains”: the main agent does the work, and a separate agent blackbox-tests the result against clear success criteria with minimal context transfer. The isolation is its strength — the verifier has no stake in, and no memory of, how the work was produced, so it cannot rationalize choices it never made. This is the same logic as the Writer/Reviewer pattern, where a fresh context improves review because it isn’t biased toward code it just wrote. Its characteristic failure mode is early victory: verifiers tend to declare success after one or two checks, and the documented mitigation is an explicit instruction such as “You MUST run the complete test suite before marking as passed.”

See also: Subagent, Writer/Reviewer pattern, isolated context

Writer/Reviewer pattern (writer-reviewer)

The Writer/Reviewer pattern is the canonical multi-step quality workflow in which one session implements and a second, fresh-context session reviews — because “a fresh context improves code review since Claude won’t be biased toward code it just wrote.” Session A writes the rate limiter; Session B reviews the file for edge cases, race conditions, and consistency; Session A then addresses the feedback. The same shape works for tests — have one Claude write the tests and another write code to pass them. Here the absence of inherited context is the feature: the reviewer cannot defend or rationalize choices it never made. This is why an agent should never review its own work, and the rule generalizes to the verification subagent, which blackbox-tests a result from its own isolated window.

See also: verification subagent, Multi-pass review, isolated context