Glossary

70 terms.

.claude/rules/ (rules)

.claude/rules/ is a modular instruction system of markdown files loaded into context every session with recursive subdirectory discovery — a system parallel to CLAUDE.md, not a subsystem of it: rules without paths frontmatter load at launch at the same priority as .claude/CLAUDE.md, neither nested under it nor overridden by it. Rules come in two scopes with a documented load order: user-level rules live in ~/.claude/rules/ and apply to every project on your machine (for machine-wide personal preferences), while project rules live in the repo’s .claude/rules/; user-level rules are loaded before project rules, giving project rules higher priority when two instructions tension — the same broad-to-specific recency model as the CLAUDE.md hierarchy. The lever that makes rules more than “another CLAUDE.md” is the optional paths glob, which path-scopes a rule so it loads only when Claude reads matching files. Symlinks work in the directory (circular ones are detected gracefully), so shared rules can be linked from a central location.

.mcp.json (mcp.json)

.mcp.json is the file-based way to configure an MCP server — a declaration at the project root that is committed to version control and read by both the CLI and any SDK run whose settingSources includes "project" (it is the alternative to registering a server programmatically via mcp_servers / mcpServers). Because it is committed and shared, secrets must never be written into it literally; they are referenced through env-var expansion instead — ${VAR} (expands, or fails the parse if unset) and ${VAR:-default} (expands, or uses the default) — which works inside command, args, env, url, and headers, so a config can carry "Authorization": "Bearer ${API_KEY}" while the key lives only in the environment. As a config location it corresponds to Project scope, the shared, approval-prompted tier (versus the home-directory ~/.claude.json used by Local and User scope). It is also where transports unavailable to the --transport flag, notably ws (WebSocket), can be declared, alongside claude mcp add-json.

@import

@import is the syntax (@path/to/import) by which a CLAUDE.md pulls in other files, stitching a modular instruction set together: the imported files expand and load at launch alongside the referencing file, and relative paths resolve relative to the file containing the import. The import chain has a documented maximum recursion depth of 5. The first time a session encounters an import, Claude Code shows an approval dialog, and declining it disables imports permanently — the dialog does not reappear, so a future import in that environment will silently not expand until the choice is reset. The mechanism is also the cross-tool bridge for AGENTS.md: because Claude Code reads only CLAUDE.md and never loads AGENTS.md on its own, a one-line @AGENTS.md import lets a single instruction set serve Claude Code and other agents at once without duplicating its contents.

adaptive decomposition (adaptive orchestration)

Adaptive decomposition is the structural shape of a decomposed task in which an orchestrator decides at runtime how many subtasks to spawn and what each does, scaling the decomposition to the specific input — as opposed to a sequential pipeline, whose steps are fixed in advance at design time. It is the right shape for open-ended, path-dependent work, where step N+1 depends on what step N discovered and no design-time sequence can capture the process. The canonical example is the effort-scaling ladder: simple fact-finding needs ~1 agent, a direct comparison 2–4 subagents, and complex research 10+. The capability costs roughly 3–10× the tokens of a single agent and buys thoroughness, not speed. Its failure mode is over-decomposition — “spawning 50 subagents for simple queries” — which the effort-scaling heuristic exists to guard against.

additionalProperties: false (additionalProperties false)

additionalProperties: false is the JSON-Schema keyword that forbids any object key beyond those explicitly listed, and in the structured-output and strict-tool-use subset it is mandatory on every object node, nested ones included. The constrained decoder requires it because an open object — one that allows arbitrary extra keys — has no closed set of valid continuations to compile into a grammar; at each decoding step the model must know exactly which keys are permitted, and this keyword is what closes that set. It is the single most common cause of a 400 on a hand-authored schema, precisely because standard JSON Schema defaults additionalProperties to true, so a schema that “validates fine in your editor” still fails grammar compilation at the API. Its required presence is also why open-ended extraction (where the set of fields is genuinely unknown, i.e. additionalProperties: true) cannot use structured output and must fall back to the classic tool-use pattern, which imposes no such closure.

Agent loop (agentic loop, tool-use loop)

An agent is an ordinary language model placed in a loop: it proposes a tool call, your code executes it, the result is fed back as the model’s next input, and the model decides again — “act, observe, decide” repeated until the model answers without calling a tool. The loop’s single branch condition is stop_reason: tool_use continues it (your code returns tool_result blocks on the next turn), end_turn ends it. The model decides what happens next, but the surrounding code decides whether it gets to — so owning the loop, including its termination budget (max_turns / max_budget_usd), is the core orchestration discipline, not authoring any single tool.

See also: Subagent, Agent loop

Context rot

Context rot is the umbrella term for the quiet decline in a long context’s quality that sets in before the window ever overflows. It groups the degradation mechanisms that erode a long conversation — lost-in-the-middle (material buried mid-context is attended to least reliably) and summarization loss (progressive summarizing of earlier turns discards detail that may later matter) — under the single principle that a long context gets worse before it gets full. The exam-relevant skill is diagnostic: given a misbehaving long session, name whether it is accumulation pressure, lost-in-the-middle, or post-compaction loss, because each has a distinct remedy and reaching for a bigger window addresses none of them — a larger context still loses its middle and still compacts eventually. The depth — the research and measurement behind these mechanics — lives in the Agentic Systems Design book; here the job is to recognize them and treat the onset of degradation, not the overflow error, as the thing to design against.

Context window (context budget)

The context window is the finite, cumulative token budget a session draws from, and everything shares it: the system prompt, tool definitions, CLAUDE.md, the full conversation history, and every tool input and output all accumulate in one pool that never refills within a session. Current capacities are concrete — 1M tokens on Opus 4.8 and Sonnet 4.6, 200k on Haiku 4.5 — though tokenizer density varies, so the same text can cost up to ~35% more of the budget on one model than another. The key reliability distinction is that “fits in the window” and “well-attended” are different claims: the token limit is a capacity bound, while attention is a quality that declines as the window fills, so a conversation comfortably under the limit can still have lost the thread of an instruction given fifty turns ago. On a large codebase the goal is therefore never to fit everything in but to load the task’s slice and keep the rest out — what you decline to read is as much a design decision as what you read.

coordinator–subagent pattern (orchestrator-worker, hub-and-spoke)

The coordinator–subagent pattern (also orchestrator-worker, or hub-and-spoke) is the canonical multi-agent shape: a lead agent decomposes a task, dispatches subagents that each run in their own context window and explore parts of it independently, and then synthesizes their returned results. The coordinator owns planning and synthesis; the subagents own focused execution. The motivation is not “more brains” but more windows — a single agent has one finite context window, and extra agents relieve that bottleneck. It is expensive, typically using 3–10× more tokens than a single agent, so reach for it only when one of three conditions holds: context protection (large, mostly-irrelevant intermediate data should stay out of the main window), parallelization (genuinely independent paths), or specialization (tool-set overload or deep domain expertise). Delegation is one level deep: subagents cannot spawn subagents.

custom_id (custom id)

custom_id is the unique identifier attached to each request in a Message Batch, and it is the only sanctioned key for joining a result back to the request that produced it. It is mandatory rather than optional because batch results “can be returned in any order” and may not match the order of submission — a batch is a set, not a sequence, so there is no positional correspondence to fall back on. Relying on submission order is the characteristic batch failure: it silently mis-joins outputs to inputs (result n attributed to request n when it answers some other request), corrupting data with nothing in the response to flag it. The id must be unique across the batch — reusing one makes two results indistinguishable — and the documented format is 1-64 characters of alphanumerics plus - and _. Treat it as a primary key: unique, meaningful, and the sole correct way to match results to requests.

description (tool) (tool description)

A tool’s description is the natural-language field on its definition that tells the model what the tool does, when to use it (and when not to), what each parameter means, and any caveats — and it is “by far the most important factor in tool performance,” the single highest-leverage surface an architect controls. The model never reads your implementation; it selects tools by their descriptions alone, so the description is the API as far as the agent is concerned. The documented floor is at least 3–4 sentences per description, more if the tool is complex: a get_stock_price that spells out its inputs, its USD return value, and that “it will not provide any other information” routes correctly, whereas “Gets the stock price for a ticker” leaves the model guessing about inputs, outputs, and boundaries. A vague description is a performance bug the model cannot route around, which is why it earns the first and largest share of design effort.

Error propagation

Error propagation is the way a fault in a chain of agents does not stay local: an upstream ambiguity becomes a downstream wrong decision, and concurrent faults compound into a degradation no single component test reproduces. A chain’s reliability is the product of its handoffs, not its best agent’s reliability — each boundary is both a place an error can enter and a place an existing error passes through unexamined — so adding agents multiplies the surfaces where intent can be dropped. The propagation mechanism is specific: a mid-pipeline agent cannot pause to ask, so it resolves an ambiguity and hands the guess downstream as settled fact, and the next agent has no signal that its input was a guess. Compounding failures are especially hard to catch because they live between components — each part passes its own eval, and the breakdown appears only in the interaction, on traffic slices no single test exercises. The boundary defenses are structured error context that crosses handoffs in machine-readable form, independent validation by an isolated judge, circuit breakers that isolate a misbehaving agent before it cascades, and keeping the escalable decision at the coordinator.

Escalate, don't guess

“Escalate, don’t guess” is the reliability principle that, when a task is genuinely ambiguous or blocked, a well-built agent surfaces the decision to the party who can make it rather than silently picking an interpretation. A silently-resolved ambiguity is a coin flip on intent that nobody chose to take; escalation converts that flip into a decision made by the one party who actually knows the answer. The economics are lopsided — the cost of asking is one round trip, while the cost of guessing wrong is the whole task built on the wrong branch — and the cost only grows the longer an ambiguity survives, so the strongest form is proactive (front-loading clarifying questions before any work depends on the answers). The principle has sharp teeth in multi-agent pipelines: a mid-pipeline agent usually has no one to ask, so where an interactive agent would pause and clarify, it instead resolves the ambiguity itself and hands the guess downstream as settled fact — which is why open questions should be resolved at the coordinator before delegating a fully-specified task.

Escalation ladder (output-control ladder)

The escalation ladder is the three-rung hierarchy for controlling a model’s output shape, climbed only as far as the stakes require because each rung costs more in context, latency, and setup. Rung 1 is explicit instruction — name the format and the criteria in the prompt; the cheapest rung, and it handles the common case. Rung 2 is few-shot examples — demonstrate the desired handling on the ambiguous inputs a written rule cannot fully pin down. Rung 3 is structured outputs or strict tool use — constrain decoding so a non-conforming shape cannot be emitted at all; the strongest guarantee and the highest setup cost. The documented discipline is to ask plainly first, since newer models can reliably match complex schemas when simply told to, and to escalate a given field only when a stronger guarantee is genuinely needed — most fields never leave rung 1, and only a crash-on-violation field earns rung 3, where an out-of-set value becomes unrepresentable rather than merely discouraged.

Example tags (<example> tags)

Example tags are the XML-style wrapper — a single demonstration in <example> tags, the whole set grouped in <examples> tags — that marks few-shot examples as examples so the model can distinguish them from the instructions and from the live input. “Structured” is one of the three documented example-quality criteria (alongside relevant and diverse) precisely because this delimiting matters: without it, a demonstration can blur into the instruction text and the model may read a sample input as a directive. Inside the set, each pair couples an <input> with its desired <output>, and the conventional move is to place the ambiguous edge case in the middle of the set with the handling you want — for instance, an input with a missing field whose output is null, teaching that “no value” resolves to null rather than an empty string or "unknown". The tags are the syntactic illustration of few-shot prompting; the construction and quality criteria are the stable substance.

Explicit criteria (explicit instruction)

Explicit criteria is the principle that output quality is controlled by the specification, not the model: name the success criteria and the output shape — fields, types, lengths, missing-data handling — directly in the prompt rather than leaving them to inference. If two runs of the same prompt disagree, the disagreement was latent in the prompt, a degree of freedom you left unstated that the model resolved differently each time; pinning every degree of freedom you care about stops the drift. Modern models follow instructions more literally and do not infer requests you did not make, so a requirement held only in your head is simply unmet — which makes this a durable, stable principle that newer models make more load-bearing, not less. A corollary is that positive instruction (“respond in flowing prose”) steers more reliably than a prohibition (“don’t use lists”), because a positive instruction aims at the target while a negative one only rules out one failure inside a still-vast permitted region. It is the cheapest and first rung of the output-control escalation ladder.

explore-plan-implement-commit (four-phase workflow)

Explore-plan-implement-commit is the recommended four-phase rhythm for driving an agentic task: Explore (read files in plan mode to build understanding) → Plan (create a detailed implementation plan) → Implement (switch out of plan mode and verify against the plan) → Commit (write a descriptive message and PR). Explore and Plan are the read-only front half — exactly plan mode’s territory — and Implement and Commit are where edits land. The loop’s whole purpose is to separate understanding from editing: a shared model of the change is built before a single line moves, and the implement phase then checks its work against that plan; collapse the two and Claude optimizes a problem it never confirmed it understood, producing code that solves the wrong problem. You may skip the plan phase only when the diff is one-sentence-describable — a change small and clear enough that there is nothing for a plan to de-risk. The rhythm is a durable discipline rather than a feature, surviving any tool rename or keybinding change.

Few-shot prompting (multishot prompting)

Few-shot prompting (also called multishot) steers a model by giving it a small set of worked input→output examples rather than only a written instruction; it is one of the most reliable ways to fix format, tone, and structure, and the only clean way to pin down an ambiguous case. The model does not memorize the examples — it extracts the implicit pattern across them and applies it to the new input, so a demonstration carries information a sentence struggles to (exact field ordering, how a borderline input should resolve). The documented sweet spot is 3-5 examples: with 1-2 the model latches onto an incidental trait instead of the intended pattern, and at 6+ you burn context and risk two examples disagreeing and teaching “either is acceptable.” Examples should be relevant (mirror the real use case), diverse (vary enough to avoid teaching a spurious shared trait — the most-neglected, most-consequential criterion), and structured (wrapped in example tags so the model separates them from instructions). The highest-value move is to place an example on the edge case showing the desired handling; few-shot composes with structured output rather than competing with it, since the schema locks the shape while examples teach the content and edge-case handling.

headless mode (print mode, -p)

Headless mode is Claude Code’s non-interactive invocation — claude -p "<query>" — which runs the full agent loop and exits after responding, with no prompt or session UI; all standard CLI options work with -p. It is the entry point for running Claude Code in CI, where the mechanics change because there is no human at the keyboard to approve a tool or answer a question, so the run must settle its output shape and permission surface up front. For reproducibility you pair it with --bare, which skips auto-discovery of hooks, skills, plugins, MCP servers, auto memory, and CLAUDE.md — without it the run loads whatever the host machine has, so the same command behaves differently on different runners (--bare is slated to become the -p default). Output is selected with --output-format: text (default), json (a payload with result, session_id, and total_cost_usd), or stream-json (newline-delimited events); --json-schema adds a validated structured_output field and is print-mode only, as are --max-turns and --max-budget-usd. The permission floor is --permission-mode dontAsk (denies anything not in permissions.allow or the read-only set) paired with --allowedTools, and the pipeline gates on the run’s process exit code — 0 passes, non-zero (e.g. hitting --max-turns, or over-cap stdin above 10 MB) fails.

hook precedence (hook decision precedence)

Hook precedence is the fixed rule that decides the outcome when several hooks (or permission rules) act on one event: deny > defer > ask > allow. All matching hooks run in parallel, and the most restrictive result wins — so if any hook returns deny, the operation is blocked regardless of what the others return, and to permit a call every hook must agree. The system fails safe: one hook saying “no” is enough to stop something. defer sits just under deny because it is the special decision that ends the query so the host can resume it later from the persisted session — a pause-and-hand-back that is more restrictive than asking or allowing. Because completion order is non-deterministic, a hook must never assume another has already run; each should act independently.

See also: turn, Agent loop, session

MCP scopes (MCP scope)

MCP scopes are the three tiers Claude Code stores MCP servers in, each a different file with a different audience: Local (~/.claude.json, under a per-project key — current project only, not shared), Project (.mcp.json at the repo root — shared via version control, with a one-time approval prompt on first use), and User (~/.claude.json — available across all your projects). The CLI flag claude mcp add <name> --scope <local|project|user> picks the scope; omit it and the default is Local. When the same server name appears in more than one scope, Claude Code connects once using the highest-precedence source — Local → Project → User → plugin-provided → claude.ai connectors (the first three match duplicates by name) — which is the intended override path for personal credentials, not a conflict. The right first question is never “which file?” but “who should see this server?”: credential-bearing or experimental servers go Local, team-shared servers go Project, cross-project personal servers go User. Note the notorious collision — MCP “local scope” lives in ~/.claude.json, not in the project’s .claude/settings.local.json general local-settings file.

MCP transports (MCP transport)

An MCP transport is the channel a server communicates over, selected by its type: stdio for local processes, sse (Server-Sent Events, now deprecated — use HTTP), and http (Streamable HTTP, with streamable-http accepted as an alias for http in JSON configs). A fourth type, ws (WebSocket), is configurable only through .mcp.json or claude mcp add-json, not via the --transport flag — whose accepted values are just http / stdio / sse. Distinct from these, and not a .mcp.json type, the SDK can run an MCP server in-process inside your application as a deployment mode (for example a built-in tool server), rather than as an external process or endpoint. Beneath the transport sits the MCP wire protocol, which is mid-revision and should be cited with a date: the 2025-11-25 spec requires an initialize handshake as the first interaction, while the 2026-07-28 release candidate removes that handshake for a stateless model — so verify the wire details against the current spec, even though the configuration surface (scopes, files, env vars) is the more stable part.

Message Batches API (batch processing)

The Message Batches API asynchronously processes large volumes of Messages requests for a flat 50% discount on both input and output, the trade being latency: most batches finish under an hour, but the SLA is 24 hours, after which an incomplete batch expires (results stay retrievable for 29 days). The decision rule is purely latency tolerance — if a human or synchronous system is blocked on the answer, batch is wrong; if the work is an overnight job, a backfill, or an offline evaluation, batch halves the bill, and the discount stacks with prompt caching. A single batch is bounded by 100,000 requests or 256 MB, whichever comes first (an oversized payload returns HTTP 413); streaming is unsupported, and each request is single-shot with no multi-turn tool round-trip, though structured outputs compose cleanly. Its one non-negotiable contract is custom_id matching, because results return in any order. Billing is only for succeeded results (errored, canceled, expired are free), and a crucial subtlety is that succeeded is a batch-level outcome meaning the request ran — the per-message stop_reason must still be checked, since a billed refusal or a truncation arrives as succeeded.

Model Context Protocol (MCP) (MCP)

The Model Context Protocol (MCP) is the open protocol by which an agent connects to an external server that supplies tools (and other capabilities); a connected server’s tools surface to the model under the namespaced form mcp__<server>__<tool>. A server is configured either programmatically (mcp_servers / mcpServers, optionally locked down with strictMcpConfig: true) or via a committed .mcp.json at the project root, and installed with claude mcp add <name> --scope <local|project|user> --transport <http|stdio|sse>. When the same server name appears in several scopes, Claude connects once using the highest-precedence source (Local → Project → User → plugin → connector). A server that never connected is a silent failure mode — confirm status: connected in the system:init message before relying on its tools.

Multi-pass review (independent reviewer pattern)

Multi-pass review is the practice of checking work with a fresh, independent context rather than asking a model to review its own output in the same session — the weakest review, for two independent reasons. First, attention dilution: performance degrades as the context window fills, so a session that already holds the implementation reviews from a dilated state. Second, implementer bias: a model that just wrote the code is biased toward defending it, while a reviewer with no authorship has nothing to rationalize. The pattern scales across three levels: a verification subagent (a child in its own context window, cheapest, single session), a Writer/Reviewer pair (two genuinely independent sessions, the quality-critical default), and a fleet (many parallel specialists, each owning one issue class — the direct architectural answer to attention dilution). A fleet’s essential safeguard is a verification pass that re-checks each candidate finding against actual code behavior to filter false positives; without it, parallel reviewers’ plausible-but-wrong findings accumulate, so more reviewers means more noise, not more signal. Convergence rules (a short instruction block, damped re-review) keep “more passes” from becoming spam, and are also cost control.

opusplan

opusplan is a Claude Code model alias that pairs plan mode with a model-per-phase split: it uses opus in plan mode for complex reasoning and architecture decisions, then automatically switches to sonnet for execution (code generation and implementation). The switch fires at the plan→execute boundary — the moment you approve a plan, one action flips two things at once: the permission mode leaves plan for the chosen write mode, and the model leaves its Opus plan phase for Sonnet. The intent is to spend the expensive tokens where the leverage is, on the design, and the cheaper, faster model on the mechanical edits. Set it like any alias — /model opusplan during a session or claude --model opusplan at startup. One trap worth memorizing: the automatic 1M-context upgrade applies to the opus alias only, not opusplan, so opusplan’s plan phase runs at the standard 200K window — if a planning step genuinely needs more than 200K of context at once, reach for opus[1m] (or pin a 1M model) for that phase instead.

paths scoping (path-scoped rules)

Paths scoping is the use of the optional paths frontmatter — a list of glob patterns — to make a .claude/rules/ rule (or a skill) load conditionally rather than unconditionally. A path-scoped rule triggers when Claude reads a file matching the pattern, not at launch and not on every tool use: a rule scoped to src/api/**/*.ts costs nothing in context until Claude actually reads an API file, and then applies while that work is in scope — so work that never touches src/api/ never pays for it. The glob format is shared between rules and skills: **/*.ts (all TypeScript files at any depth), src/**/* (everything under src/), *.md (Markdown in the project root only), and brace expansion such as **/*.{ts,tsx} (multiple extensions in one pattern). The discipline it enables: reach for a path-scoped rule when guidance is real but only relevant to part of the tree, keeping file-specific instructions out of context on unrelated work — the one shape that offers this lever, since a CLAUDE.md line or an un-scoped rule would load every session.

See also: .claude/rules/, SKILL.md, CLAUDE.md

permission modes (permission mode)

A permission mode is the setting consulted when the model requests a tool, deciding whether that tool actually fires; there are six — default, acceptEdits, plan, dontAsk, bypassPermissions, and auto (TypeScript-only). Two change the tool surface directly: plan restricts the agent to read-only tools (it explores and proposes without editing source files), and acceptEdits auto-approves file edits and filesystem commands (mkdir, touch, rm, rmdir, mv, cp, sed) but only inside cwd plus additionalDirectories, prompting for paths outside that scope. The mode is only step 3 of a fixed five-step evaluation order — Hooks → Deny rules → Permission mode → Allow rules → canUseTool — and that order is the crux: deny rules and hooks sit above the mode, so a disallowedTools entry like Bash(rm -rf *) blocks even under bypassPermissions, whereas allow rules sit below it, so under bypassPermissions an allowlist is never consulted. Hence the trap: confine an agent with plan mode or a deny rule, never with allowed_tools alone, which only pre-approves and never restricts.

Plan mode

Plan mode restricts Claude to read-only research and a written proposal: it reads files and runs shell commands to explore, then writes a plan, but does not edit your source (permission prompts still apply as in default mode). Enter it by cycling Shift+Tab, prefixing a prompt with /plan, launching with --permission-mode plan, or setting permissions.defaultMode: "plan". Crucially, approving a plan exits plan mode and switches the session into the write mode each approve option names — the read-only guarantee holds only until approval. Choosing plan-first versus going direct is a risk-containment decision keyed to reversal cost and uncertainty: plan first for unfamiliar code or a wide blast radius, go direct for a small diff in code you know. The opusplan alias pairs the mode with a model-per-phase split — Opus plans, Sonnet executes.

PreToolUse hook (PreToolUse)

A PreToolUse hook is the interception mode that gates a tool call: it runs your code before the tool executes and returns a permissionDecision of allow, deny, ask, or defer, optionally with updatedInput to rewrite the call. It is the counterpart to PostToolUse, which normalizes a result after the tool runs — so a rule like “never write a .env file” must be a PreToolUse hook (at PostToolUse the write has already happened). A matcher, a regex string tested against the tool name (e.g. "Write|Edit"), selects which calls the hook sees; it does not filter by argument, so any path or command test happens inside the callback. Because subagents do not inherit the parent’s permissions, a PreToolUse hook is also the clean way to pre-approve a subagent’s tools rather than re-prompting inside every child.

Scratchpad

A scratchpad is a working file on disk that an agent writes to and reads back from, externalizing durable state out of the context window so it survives operations that touch only the conversation. Its defining property is that it outlives both context-freeing commands: compaction only summarizes the window and /clear only wipes the window, so a PLAN.md written to disk is untouched by either — the agent re-reads it after a compaction, or in a freshly-cleared session, exactly as it left it. The durable layer of a long task therefore does not live in the conversation at all. The scratchpad is one of three complementary levers for keeping the main context scoped on a large codebase — alongside compaction, which summarizes bulk away, and subagent delegation, which spends exploration cost in a separate window that returns only a summary. Each does the same job of moving bulk out of the main window; the scratchpad’s specialty is state that must persist across windows and across hosts.

Semantic error (semantic errors)

A semantic error is a response that is valid JSON matching your schema but contains incorrect data — the model named the wrong customer, copied a wrong total, or fabricated a value. It is the error class that survives structured output and strict tool use, because constrained decoding constrains form, never fact: a schema can require customer_name to be a non-empty string, but it cannot know the source said “Jane” while the model wrote “John.” Once you adopt structured outputs, schema and type errors are gone and your entire remaining error budget is semantic, so that is where validation effort must move — and since the API never sees a semantic error, you cannot retry your way out of one the model is never told about. The countermeasure is to encode the check into the schema, adding fields whose only job is verification (a stated_total re-summed against a calculated_total, a conflict_detected flag, a provenance triple whose quoted span you confirm appears in the source) so an un-checkable judgment becomes a mechanical test that application code, inside a validation-retry-feedback loop, can run.

session (agent session)

A session is the persisted conversation — the prompt and every tool call, tool result, and response — stored as JSONL on disk at ~/.claude/projects/<encoded-cwd>/<session-id>.jsonl. The boundary that matters most is what it does not include: a session persists the conversation, not the filesystem (snapshotting and reverting the agent’s file changes is file checkpointing’s separate job). Three controls carry or branch a session: continue picks up the most recent session in the current cwd, resume picks up a specific session by ID (and recovers one that hit error_max_turns with a bumped budget), and fork starts a new session ID from a copy of the history, leaving the original untouched. Forking branches the conversation, not the disk; for cross-host work it is often more robust to capture durable artifacts as application state than to ship transcript files around.

settings precedence (settings hierarchy)

Settings precedence is the strict five-level hierarchy by which Claude Code resolves configuration values, where the highest scope wins. Named in full, highest to lowest: Managed (cannot be overridden by anything), CLI arguments (--model, --permission-mode, … — session-only), Local (.claude/settings.local.json, gitignored), Project (.claude/settings.json, committed), and User (~/.claude/settings.json, lowest). When the same setting appears in several scopes, only the highest one takes effect — so --model haiku on the CLI (level 2) beats a project opus (level 4) and a user sonnet (level 5). The two most often forgotten rungs are CLI and Local. This override model is the deliberate opposite of the CLAUDE.md instruction layer, which concatenates rather than overrides; conflating the two is the single most common configuration error. One exception: permission allow/ask/deny rules merge across scopes rather than override.

skill

A skill is a lazy-loaded, directory-bundled capability — a markdown directory .claude/skills/<name>/SKILL.md with optional supporting files (reference.md, scripts/) — that Claude can invoke on its own or that you trigger by typing /name. Unlike CLAUDE.md, which loads every session, skills load on demand: the agent receives only the skill descriptions (roughly 100 tokens each) at startup, and the full body materializes only when the skill is invoked, entering the conversation as a single message that persists for the rest of the session (it is not re-read each turn). The descriptions load into a budget defaulting to 1% of the model’s context window, and on overflow the least-invoked skills’ descriptions drop first, so a rarely-used skill can become invisible. This lazy model is what makes a skill cheap and discoverable at once, and it resolves across four scopes by precedence — enterprise > personal > project, plus plugin skills namespaced as plugin-name:skill-name so they never conflict.

See also: slash command, SKILL.md, CLAUDE.md

SKILL.md

SKILL.md is the markdown file at the root of a skill directory (.claude/skills/<name>/SKILL.md) whose frontmatter declares the skill’s behavior. Among its fields: name (the display name in skill listings, which defaults to the directory name — though it is the directory name, not this field, that sets the /command you type, except for a plugin-root SKILL.md), description (the field Claude reads at startup to decide whether to auto-invoke; description + when_to_use are capped at 1,536 characters by default), argument-hint, allowed-tools (CLI-only), model, effort, context, and paths (glob patterns that limit when the skill activates). Two switches decide who may call it: user-invocable: false hides it from the / menu so only Claude can invoke it, while disable-model-invocation: true does the inverse — only the user can trigger it via /, Claude cannot auto-invoke, and its description is kept out of context entirely (which also blocks subagent preloading). Inside the body, $ARGUMENTS expands to all passed arguments and $ARGUMENTS[N] / $N pick a specific 0-indexed one.

See also: skill, slash command, paths scoping

slash command

A slash command is a stored prompt that controls Claude Code from inside a session: it is recognized only at the start of your message, and any text that follows the command name is passed to it as arguments. Custom commands have merged into skills — a file at .claude/commands/deploy.md and a skill at .claude/skills/deploy/SKILL.md both create /deploy and work the same way, because they share one mechanism (a prompt handed to Claude). Old flat-file .claude/commands/ files keep working, but the skill form is recommended for new work because it adds directory bundling, frontmatter, and the ability for Claude to auto-invoke it when relevant. A command is thus the legacy flat-file shape of the same idea a skill expresses with a directory; everything not backed by this prompt mechanism is instead a built-in command whose behavior is coded directly into the CLI.