Part I built the agent and its orchestration; Part II turns to the tools that agent reaches for. A tool is a contract between a deterministic system and a non-deterministic caller — and the architect’s leverage is not the implementation behind it but the surfaces the model actually reads: the description, the input examples, the operation boundary, the name, and the response. Get those right and a capable model selects the tool correctly; get them wrong and no amount of model quality rescues it.

Do I know this already? Diagnostic

Answer these confidently and you can skim ahead to Exam essentials; if any is shaky, read closely — each is developed below.

Which single field on a tool definition moves its performance the most, and what is the documented length floor?
What does input_examples do, and what is the one hard rule every example must satisfy?
You have create_pr, review_pr, and merge_pr. What is the documented redesign, and why?
Give the namespaced names for a GitHub search and a Jira search, and the mcp__ form an MCP tool surfaces as.
What is structurally required of every tool’s input schema — and what optional schema governs its output?

Check your answers

The description — “by far the most important factor in tool performance” — with a documented floor of at least 3–4 sentences per tool description, more if the tool is complex.
input_examples is an array of example argument objects that show the model correct calls; each example must validate against the tool’s input_schema, or the request returns a 400.
Consolidate them into a single tool with an action parameter — fewer, more capable tools reduce selection ambiguity.
github_search and jira_search; an MCP tool surfaces as mcp__<server>__<tool> (e.g. mcp__github__list_issues).
The input schema must be a JSON Schema object (a no-argument tool still declares an empty object); the optional MCP outputSchema governs the output, obligating conforming structuredContent.

The description is the highest-leverage surface

Of every field on a tool definition, the description moves performance the most: detailed descriptions are “by far the most important factor in tool performance.” [Official] Define tools · Anthropic A description is not documentation for a human reader; it is the surface the model selects from, so it must spell out what the tool does, when it should be used (and when it should not), what each parameter means, and any caveats. [Official] Define tools · Anthropic The guidance even sets a floor: aim for “at least 3-4 sentences per tool description, more if the tool is complex.” [Official] Define tools · Anthropic

The gap is concrete. A get_stock_price tool described as “Retrieves the current stock price for a given ticker symbol… returns the latest trade price in USD… It will not provide any other information” tells the model exactly when to reach for it and what it gets back; the same tool described as “Gets the stock price for a ticker” leaves it guessing about inputs, outputs, and boundaries. [Official] Define tools · Anthropic
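One way to hold descriptions to that floor is a small lint step run before tools ship. The sketch below is a heuristic under stated assumptions: `count_sentences` and `description_meets_floor` are hypothetical helpers, the wording of both descriptions is illustrative, and counting sentence-ending punctuation is only a rough proxy for the "3-4 sentences" guidance, not a check the API performs.

```python
import re

def count_sentences(text: str) -> int:
    """Rough count: split on ., ! or ? followed by whitespace or end of text.
    Abbreviations such as "e.g." will be overcounted."""
    parts = re.split(r"[.!?]+(?:\s+|$)", text.strip())
    return sum(1 for part in parts if part.strip())

def description_meets_floor(tool: dict, minimum: int = 3) -> bool:
    """Hypothetical lint: True if the description has at least `minimum` sentences."""
    return count_sentences(tool.get("description", "")) >= minimum

# Illustrative tool definitions (wording is an assumption, not quoted guidance).
thin = {
    "name": "get_stock_price",
    "description": "Gets the stock price for a ticker.",
}
detailed = {
    "name": "get_stock_price",
    "description": (
        "Retrieves the current stock price for a given ticker symbol. "
        "Returns the latest trade price in USD. "
        "Use it when the user asks about the current price of a stock. "
        "It will not provide any other information."
    ),
}

for tool in (thin, detailed):
    print(tool["name"], description_meets_floor(tool))
```

A sentence count cannot tell whether the description actually states when to use the tool, so a check like this catches only the thinnest descriptions; the substance still needs review.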

Show correct usage with input_examples

The description tells the model how to use a tool; input_examples show it. This optional field carries an array of example argument objects that demonstrate correct calls: the documented “Tool Use Examples” feature. [Official] Define tools · Anthropic A weather tool can ship three: a full call, a call with a different unit, and a call that omits the optional field, teaching the model the shape by demonstration rather than prose.

The one hard rule: each example must validate against the tool’s input_schema, or the request returns a 400. [Official] Define tools · Anthropic Two more facts for the exam: input_examples are for client (user-defined) tools, not server-side tools, and they cost roughly 20–50 tokens for simple examples and 100–200 for complex nested ones, a context cost you pay deliberately where ambiguity is high. [Official] Define tools · Anthropic

A description plus input_examples Worked example

A get_weather tool, with the two model-facing surfaces working together:

{
  "name": "get_weather",
  "description": "Get the current weather for a location. Use when the user asks about present conditions; not for forecasts. `unit` is optional and defaults to celsius.",
  "input_schema": {
    "type": "object",
    "properties": {
      "location": { "type": "string" },
      "unit": { "type": "string", "enum": ["celsius", "fahrenheit"] }
    },
    "required": ["location"]
  },
  "input_examples": [
    { "location": "San Francisco, CA", "unit": "fahrenheit" },
    { "location": "Tokyo, Japan", "unit": "celsius" },
    { "location": "New York, NY" }
  ]
}

The third example deliberately omits unit to show it is optional. Every example validates against input_schema — if you typo "unit": "kelvin", the example fails the enum and the whole request 400s, so the examples double as a self-check on your schema. The description draws the boundary (“not for forecasts”); the examples remove any doubt about argument shape.
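The self-check described above can also be run locally before a request is sent. This is a minimal sketch, assuming a hand-rolled `validate_example` helper (hypothetical) that covers only the parts of JSON Schema this tool uses: required keys, string types, and enum membership. It is not the validation the API itself performs, and a full JSON Schema validator would be the safer choice for real schemas.

```python
# The get_weather definition from the worked example above.
WEATHER_TOOL = {
    "name": "get_weather",
    "input_schema": {
        "type": "object",
        "properties": {
            "location": {"type": "string"},
            "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
        },
        "required": ["location"],
    },
    "input_examples": [
        {"location": "San Francisco, CA", "unit": "fahrenheit"},
        {"location": "Tokyo, Japan", "unit": "celsius"},
        {"location": "New York, NY"},
    ],
}

def validate_example(example: dict, schema: dict) -> list:
    """Return a list of problems; an empty list means this subset check passed."""
    problems = []
    properties = schema.get("properties", {})
    for key in schema.get("required", []):
        if key not in example:
            problems.append(f"missing required field: {key}")
    for key, value in example.items():
        spec = properties.get(key)
        if spec is None:
            continue  # keys absent from the schema are not checked in this sketch
        if spec.get("type") == "string" and not isinstance(value, str):
            problems.append(f"{key}: expected a string")
        if "enum" in spec and value not in spec["enum"]:
            problems.append(f"{key}: {value!r} is not one of {spec['enum']}")
    return problems

schema = WEATHER_TOOL["input_schema"]
for example in WEATHER_TOOL["input_examples"]:
    assert validate_example(example, schema) == []

# The typo from the text: "kelvin" is not in the enum, so this example is rejected.
print(validate_example({"location": "Oslo, Norway", "unit": "kelvin"}, schema))
```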

Consolidate operations to reduce selection ambiguity

The next surface is the operation boundary, meaning how much each tool does. The documented default is to consolidate: “Consolidate related operations into fewer tools. Rather than creating a separate tool for every action (create_pr, review_pr, merge_pr), group them into a single tool with an action parameter. Fewer, more capable tools reduce selection ambiguity.” [Official] Define tools · Anthropic Every extra near-equivalent tool is one more option the model can pick wrong.
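A consolidated tool of this kind might look like the sketch below. Everything here is illustrative: the tool name `github_pr`, the parameter names other than `action`, and the stub handlers are assumptions made for the example, not a documented definition.

```python
# Hypothetical consolidated tool: one definition, three actions.
GITHUB_PR_TOOL = {
    "name": "github_pr",
    "description": (
        "Create, review, or merge a GitHub pull request. "
        "Set `action` to choose the operation. "
        "`title` is needed for create; `pr_number` is needed for review and merge. "
        "Do not use this tool to search or list pull requests."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "action": {"type": "string", "enum": ["create", "review", "merge"]},
            "pr_number": {"type": "integer"},
            "title": {"type": "string"},
        },
        "required": ["action"],
    },
}

# Stub handlers standing in for real GitHub calls.
def create_pr(args: dict) -> dict:
    return {"status": "created", "title": args["title"]}

def review_pr(args: dict) -> dict:
    return {"status": "reviewed", "pr_number": args["pr_number"]}

def merge_pr(args: dict) -> dict:
    return {"status": "merged", "pr_number": args["pr_number"]}

HANDLERS = {"create": create_pr, "review": review_pr, "merge": merge_pr}

def handle_github_pr(args: dict) -> dict:
    """Dispatch on the `action` argument; unknown actions return a readable error."""
    action = args.get("action")
    if action not in HANDLERS:
        return {"error": f"unknown action {action!r}; expected one of {sorted(HANDLERS)}"}
    return HANDLERS[action](args)

print(handle_github_pr({"action": "merge", "pr_number": 7}))
```

The model now chooses among three enum values inside one tool rather than among three similarly named tools, which is the selection ambiguity the guidance is targeting.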

The deeper principle is to design for the agent’s affordances, not mirror your API’s endpoints: rather than make the model chain list_users + list_events + create_event, give it one schedule_event; rather than get_customer_by_id + list_transactions + list_notes, give it get_customer_context. [Official] Writing tools for agents · Anthropic A tool that completes the workflow the agent needs beats three tools it must orchestrate.
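The get_customer_context idea can be sketched as a single function that does the chaining on the server side. The in-memory dictionaries, the identifiers, and the `max_transactions` parameter below are all hypothetical stand-ins for whatever backends a real system would call.

```python
# Hypothetical in-memory backends standing in for three separate endpoints.
CUSTOMERS = {"cus_ada": {"name": "Ada Example", "plan": "pro"}}
TRANSACTIONS = {
    "cus_ada": [  # assumed to be ordered newest first
        {"id": "txn_003", "amount_usd": 40},
        {"id": "txn_002", "amount_usd": 15},
        {"id": "txn_001", "amount_usd": 99},
    ]
}
NOTES = {"cus_ada": ["Prefers email contact"]}

def get_customer_context(customer_id: str, max_transactions: int = 2) -> dict:
    """One call that returns what the agent would otherwise assemble in three."""
    profile = CUSTOMERS.get(customer_id)
    if profile is None:
        return {"error": f"no customer with id {customer_id}"}
    return {
        "customer_id": customer_id,
        "profile": profile,
        "recent_transactions": TRANSACTIONS.get(customer_id, [])[:max_transactions],
        "notes": NOTES.get(customer_id, []),
    }

print(get_customer_context("cus_ada"))
```

Capping the transaction list inside the tool keeps the response small by default, while still leaving the agent a parameter to widen it when a task needs more history.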

Namespace tool names by service

A name is the model’s fastest disambiguator, and the documented convention is to namespace by service: “Use meaningful namespacing in tool names… prefix names with the service (e.g., github_list_prs, slack_send_message). This makes tool selection unambiguous as your library grows.” [Official] Define tools · Anthropic A bare search becomes a liability the moment a second search exists; github_search and jira_search never collide.

Names also carry hard constraints that differ by regime. A Claude API tool name must match ^[a-zA-Z0-9_-]{1,64}$. [Official] Define tools · Anthropic An MCP tool name should be 1–128 characters of ASCII letters, digits, underscore, hyphen, or dot (no spaces) and unique within its server. [Official] Tools — Model Context Protocol Specification 2025-11-25 · Anthropic Those MCP tools then reach the agent through a fixed pattern, mcp__<server>__<tool>: a list_issues tool on a server keyed github becomes mcp__github__list_issues. [Official] Connect to external tools with MCP · Anthropic
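The two naming regimes and the surfacing pattern can be checked mechanically. In this sketch the API pattern is the one quoted above; the MCP pattern is an assumption that encodes the prose rule (ASCII letters, digits, underscore, hyphen, dot, 1 to 128 characters) as a regular expression, and the helper names are hypothetical.

```python
import re

API_TOOL_NAME = re.compile(r"^[a-zA-Z0-9_-]{1,64}$")
# Assumed encoding of the MCP naming guidance as a regex.
MCP_TOOL_NAME = re.compile(r"^[A-Za-z0-9_.-]{1,128}$")

def is_valid_api_tool_name(name: str) -> bool:
    return API_TOOL_NAME.fullmatch(name) is not None

def is_valid_mcp_tool_name(name: str) -> bool:
    return MCP_TOOL_NAME.fullmatch(name) is not None

def mcp_surface_name(server: str, tool: str) -> str:
    """Build the mcp__<server>__<tool> form an MCP tool surfaces as."""
    return f"mcp__{server}__{tool}"

print(mcp_surface_name("github", "list_issues"))
print(is_valid_api_tool_name("github_search"))
print(is_valid_api_tool_name("github search"))
print(is_valid_mcp_tool_name("admin.tools.list"), is_valid_api_tool_name("admin.tools.list"))
```

The last line shows where the regimes diverge: a dotted name fits the MCP character set but does not match the API pattern, so the two checks are not interchangeable.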

Return only high-signal information

The response is the half of the contract authors forget. The model reads every token a tool returns, so a tool should “return only high-signal information… semantic, stable identifiers (e.g., slugs or UUIDs) rather than opaque internal references, and include only the fields Claude needs to reason about its next step.” [Official] Define tools · Anthropic Bloated responses waste the context window and bury the fields that matter. The shape of the response also shapes the next call: a semantic identifier the model can pass straight into the following tool keeps a multi-step task cheap; an opaque internal handle forces a re-lookup. [Official] Writing tools for agents · Anthropic
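In practice this often means projecting a raw backend record down to a few fields before it is returned. The record, its field names, and the `to_high_signal` helper below are hypothetical; the point is only the shape of the trimming step.

```python
# Hypothetical raw record as a backend might return it.
RAW_ISSUE = {
    "internal_row_id": 88213377,   # opaque internal reference
    "slug": "billing-page-500-error",
    "title": "Billing page returns 500",
    "state": "open",
    "etag": "5f2c",
    "shard": "db-eu-4",
    "audit_trail": ["created", "indexed", "replicated"],
}

def to_high_signal(raw: dict, keep: tuple = ("slug", "title", "state")) -> dict:
    """Keep only the fields the model needs to decide its next step."""
    return {key: raw[key] for key in keep if key in raw}

print(to_high_signal(RAW_ISSUE))
```

Returning the slug rather than the row id gives the model an identifier it can pass directly to a follow-up tool, which is the cheap multi-step path the paragraph above describes.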

When the response should be machine-shaped, MCP lets a tool declare an optional outputSchema. When it does, the server MUST return structuredContent conforming to that schema, and for backwards compatibility it SHOULD also mirror the serialized JSON in a text block. [Official] Tools — Model Context Protocol Specification 2025-11-25 · Anthropic That is the output-side analogue of the required input schema; the structured-output machinery that drives it is Domain 4’s subject (D4.3).
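The result shape for a tool with an outputSchema can be sketched as a plain dictionary. The schema contents, the `make_tool_result` helper, and its required-keys check are assumptions for illustration; a real server would validate against the full schema and would normally build the result through an MCP SDK rather than by hand.

```python
import json

# Hypothetical outputSchema for a weather tool.
WEATHER_OUTPUT_SCHEMA = {
    "type": "object",
    "properties": {
        "temperature": {"type": "number"},
        "conditions": {"type": "string"},
    },
    "required": ["temperature", "conditions"],
}

def make_tool_result(data: dict, output_schema: dict) -> dict:
    """Return structuredContent plus a text mirror of the same JSON.

    Only the schema's required keys are checked here; this is not full validation.
    """
    missing = [key for key in output_schema.get("required", []) if key not in data]
    if missing:
        raise ValueError(f"structured result is missing required keys: {missing}")
    return {
        "content": [{"type": "text", "text": json.dumps(data)}],
        "structuredContent": data,
    }

result = make_tool_result(
    {"temperature": 22.5, "conditions": "Partly cloudy"}, WEATHER_OUTPUT_SCHEMA
)
print(json.dumps(result, indent=2))
```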

The structural floor: an object input schema

Beneath the design judgments sits a requirement no interface can skip. Every tool’s input schema is a JSON Schema object: in the Claude API a tool definition’s three required fields are name, description, and an input_schema object; [Official] Define tools · Anthropic in MCP the inputSchema is required and must be a valid JSON Schema object, not null. [Official] Tools — Model Context Protocol Specification 2025-11-25 · Anthropic A tool that takes no arguments still declares an empty object schema; the object is the floor every interface stands on.
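A no-argument tool and a check for the structural floor might look like this. The tool names and the `has_object_input_schema` helper are hypothetical, and the empty-object form shown is one valid way to write such a schema rather than the only one.

```python
# Hypothetical no-argument tool in the Claude API shape: the schema is still an object.
PING_TOOL = {
    "name": "status_ping",
    "description": (
        "Checks whether the status service is reachable. "
        "Takes no arguments. "
        "Use it before other status tools when connectivity is in doubt."
    ),
    "input_schema": {"type": "object", "properties": {}},
}

# The same idea in the MCP shape, plus a broken definition with a null schema.
MCP_PING_TOOL = {"name": "ping", "inputSchema": {"type": "object", "properties": {}}}
BROKEN_TOOL = {"name": "ping", "inputSchema": None}

def has_object_input_schema(tool: dict) -> bool:
    """True if the tool declares an input schema whose type is "object"."""
    schema = tool.get("input_schema", tool.get("inputSchema"))
    return isinstance(schema, dict) and schema.get("type") == "object"

for tool in (PING_TOOL, MCP_PING_TOOL, BROKEN_TOOL):
    print(tool["name"], has_object_input_schema(tool))
```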

Practice

Exercise solutions

Solution

Consolidate the three into one get_customer_context tool (namespace it — e.g. crm_get_customer_context — if the agent spans services). Its description should state what it returns and when to use it: “Returns a customer’s profile, recent transactions, and notes for a given customer ID; use it whenever you need context about a customer before acting.” The redesign applies consolidation (fewer, more capable tools reduce selection ambiguity) and design-for-affordances (one call returns the context the agent needs instead of three CRUD calls it must chain). The agent stalled because three thin tools forced multi-step chaining the descriptions never made obvious; a single high-signal response also lets any follow-up call reuse the returned identifiers cheaply.

Solution

The most likely cause is that one of the examples does not validate against the tool’s input_schema — an invalid input_examples entry returns a 400. Every example must conform to the same input_schema the real calls do (right types, required fields present, enum values legal); a single bad example (a typo’d enum, a missing required field) fails the whole request. The examples have to agree with input_schema — which is also why they double as a check on the schema itself.

Solution

A good description must add, at minimum: (1) what the tool does concretely (not “gets data” but which data, in what form); (2) when to use it and when not to — the boundary that prevents misrouting; (3) what each parameter means (and what the response returns). Aim for 3–4 sentences. The audience is the model, which selects tools by description alone and never reads the implementation — so an opaque description is a performance bug the model cannot route around, making the description the single highest-leverage fix (“by far the most important factor in tool performance”). Adding input_examples compounds the gain by showing correct argument shape.

Exam essentials

The description is the highest-leverage surface — “by far the most important factor in tool performance.” Say what the tool does, when (and when not) to use it, and what each parameter means; 3–4 sentences minimum.
input_examples show correct usage — an array of example argument objects; each must validate against input_schema (invalid → 400). Client tools only, not server tools; ~20–50 / ~100–200 tokens.
Consolidate to reduce selection ambiguity — fewer, more capable tools (an action parameter over create_pr/review_pr/merge_pr); design for the agent’s affordances, not your API’s endpoints.
Namespace names by service — github_list_prs, not a bare search. API names match ^[a-zA-Z0-9_-]{1,64}$; MCP names are 1–128 ASCII chars and surface as mcp__server__tool.
Return only high-signal information — semantic, stable identifiers and only the fields the model needs. MCP’s optional outputSchema governs the machine-shaped output (server must then return conforming structuredContent).
The input schema must be an object — the structural floor of every tool; strict: true (D2.2/D2.3) then makes inputs conform to it.