D4.3 closed with a hard guarantee, a schema a response cannot violate, and one caveat: a refusal still gets through. This chapter is about a deeper caveat. Constrained decoding guarantees the shape, never the truth. A perfectly schema-valid record can name the wrong customer or fabricate a total. The pattern for catching that lives above the API, in your validation loop. The subject of this chapter is the loop itself rather than its current field names, which is why it counts as an architectural pattern.

Two kinds of error: schema and semantic

Structured outputs (D4.3) eliminates a whole class of failure: “Always valid: No more JSON.parse() errors. Type safe: Guaranteed field types and required fields. Reliable: No retries needed for schema violations.” [Official] Structured outputs · Anthropic What it cannot touch is the other class, the semantic errors: responses that are valid JSON matching your schema but containing incorrect data, the very failures the validate-and-feed-back loop described in this chapter exists to catch. [Official] Get structured output from agents · Anthropic A schema can require customer_name to be a non-empty string; it cannot know the source said “Jane” while the model wrote “John.”
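The Jane/John case can be made concrete with a small sketch. The hand-rolled `matches_schema` function below is only a stand-in for real schema validation, and the `total` field and the sample invoice text are assumptions made up for illustration.

```python
def matches_schema(record: dict) -> bool:
    """Shape check only (a stand-in for schema validation):
    customer_name must be a non-empty string, total must be a number."""
    name = record.get("customer_name")
    total = record.get("total")
    name_ok = isinstance(name, str) and name != ""
    # bool is a subclass of int in Python, so exclude it explicitly.
    total_ok = isinstance(total, (int, float)) and not isinstance(total, bool)
    return name_ok and total_ok


# Hypothetical source document and a hypothetical model output.
source_text = "Invoice for Jane Doe. Total due: 120.00"
record = {"customer_name": "John Doe", "total": 120.0}

schema_ok = matches_schema(record)
# Semantic check: does the extracted name actually occur in the source?
name_in_source = record["customer_name"] in source_text

print(f"schema_ok={schema_ok} name_in_source={name_in_source}")
```

The record passes the shape check and fails the comparison against the source, which is the gap a schema alone cannot close.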

The SDK retry loop handles the schema layer

For the residual schema mismatches in a multi-tool agentic run, the Agent SDK adds a retry loop: “the SDK validates the output against it, re-prompting on mismatch. If validation does not succeed within the retry limit, the result is an error instead of structured data.” [Official] Get structured output from agents · Anthropic Crucially, exhaustion is a result you inspect, not an exception that throws. You discriminate on subtype: success carries the typed payload in message.structured_output; error_max_structured_output_retries means the budget ran out and you must fall back. [Official] Get structured output from agents · Anthropic
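A minimal sketch of branching on subtype before touching the payload follows. The `ResultMessage` dataclass is a hypothetical stand-in that models only the two fields named above; it is not the SDK's actual class, and the returned status labels are likewise assumptions.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ResultMessage:
    """Hypothetical stand-in for the SDK result message (not the real class)."""
    subtype: str
    structured_output: Optional[Any] = None


def handle_result(message: ResultMessage) -> dict:
    """Decide what to do from the subtype first; read the payload only on success."""
    if message.subtype == "success":
        return {"status": "ok", "data": message.structured_output}
    if message.subtype == "error_max_structured_output_retries":
        # Retry budget exhausted: there is no payload, so route to a fallback.
        return {"status": "fallback", "data": None}
    # Any other subtype is treated as unexpected in this sketch.
    return {"status": "unexpected", "data": None}


ok = handle_result(ResultMessage("success", {"customer_name": "Jane Doe"}))
exhausted = handle_result(ResultMessage("error_max_structured_output_retries"))
```

Keeping the payload read inside the success branch means the error path can never hand an empty value to downstream code.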

Encode the semantic check into the schema

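You cannot retry your way out of a semantic error the SDK never sees, so make the model commit to signals a caller can check. The pattern is to add fields whose only job is verification, such as a calculated_total emitted alongside the document’s stated_total, or a quoted source span attached to each extracted claim.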

Each pattern converts an un-checkable judgment (“is this right?”) into a mechanical test (“does calculated_total equal the sum?”). [Official] Get structured output from agents · Anthropic The model is doing the same extraction either way; you are just asking it to show enough of its work that a downstream check can catch a lie.
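The mechanical test mentioned above might look like the following sketch. The record layout (`line_items` entries carrying an `amount`, plus `calculated_total` and `stated_total`) is an assumption for illustration; `Decimal` is used so that currency amounts compare exactly.

```python
from decimal import Decimal


def check_totals(record: dict) -> list[str]:
    """Return human-readable failures; an empty list means the totals agree."""
    failures = []
    # Convert via str so floats like 19.99 become exact decimal values.
    line_sum = sum(
        (Decimal(str(item["amount"])) for item in record["line_items"]),
        Decimal("0"),
    )
    calculated = Decimal(str(record["calculated_total"]))
    stated = Decimal(str(record["stated_total"]))
    if calculated != line_sum:
        failures.append(
            f"calculated_total {calculated} does not equal the sum of line items {line_sum}"
        )
    if calculated != stated:
        failures.append(
            f"calculated_total {calculated} does not match stated_total {stated}"
        )
    return failures


items = [{"amount": 19.99}, {"amount": 10.01}]
consistent = {"line_items": items, "calculated_total": 30.0, "stated_total": 30.0}
fabricated = {"line_items": items, "calculated_total": 35.0, "stated_total": 30.0}

consistent_failures = check_totals(consistent)
fabricated_failures = check_totals(fabricated)
```

Returning the failures as sentences rather than a boolean keeps them ready to be fed back to the model in a re-prompt.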

Close the loop: validate, feed back, retry, escalate

The full pattern stacks three independent layers, and skipping any one surfaces a different failure. [Official] Get structured output from agents · Anthropic Layer 1 is constrained decoding (schema errors gone). Layer 2 is your application code running the semantic cross-checks above. Layer 3 re-prompts with the specific failures (“calculated_total does not equal the sum of line items; re-extract correcting this”) for a bounded number of attempts, then falls back.
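Layers 2 and 3 can be sketched as a small driver loop. Everything here is an assumption for illustration: `extract` stands in for whatever call produces a schema-valid record (Layer 1), `validate` is the application's semantic check, and the default of three attempts is arbitrary.

```python
from typing import Callable, Optional


def extract_with_feedback(
    extract: Callable[[Optional[list[str]]], dict],
    validate: Callable[[dict], list[str]],
    max_attempts: int = 3,
) -> dict:
    """Validate, feed the specific failures back, retry a bounded number of times,
    then escalate instead of looping forever."""
    feedback: Optional[list[str]] = None
    failures: list[str] = []
    for attempt in range(1, max_attempts + 1):
        record = extract(feedback)       # Layer 1 output (schema-valid by assumption)
        failures = validate(record)      # Layer 2: semantic cross-checks
        if not failures:
            return {"status": "ok", "record": record, "attempts": attempt}
        feedback = failures              # Layer 3: re-prompt with the specifics
    return {"status": "escalate", "failures": failures, "attempts": max_attempts}


def validate_total(record: dict) -> list[str]:
    if record["calculated_total"] != sum(record["line_items"]):
        return ["calculated_total does not equal the sum of line items; "
                "re-extract correcting this"]
    return []


def fake_extract(feedback: Optional[list[str]]) -> dict:
    # Fake model: wrong on the first try, corrected once it receives feedback.
    return {"line_items": [10.0, 20.0], "calculated_total": 30.0 if feedback else 35.0}


def stubborn_extract(feedback: Optional[list[str]]) -> dict:
    # Fake model that never corrects itself, to exercise the escalation path.
    return {"line_items": [10.0, 20.0], "calculated_total": 35.0}


recovered = extract_with_feedback(fake_extract, validate_total)
escalated = extract_with_feedback(stubborn_extract, validate_total)
```

The bound is the important design choice: a model that keeps producing the same wrong value ends up in human review rather than in an unbounded retry loop.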

The schema-design heuristics fold back into D4.3: keep schemas focused, and mark fields optional when the source might not contain them. An over-required schema turns a missing field into a retry and then an exhausted-budget error. [Official] Get structured output from agents · Anthropic

Practice

Exercise solutions

Solution ↑ Exercise

B. A wrong-but-well-typed total is a semantic error — valid JSON, incorrect data — so no schema or type guarantee touches it. The fix is to make the error checkable: have the model emit both the document’s stated_total and its own calculated_total, then let application code compare them and route mismatches to review. A (strict) guarantees the total is a number, which it already was; it does nothing about a number being wrong. C (more tokens) addresses truncation, not arithmetic fabrication. D (minimum) is both unsupported by the structured-outputs subset and irrelevant — a bound on magnitude can’t detect a total that’s internally inconsistent with the line items.

Solution ↑ Exercise

The two subtypes are success, where the run validated and the typed payload is on message.structured_output, and error_max_structured_output_retries, where validation never succeeded within the retry budget, so there is no payload and you must fall back (simpler schema, simpler prompt, or human review). You must branch on subtype before reading the payload because exhaustion returns a result, not an exception: code that reads message.structured_output on the error path reads undefined and silently processes garbage downstream. The subtype is the success/failure contract; the payload is present and trustworthy only on success.

Solution ↑ Exercise

The three layers are (1) API constrained decoding (the schema layer — eliminates syntax/type/required/enum errors), (2) application-code semantic checks (the domain layer — cross-checks what the data means), and (3) a bounded feedback loop (re-prompts with the specific failures, then escalates on exhaustion). A fabricated customer_name — a valid string naming the wrong person — is a semantic error: layer 1 cannot catch it (the shape is perfect) and layer 2 is the one that must. The hook that makes the catch possible is a provenance field: have the model emit the source span it drew the name from (claim + source.span_quote + confidence), so application code can verify the quoted span actually appears in the document — turning “is this the right person?” into a mechanical string-containment check (D5.6).
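The string-containment check described in this solution could be sketched as follows. The field layout mirrors the claim + source.span_quote + confidence shape named above; the whitespace and case normalization, the sample document, and the second check (that the claim appears inside its own quoted span) are assumptions added for illustration.

```python
def normalize(text: str) -> str:
    """Collapse whitespace and case so trivial formatting differences don't fail the check."""
    return " ".join(text.split()).casefold()


def provenance_failures(field: dict, document: str) -> list[str]:
    """Check that the quoted span occurs in the document and supports the claim."""
    failures = []
    quote = normalize(field["source"]["span_quote"])
    if quote == "" or quote not in normalize(document):
        failures.append("source.span_quote does not appear in the document")
    # Extra check (an assumption of this sketch): the claim must be inside its quote,
    # otherwise a real span could be cited for an unrelated value.
    if normalize(field["claim"]) not in quote:
        failures.append("claim does not appear inside source.span_quote")
    return failures


document = "Bill to:  Jane Doe, 14 Elm Street."  # hypothetical source text

grounded = {"claim": "Jane Doe",
            "source": {"span_quote": "Bill to: Jane Doe"},
            "confidence": 0.9}
fabricated = {"claim": "John Doe",
              "source": {"span_quote": "Bill to: John Doe"},
              "confidence": 0.9}

grounded_failures = provenance_failures(grounded, document)
fabricated_failures = provenance_failures(fabricated, document)
```

Note that the fabricated record reports high confidence; the containment check does not depend on the model's self-assessment.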

Solution ↑ Exercise

(a) The failure is truncation, not a schema or semantic error: the response hit the max_tokens output cap partway through writing the object, confirmed by stop_reason: "max_tokens" (the output-budget value, versus end_turn). (b) The retry loop makes it worse because each re-prompt runs against the same max_tokens budget, so it truncates at the same place — every attempt fails validation identically, and the loop burns its entire allowance reaching error_max_structured_output_retries without ever being able to succeed. (c) The fix is to detect stop_reason: "max_tokens" and raise the cap (or shrink / split the schema) before retrying — retries cannot manufacture room the budget does not allow.
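Part (c) can be sketched as a guard that runs before any validation retry. The `call_model` callable, the doubling strategy, and the `ceiling` parameter are assumptions for illustration; the only detail taken from the text is the stop_reason value "max_tokens".

```python
from typing import Callable


def call_with_room(call_model: Callable[[int], dict], max_tokens: int, ceiling: int) -> dict:
    """Re-issue the call with a larger output cap while the response is truncated.
    If the cap reaches the ceiling and the output is still cut off, signal that the
    schema should be shrunk or split instead of retried."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    budget = max_tokens
    while True:
        response = call_model(budget)
        if response["stop_reason"] != "max_tokens":
            return {"status": "complete", "response": response, "max_tokens": budget}
        if budget >= ceiling:
            return {"status": "split_schema", "response": response, "max_tokens": budget}
        budget = min(budget * 2, ceiling)  # raise the cap before trying again


def fake_model(budget: int) -> dict:
    # Fake model whose full object needs 1500 output tokens.
    return {"stop_reason": "end_turn" if budget >= 1500 else "max_tokens"}


enough_room = call_with_room(fake_model, max_tokens=512, ceiling=4096)
no_room = call_with_room(fake_model, max_tokens=512, ceiling=1024)
```

Only a response that finished without hitting the cap is worth passing to schema or semantic validation; a truncated one fails for a reason that re-prompting cannot fix.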

Exam essentials