D4.3 closed with a hard guarantee (a schema a response cannot violate) and one caveat: a refusal still gets through. This chapter addresses a deeper caveat. Constrained decoding guarantees the shape, never the truth: a perfectly schema-valid record can still name the wrong customer or fabricate a total. The pattern for catching that lives above the API, in your validation loop. What carries over is the loop's structure, not the particular field names it checks today, which is what makes it an architectural pattern.

Do I know this already? Diagnostic

Answer these confidently and you can skim ahead to Exam essentials; if any is shaky, read closely — each is developed below.

A schema-valid record names the wrong customer. Which error class is this, and can the API catch it?
An SDK structured-output run returns. How do you tell success from exhausted retries, and why must you check before reading the payload?
To catch a total that doesn’t match the line items, what schema-level hook makes the error mechanically checkable?
Name the three layers of the validate-feed-back-retry pattern. Which layer catches a semantic error?
Your retry loop keeps exhausting on long inputs that come back cut off. Why won’t more retries help, and what will?

Check your answers

A semantic error — valid JSON, incorrect data — and the API cannot catch it: a schema constrains form, never fact.
Branch on subtype — success carries the payload in message.structured_output, error_max_structured_output_retries means fall back; exhaustion is a result, not an exception, so unchecked code silently reads undefined.
The stated_total vs calculated_total pair — the model copies the document’s total and re-sums the line items, and the caller compares the two.
API constrained decoding, application-code semantic checks, and a bounded feedback loop — the semantic error is caught by layer 2, application-code checks.
The failure is truncation (stop_reason: "max_tokens") — re-prompting on the same budget truncates at the same place; raise the cap (or shrink the schema) instead.

Two kinds of error: schema and semantic

Structured outputs (D4.3) eliminates a whole class of failure: “Always valid: No more JSON.parse() errors. Type safe: Guaranteed field types and required fields. Reliable: No retries needed for schema violations.” [Official] Structured outputs · Anthropic What it cannot touch is the other class, semantic errors: responses that are valid JSON matching your schema but containing incorrect data. Those are the failures a validate-and-feed-back loop exists to catch. [Official] Get structured output from agents · Anthropic A schema can require customer_name to be a non-empty string; it cannot know the source said “Jane” while the model wrote “John.”
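To make the split concrete, here is a minimal sketch in which a hand-written type guard stands in for what a schema can promise. Invoice, isInvoiceShape, and nameAppearsInSource are hypothetical illustrations, not SDK APIs. The guard accepts the fabricated record just as readily as the faithful one; only a comparison against the source text separates them.

```typescript
// Hypothetical record shape; stands in for a JSON Schema enforced by constrained decoding.
interface Invoice {
  customer_name: string;
  total: number;
}

// Shape check: roughly what a schema can promise (required fields with the right types).
function isInvoiceShape(value: unknown): value is Invoice {
  if (typeof value !== "object" || value === null) return false;
  const v = value as { customer_name?: unknown; total?: unknown };
  const name = v.customer_name;
  const total = v.total;
  return typeof name === "string" && name.length > 0 && typeof total === "number";
}

// Semantic check: needs the source document, which a schema never sees.
function nameAppearsInSource(record: Invoice, text: string): boolean {
  return text.includes(record.customer_name);
}

// Source text and two parsed model outputs (JSON.parse yields untyped data).
const source = "Invoice for Jane Doe. Total due: 480.00";
const faithful: unknown = JSON.parse('{"customer_name": "Jane Doe", "total": 480}');
const fabricated: unknown = JSON.parse('{"customer_name": "John Doe", "total": 480}');

for (const record of [faithful, fabricated]) {
  if (isInvoiceShape(record)) {
    console.log(record.customer_name, "| shape ok | in source:", nameAppearsInSource(record, source));
  }
}
// Jane Doe | shape ok | in source: true
// John Doe | shape ok | in source: false
```

The shape check is the part constrained decoding already gives you for free; the second function is the kind of domain logic only your own code can supply.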

The SDK retry loop handles the schema layer

For the residual schema mismatches in a multi-tool agentic run, the Agent SDK adds a retry loop: “the SDK validates the output against it, re-prompting on mismatch. If validation does not succeed within the retry limit, the result is an error instead of structured data.” [Official] Get structured output from agents · Anthropic Crucially, exhaustion is a result you inspect, not an exception that throws. You discriminate on subtype: success carries the typed payload in message.structured_output; error_max_structured_output_retries means the budget ran out and you must fall back. [Official] Get structured output from agents · Anthropic
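A minimal sketch of that branch, assuming a locally declared union that mirrors only the two subtype values named above. The real SDK result message carries more fields and variants, so StructuredResult, Handled, and handleResult are hypothetical stand-ins; check the Agent SDK reference for the actual types.

```typescript
// Local stand-in for the SDK result, narrowed to the two subtypes named in the text.
type StructuredResult<T> =
  | { subtype: "success"; structured_output: T }
  | { subtype: "error_max_structured_output_retries" };

interface Extraction {
  customer_name: string;
  total: number;
}

// What the caller gets back: the payload, or an explicit instruction to fall back.
type Handled<T> = { ok: true; data: T } | { ok: false; fallback: string };

function handleResult<T>(message: StructuredResult<T>): Handled<T> {
  // Discriminate on subtype BEFORE reading the payload: exhaustion arrives as a
  // normal result, not a thrown error, so no catch block will save you.
  if (message.subtype === "success") {
    return { ok: true, data: message.structured_output };
  }
  // Retry budget exhausted: no payload exists on this branch.
  return { ok: false, fallback: "simpler schema, simpler prompt, or human review" };
}

const good: StructuredResult<Extraction> = {
  subtype: "success",
  structured_output: { customer_name: "Jane Doe", total: 480 },
};
const exhausted: StructuredResult<Extraction> = { subtype: "error_max_structured_output_retries" };

console.log(handleResult(good).ok); // true
console.log(handleResult(exhausted).ok); // false
```

Because the error variant has no structured_output property at all, the compiler refuses any attempt to read the payload before the subtype check, which is exactly the mistake the prose warns about.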

Encode the semantic check into the schema

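You cannot retry your way out of a semantic error the SDK never sees, so make the model commit to signals a caller can check. The pattern is to add fields whose only job is verification: a detected_pattern field, a stated_total / calculated_total pair, a conflict_detected flag, a nullable “other” enum value paired with a detail field, and the provenance triple (claim + source.span_quote + confidence).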

Each pattern converts an uncheckable judgment (“is this right?”) into a mechanical test (“does calculated_total equal the sum?”). [Official] Get structured output from agents · Anthropic The model is doing the same extraction either way; you are just asking it to show enough of its work that a downstream check can catch a lie.
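Here is one way the stated_total / calculated_total pair becomes a mechanical test. It is a sketch: the two total fields and line_items follow the text, while LineItem, checkTotals, and the integer-cent comparison are illustrative assumptions. Comparing in cents keeps binary floating-point noise (0.1 + 0.2 is not exactly 0.3) from raising false mismatches.

```typescript
// Field names follow the text; LineItem and the cents rounding are illustrative assumptions.
interface LineItem {
  description: string;
  amount: number;
}

interface InvoiceRecord {
  stated_total: number; // copied verbatim from the document
  calculated_total: number; // the model's own re-sum of line_items
  line_items: LineItem[];
}

// Compare money in integer cents so float noise cannot fake a mismatch.
const toCents = (x: number): number => Math.round(x * 100);

// Layer-2 semantic check: an empty array means every cross-check passed.
function checkTotals(rec: InvoiceRecord): string[] {
  const errors: string[] = [];
  const sumCents = rec.line_items.reduce((acc, item) => acc + toCents(item.amount), 0);
  const sum = (sumCents / 100).toFixed(2);
  if (toCents(rec.calculated_total) !== sumCents) {
    errors.push(`calculated_total (${rec.calculated_total.toFixed(2)}) does not equal the sum of line_items (${sum})`);
  }
  if (toCents(rec.stated_total) !== sumCents) {
    errors.push(`stated_total (${rec.stated_total.toFixed(2)}) does not match the sum of line_items (${sum})`);
  }
  return errors;
}

// Line items sum to 520.00, but the stated total says 480.00.
const mismatched: InvoiceRecord = {
  stated_total: 480,
  calculated_total: 520,
  line_items: [
    { description: "Consulting", amount: 300 },
    { description: "Travel", amount: 220 },
  ],
};
console.log(checkTotals(mismatched).length); // 1

// 0.1 + 0.2 !== 0.3 in floating point, but 10 + 20 === 30 cents.
const tidy: InvoiceRecord = {
  stated_total: 0.3,
  calculated_total: 0.3,
  line_items: [
    { description: "a", amount: 0.1 },
    { description: "b", amount: 0.2 },
  ],
};
console.log(checkTotals(tidy).length); // 0
```

The returned strings are deliberately specific, so they can be fed straight back to the model as retry feedback.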

Close the loop: validate, feed back, retry, escalate

The full pattern stacks three independent layers, and skipping any one surfaces a different failure. [Official] Get structured output from agents · Anthropic Layer 1 is constrained decoding (schema errors gone). Layer 2 is your application code running the semantic cross-checks above. Layer 3 re-prompts with the specific failures (“calculated_total does not equal the sum of line items; re-extract, correcting this”) for a bounded number of attempts, then falls back.
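A compact sketch of layers 2 and 3 wrapped around an extraction call. Everything here is hypothetical: extract stands in for a structured-output request (layer 1 is assumed to have already guaranteed the shape), check is your semantic validator, and the bound of three attempts is an illustrative choice, not an SDK default. The schema is never touched between attempts; only the feedback text changes.

```typescript
// Outcome of the whole pipeline: accepted data, or an escalation carrying the last failures.
type PipelineResult<T> =
  | { status: "accepted"; data: T; attempts: number }
  | { status: "human_review"; lastErrors: string[]; attempts: number };

// Layer 1 is abstracted as `extract`: assume it already returns schema-valid data.
// Layer 2 is `check`: an empty array means the semantic cross-checks passed.
// Layer 3 is this loop: re-prompt with the specific failures, bounded by maxAttempts.
function extractWithFeedback<T>(
  extract: (feedback: string | null) => T,
  check: (data: T) => string[],
  maxAttempts = 3,
): PipelineResult<T> {
  let feedback: string | null = null;
  let lastErrors: string[] = [];
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const data = extract(feedback); // same schema every attempt; only the feedback varies
    lastErrors = check(data);
    if (lastErrors.length === 0) {
      return { status: "accepted", data, attempts: attempt };
    }
    // Name the concrete failures rather than sending a generic "try again".
    feedback = `${lastErrors.join("; ")}; re-extract, correcting this.`;
  }
  // Budget exhausted: escalate instead of guessing which value is right.
  return { status: "human_review", lastErrors, attempts: maxAttempts };
}

// --- Usage with scripted stand-ins for the model ---
interface Totals {
  stated_total: number;
  calculated_total: number;
}

const totalsAgree = (t: Totals): string[] =>
  Math.round(t.stated_total * 100) === Math.round(t.calculated_total * 100)
    ? []
    : [`stated_total (${t.stated_total.toFixed(2)}) does not equal calculated_total (${t.calculated_total.toFixed(2)})`];

// Hypothetical model that reconciles once it receives feedback.
const fixesOnRetry = (feedback: string | null): Totals =>
  feedback === null ? { stated_total: 480, calculated_total: 520 } : { stated_total: 520, calculated_total: 520 };

// Hypothetical model that never reconciles (e.g. a genuinely ambiguous source).
const neverFixes = (_feedback: string | null): Totals => ({ stated_total: 480, calculated_total: 520 });

const first = extractWithFeedback(fixesOnRetry, totalsAgree);
console.log(first.status, first.attempts); // accepted 2
console.log(extractWithFeedback(neverFixes, totalsAgree).status); // human_review
```

Escalation returns the last concrete errors, so a human reviewer sees why the record was held rather than a bare failure flag.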

Unbounded, schema-thrashing, or truncation-trapped retry loops

Three ways the feedback loop bites back. First, an unbounded loop on a genuinely ambiguous task never converges: bound the attempts and escalate to human review on exhaustion, because the alternative is paying inference forever for an answer that will not come. Second, each retry is a full inference, so a three-retry run (four calls in total) costs roughly four times a clean one; and if you mutate the schema between attempts you also invalidate the grammar cache (D4.3) and re-pay compilation, so keep the schema fixed across retries and vary only the feedback message. Third, truncation is the one failure retries cannot fix on their own: if a response stopped at the max_tokens cap (stop_reason: "max_tokens", the output-budget signal), re-prompting on the same budget truncates at the same place and silently burns the whole retry allowance. [Official] How the agent loop works · Anthropic Detect that stop reason and raise the cap (or shrink the schema) instead of spending a single retry on it.
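The truncation trap is easiest to avoid by classifying each attempt before deciding to retry. A minimal sketch, assuming an attempt record that carries the response's stop_reason (only the "end_turn" and "max_tokens" values named in this chapter are modelled; the real API defines others). AttemptReport, nextStep, the budget-doubling rule, and the 16000-token ceiling are all hypothetical choices.

```typescript
// Hypothetical per-attempt report; stop_reason is narrowed to the two values discussed here.
interface AttemptReport {
  stop_reason: "end_turn" | "max_tokens";
  semanticErrors: string[]; // output of your layer-2 checks (empty = passed)
}

type NextStep =
  | { kind: "accept" }
  | { kind: "raise_budget"; maxTokens: number }
  | { kind: "retry_with_feedback"; feedback: string }
  | { kind: "escalate"; reason: string };

const TOKEN_CEILING = 16000; // illustrative limit for this sketch, not an API constant

function nextStep(report: AttemptReport, maxTokens: number, retriesLeft: number): NextStep {
  // Check truncation first: a cut-off object fails validation identically on every
  // retry at the same budget, so spend budget, not retries.
  if (report.stop_reason === "max_tokens") {
    const raised = Math.min(maxTokens * 2, TOKEN_CEILING);
    return raised > maxTokens
      ? { kind: "raise_budget", maxTokens: raised }
      : { kind: "escalate", reason: "still truncated at the ceiling; shrink or split the schema" };
  }
  if (report.semanticErrors.length === 0) return { kind: "accept" };
  // Each retry is a full inference (three retries = four calls), so the allowance is finite.
  if (retriesLeft <= 0) return { kind: "escalate", reason: "retry budget exhausted" };
  // Keep the schema fixed (grammar cache stays warm); vary only the feedback text.
  return { kind: "retry_with_feedback", feedback: report.semanticErrors.join("; ") };
}

console.log(nextStep({ stop_reason: "max_tokens", semanticErrors: [] }, 4000, 2));
// { kind: 'raise_budget', maxTokens: 8000 }
```

Note that raising the budget does not consume a retry, matching the advice not to spend a single retry on a truncation.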

The schema-design heuristics fold back into D4.3: keep schemas focused, and mark fields optional when the source might not contain them, because an over-required schema turns a missing field into a retry and then an exhausted-budget error. [Official] Get structured output from agents · Anthropic
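To see why over-requiring hurts, here is a small sketch using plain JSON Schema vocabulary (properties, required) and a hypothetical missingRequired helper standing in for whatever validation runs; it is not the SDK's validator, and po_number is an invented example field. A faithful extraction from an invoice with no PO number can never satisfy the strict schema, so every retry would fail the same way; moving the field out of required lets the record through.

```typescript
// Minimal JSON Schema fragment: only the keys this sketch inspects.
interface ObjectSchema {
  type: "object";
  properties: Record<string, { type: string }>;
  required: string[];
}

const overRequired: ObjectSchema = {
  type: "object",
  properties: {
    invoice_id: { type: "string" },
    total: { type: "number" },
    po_number: { type: "string" }, // hypothetical field many invoices simply lack
  },
  required: ["invoice_id", "total", "po_number"],
};

// Same fields, but po_number is optional because the source may not contain it.
const focused: ObjectSchema = { ...overRequired, required: ["invoice_id", "total"] };

// Hypothetical stand-in for the validator: which required keys are absent?
function missingRequired(schema: ObjectSchema, record: Record<string, unknown>): string[] {
  return schema.required.filter((key) => !(key in record));
}

// A faithful extraction when the source has no PO number.
const faithful = { invoice_id: "INV-1", total: 520 };

console.log(missingRequired(overRequired, faithful)); // [ 'po_number' ]
console.log(missingRequired(focused, faithful)); // []
```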

The three layers on one invoice Worked example

A nightly invoice extractor runs the full pattern, and a fabricated total shows where each layer earns its place:

Layer 1 — constrained decoding. Structured outputs returns a schema-valid object: { "stated_total": 480.00, "calculated_total": 520.00, "line_items": [...] }, with subtype: "success". No JSON.parse error is possible and both totals are guaranteed floats. The schema layer is done — and it has caught nothing wrong, because nothing is wrong with the shape.
Layer 2 — application-code semantic check. Your code re-sums line_items (520.00) and compares it to stated_total (480.00). 480 ≠ 520 → a semantic error the API could never see: both numbers are valid, but they disagree. This catch exists only because you added the stated_total / calculated_total pair to the schema — the model showed enough work for a mechanical test to run.
Layer 3 — bounded feedback. The loop re-prompts with the specific failure: “stated_total (480.00) does not match the sum of line_items (520.00); re-extract, correcting the discrepancy.” Bound it to, say, three attempts. If a retry reconciles, return success; if the budget exhausts — subtype: "error_max_structured_output_retries" or a persistent mismatch — route to human review, never silently bill the cheaper of two totals.

The discipline: layer 1 is free and total; layer 2 is where your design effort goes (the verification fields don’t exist until you add them); layer 3 must be bounded with a human backstop. Skip layer 2 and the bad total reaches billing; leave layer 3 unbounded and you pay inference forever on an answer that will not come.

Practice

Exercise solutions

Solution

B. A wrong-but-well-typed total is a semantic error — valid JSON, incorrect data — so no schema or type guarantee touches it. The fix is to make the error checkable: have the model emit both the document’s stated_total and its own calculated_total, then let application code compare them and route mismatches to review. A (strict) guarantees the total is a number, which it already was; it does nothing about a number being wrong. C (more tokens) addresses truncation, not arithmetic fabrication. D (minimum) is both unsupported by the structured-outputs subset and irrelevant — a bound on magnitude can’t detect a total that’s internally inconsistent with the line items.

Solution

The two subtypes are success (the run validated, and the typed payload is on message.structured_output) and error_max_structured_output_retries (validation never succeeded within the retry budget, so there is no payload and you must fall back to a simpler schema, a simpler prompt, or human review). You must branch on subtype before reading the payload because exhaustion returns a result, not an exception: code that reads message.structured_output on the error path reads undefined and silently processes garbage downstream. The subtype is the success/failure contract; the payload is present and schema-valid only on success.

Solution

The three layers are (1) API constrained decoding (the schema layer — eliminates syntax/type/required/enum errors), (2) application-code semantic checks (the domain layer — cross-checks what the data means), and (3) a bounded feedback loop (re-prompts with the specific failures, then escalates on exhaustion). A fabricated customer_name — a valid string naming the wrong person — is a semantic error: layer 1 cannot catch it (the shape is perfect) and layer 2 is the one that must. The hook that makes the catch possible is a provenance field: have the model emit the source span it drew the name from (claim + source.span_quote + confidence), so application code can verify the quoted span actually appears in the document — turning “is this the right person?” into a mechanical string-containment check (D5.6).
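A minimal sketch of that containment check. The claim, source.span_quote, and confidence names follow the provenance triple above; the surrounding types, the whitespace normalisation, and the 0.5 confidence floor are illustrative assumptions. It checks both that the quote really appears in the document and that the claimed value appears inside its own quote.

```typescript
// Field names follow the provenance triple; everything else is an illustrative assumption.
interface Provenance {
  claim: string; // e.g. the extracted customer_name
  source: { span_quote: string }; // verbatim text the model says it drew the claim from
  confidence: number; // model-reported, 0..1
}

const CONFIDENCE_FLOOR = 0.5; // hypothetical threshold; tune per task

// Collapse whitespace runs so line wrapping in the source does not break matching.
const normalize = (s: string): string => s.replace(/\s+/g, " ").trim();

function checkProvenance(p: Provenance, sourceText: string): string[] {
  const errors: string[] = [];
  const quote = normalize(p.source.span_quote);
  if (quote.length === 0 || !normalize(sourceText).includes(quote)) {
    errors.push(`span_quote "${p.source.span_quote}" does not appear in the source`);
  }
  if (!quote.includes(normalize(p.claim))) {
    errors.push(`claim "${p.claim}" is not contained in its own span_quote`);
  }
  if (p.confidence < CONFIDENCE_FLOOR) {
    errors.push(`confidence ${p.confidence} is below ${CONFIDENCE_FLOOR}`);
  }
  return errors;
}

const invoiceText = "Bill to:\n  Jane   Doe\nAcme Ltd.";
const honest: Provenance = { claim: "Jane Doe", source: { span_quote: "Bill to: Jane Doe" }, confidence: 0.9 };
const invented: Provenance = { claim: "John Doe", source: { span_quote: "Bill to: John Doe" }, confidence: 0.9 };

console.log(checkProvenance(honest, invoiceText).length); // 0
console.log(checkProvenance(invented, invoiceText).length); // 1
```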

Solution

(a) The failure is truncation, not a schema or semantic error: the response hit the max_tokens output cap partway through writing the object, confirmed by stop_reason: "max_tokens" (the output-budget value, versus end_turn). (b) The retry loop makes it worse because each re-prompt runs against the same max_tokens budget, so it truncates at the same place — every attempt fails validation identically, and the loop burns its entire allowance reaching error_max_structured_output_retries without ever being able to succeed. (c) The fix is to detect stop_reason: "max_tokens" and raise the cap (or shrink / split the schema) before retrying — retries cannot manufacture room the budget does not allow.

Exam essentials

Schema vs semantic — constrained decoding eliminates syntax/type/required/enum errors; semantic errors (valid JSON, wrong data) are invisible to the API and need domain logic.
SDK retry loop — validates and re-prompts on mismatch; the result is success (payload in message.structured_output) or error_max_structured_output_retries (fall back). It’s a result you check, not an exception; the retry count is undocumented.
Schema-level semantic hooks — detected_pattern, stated_total vs calculated_total, conflict_detected, nullable “other” + detail, and the provenance triple turn “is it correct?” into a mechanical cross-check.
Three layers — API constrained decoding (schema) + application-code semantic checks + a bounded feedback loop that re-prompts with the specific errors; escalate to human review on exhaustion.
Loop economics — bound the attempts (each retry is a full inference, ~4× for three retries) and keep the schema stable across retries so you don’t re-pay grammar compilation. A truncation (stop_reason: "max_tokens") is the trap retries can’t fix — re-prompting on the same budget re-truncates; detect it and raise the cap instead.