The first three chapters of Part IV controlled a single response. This one scales to a hundred thousand of them. A batch is the right tool whenever the work is large and nothing is waiting on the answer. Its surface (the endpoint, the size limits, the custom_id rule, the beta header) is exactly the kind of named detail that shifts between releases, so this is a feature-surface chapter.

The cost-latency trade

The Message Batches API exists for one trade: give up immediacy, get half off. “The Message Batches API is a powerful, cost-effective way to asynchronously process large volumes of Messages requests. This approach is well-suited to tasks that do not require immediate responses, with most batches finishing in less than 1 hour while reducing costs by 50% and increasing throughput.” [Official: Batch processing (Message Batches API) · Anthropic] The discount is a flat 50% on both input and output across all tiers, and it stacks with prompt-caching discounts. The price of the discount is a turnaround measured in hours, not milliseconds: most batches finish within an hour, but the processing window is 24 hours, and a batch that does not complete within it expires. [Official: Batch processing (Message Batches API) · Anthropic] Results stay retrievable for 29 days after creation.
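The arithmetic of the trade can be sketched in a few lines. This is a minimal illustration, not a pricing tool: the function name `batch_cost` and the per-million-token prices used in the example ($3 input, $15 output) are hypothetical placeholders chosen for round numbers. The only figure taken from the text is the flat 50% discount, and prompt-caching discounts are left out.

```python
BATCH_DISCOUNT = 0.5  # flat 50% off input and output, as stated in the text


def batch_cost(input_tokens, output_tokens,
               input_price_per_mtok, output_price_per_mtok, batch=True):
    """Return the dollar cost of a workload.

    Prices are dollars per million tokens and are supplied by the caller;
    nothing here reflects a real price list.
    """
    standard = ((input_tokens / 1_000_000) * input_price_per_mtok
                + (output_tokens / 1_000_000) * output_price_per_mtok)
    return standard * BATCH_DISCOUNT if batch else standard


# Hypothetical workload: 80,000 requests, 1,000 input and 200 output tokens each.
input_tokens = 80_000 * 1_000
output_tokens = 80_000 * 200
print(batch_cost(input_tokens, output_tokens, 3.0, 15.0, batch=False))  # 480.0
print(batch_cost(input_tokens, output_tokens, 3.0, 15.0))               # 240.0
```

Under these assumed prices the same job costs 480 dollars synchronously and 240 dollars as a batch; the saving is paid for entirely in waiting time.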

The custom_id contract

A batch is a set, not a sequence, and that has one non-negotiable consequence: “Batch results can be returned in any order, and may not match the ordering of requests when the batch was created. … To correctly match results with their corresponding requests, always use the custom_id field.” [Official: Batch processing (Message Batches API) · Anthropic] Every request carries a unique custom_id (1–64 characters, alphanumeric plus - and _), and that id is the only thread connecting an output back to the input that produced it.
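Because the id is the only join key, it is worth checking ids locally before a batch is submitted. The sketch below encodes the rule as stated above (1 to 64 characters, alphanumeric plus - and _, unique within the batch). The helper names `validate_custom_ids` and `make_custom_id` are hypothetical, and the zero-padded `prefix-index` scheme is just one convenient way to generate ids, not something the API requires.

```python
import re

# 1-64 characters drawn from letters, digits, hyphen, and underscore.
CUSTOM_ID_PATTERN = re.compile(r"[A-Za-z0-9_-]{1,64}")


def validate_custom_ids(custom_ids):
    """Raise ValueError on the first malformed or duplicate custom_id."""
    seen = set()
    for cid in custom_ids:
        if not CUSTOM_ID_PATTERN.fullmatch(cid):
            raise ValueError(f"malformed custom_id: {cid!r}")
        if cid in seen:
            raise ValueError(f"duplicate custom_id: {cid!r}")
        seen.add(cid)


def make_custom_id(prefix, index):
    """Build a deterministic id such as 'ticket-000042' (hypothetical scheme)."""
    return f"{prefix}-{index:06d}"


ids = [make_custom_id("ticket", i) for i in range(3)]
validate_custom_ids(ids)  # passes silently
print(ids)  # ['ticket-000000', 'ticket-000001', 'ticket-000002']
```

Deriving the id from a stable key in the source data (a row number or record id) means the join back to the input needs no extra lookup table.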

The batch envelope: what fits and what it can’t do

A batch is bounded by size and by shape. Size: “A Message Batch is limited to either 100,000 Message requests or 256 MB in size, whichever is reached first.” [Official: Batch processing (Message Batches API) · Anthropic] Exceed the size limit and the create call returns HTTP 413; the fix is to split a larger dataset across multiple batches. Shape: a batch supports all Messages API features including beta features, “however, streaming is not supported for batch requests,” [Official: Batch processing (Message Batches API) · Anthropic] and each request is single-shot: there is no follow-up turn inside a batch, so multi-turn tool round-trips do not work. Structured outputs (D4.3), by contrast, compose cleanly: a batched request can carry output_config.format and you get schema-valid JSON at 50% off. [Official: Structured outputs · Anthropic]
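Splitting a dataset so that each batch respects both limits can be sketched as a greedy chunker. Several things here are assumptions rather than documented behavior: the function name `split_into_batches` is hypothetical, "256 MB" is read as 256 × 1024 × 1024 bytes, and the size of each request is estimated from its JSON serialization, which ignores any envelope overhead in the real payload. Leaving some headroom below the limit would be prudent.

```python
import json

MAX_REQUESTS = 100_000
# Assumption: MB is interpreted as 1024 * 1024 bytes; the text does not say.
MAX_BYTES = 256 * 1024 * 1024


def split_into_batches(requests, max_requests=MAX_REQUESTS, max_bytes=MAX_BYTES):
    """Greedily group requests so no group exceeds either limit."""
    batches, current, current_bytes = [], [], 0
    for request in requests:
        # Estimated size: length of the UTF-8 encoded JSON form of the request.
        size = len(json.dumps(request).encode("utf-8"))
        if size > max_bytes:
            raise ValueError("a single request exceeds the byte limit")
        # Close the current batch if adding this request would break a limit.
        if current and (len(current) >= max_requests
                        or current_bytes + size > max_bytes):
            batches.append(current)
            current, current_bytes = [], 0
        current.append(request)
        current_bytes += size
    if current:
        batches.append(current)
    return batches


# Tiny limits make the behavior visible without building a huge dataset.
demo = [{"custom_id": f"r{i}"} for i in range(5)]
print([len(b) for b in split_into_batches(demo, max_requests=2)])  # [2, 2, 1]
```

Whichever limit is reached first closes the batch, which mirrors the "whichever is reached first" wording of the size rule.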

Billing, result types, and the lifecycle

You pay only for what works: every request ends as succeeded, errored, canceled, or expired, and “you are not billed for errored, canceled, or expired requests.” [Official: Batch processing (Message Batches API) · Anthropic] For unusually long generations there is an opt-in: the output-300k-2026-03-24 beta header “raises the max_tokens cap to 300,000 for batch requests using Claude Opus 4.8, Claude Opus 4.7, Claude Opus 4.6, or Claude Sonnet 4.6.” It is batch-only, and a single 300k generation can itself take over an hour, so submit it with the 24-hour window in mind. [Official: Batch processing (Message Batches API) · Anthropic]
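Once a batch has ended, the four result types suggest a simple triage pass. The sketch below works on already-parsed result entries and assumes each entry is a dict with a `custom_id` and a `result` whose `type` is one of the four values named above; that shape, the function name `summarize_results`, and the choice to flag errored and expired requests for review are illustrative assumptions. An errored request caused by an invalid payload would fail again if resubmitted unchanged, so the flagged list is a starting point, not a retry queue.

```python
from collections import Counter

RESULT_TYPES = {"succeeded", "errored", "canceled", "expired"}
BILLED_TYPES = {"succeeded"}  # errored, canceled, and expired are not billed


def summarize_results(results):
    """Return (counts per type, number of billed requests, ids to review)."""
    counts = Counter()
    review_ids = []
    for entry in results:
        result_type = entry["result"]["type"]
        if result_type not in RESULT_TYPES:
            raise ValueError(f"unexpected result type: {result_type!r}")
        counts[result_type] += 1
        if result_type in ("errored", "expired"):
            review_ids.append(entry["custom_id"])
    billed = sum(counts[t] for t in BILLED_TYPES)
    return counts, billed, review_ids


# Hand-written sample entries, not real API output.
sample = [
    {"custom_id": "r1", "result": {"type": "succeeded"}},
    {"custom_id": "r2", "result": {"type": "errored"}},
    {"custom_id": "r3", "result": {"type": "expired"}},
    {"custom_id": "r4", "result": {"type": "canceled"}},
]
counts, billed, review_ids = summarize_results(sample)
print(billed, review_ids)  # 1 ['r2', 'r3']
```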

Practice

Exercise solutions

Solution

B. The job is large, offline, and cost-sensitive with no one waiting — the exact profile batch is built for: 50% off, and 80,000 requests sits within the 100,000-request limit. Matching by custom_id after the batch ends is the required pattern because results return unordered. A works but forfeits the 50% discount and adds rate-limit and orchestration overhead for latency nobody needs. C is impossible — streaming is not supported for batch requests. D collapses 80,000 independent classifications into one prompt, which blows past context limits and produces a single entangled response with no per-ticket structure.

Solution

custom_id is mandatory because batch results “can be returned in any order” — a batch is a set, not a sequence, so there is no positional correspondence between the request list and the result stream to fall back on. The unique custom_id is the only thread joining an output back to the input that produced it. If a caller instead assumes submission order, the specific failure is a silent mis-join: result n is attributed to request n when it actually answers some other request, so records carry the wrong data and nothing in the response flags it. That is the most dangerous failure class — one that corrupts data without surfacing an error.
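The silent mis-join is easy to reproduce on toy data. In the sketch below the request and result records are simplified, hypothetical shapes (a `text` field on the request, a `label` field on the result) rather than real API payloads, and the function names are illustrative. The results list is deliberately in a different order from the requests.

```python
requests = [
    {"custom_id": "ticket-1", "text": "refund please"},
    {"custom_id": "ticket-2", "text": "login broken"},
    {"custom_id": "ticket-3", "text": "love the app"},
]

# Same three results, returned in a different order than submitted.
results = [
    {"custom_id": "ticket-3", "label": "praise"},
    {"custom_id": "ticket-1", "label": "billing"},
    {"custom_id": "ticket-2", "label": "bug"},
]


def join_by_position(requests, results):
    """Wrong: assumes result n answers request n."""
    return {req["text"]: res["label"] for req, res in zip(requests, results)}


def join_by_custom_id(requests, results):
    """Right: looks up each request's result by its custom_id."""
    labels = {res["custom_id"]: res["label"] for res in results}
    return {req["text"]: labels[req["custom_id"]] for req in requests}


print(join_by_position(requests, results))
# {'refund please': 'praise', 'login broken': 'billing', 'love the app': 'bug'}
print(join_by_custom_id(requests, results))
# {'refund please': 'billing', 'login broken': 'bug', 'love the app': 'praise'}
```

The positional join raises no exception and returns a dictionary of the right size and shape; every value in it is wrong. The keyed join also fails loudly with a KeyError if a result is missing, which is the behavior you want.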

Solution

The two limits are 100,000 requests or 256 MB in size, whichever is reached first; the 256 MB payload limit is the one an HTTP 413 reports on creation (the fix is to split the dataset into multiple batches). A Messages API capability that does not work inside a batch: streaming (explicitly unsupported), or equally a multi-turn tool loop — each batched request is single-shot, with no tool_result round-trip, because a batch processes each request as one independent user→assistant turn with no follow-up.

Exam essentials