The first three chapters of Part IV controlled a single response. This one scales to a hundred thousand of them. A batch is the right tool whenever the work is large and nothing is waiting on the answer — and its surface (the endpoint, the size limits, the custom_id rule, the beta header) is exactly the kind of named detail that shifts between releases, so this is a feature-surface chapter.

Diagnostic: Do I know this already?

Answer these confidently and you can skim ahead to Exam essentials; if any is shaky, read closely — each is developed below.

What single factor decides batch versus real-time, and which way does each tolerance point?
Why is custom_id mandatory, and what specifically breaks if you rely on result order?
Name the two size limits on a single batch, and which one an HTTP 413 reports.
A batch result comes back succeeded — does that mean its answer is usable? What must you still check?
Which result types are not billed, and which billed-but-possibly-useless outcome is the trap?

Check your answers

Latency tolerance alone decides: if a human or synchronous system is blocked on the result, batch is wrong; if the work is an overnight job, backfill, or offline evaluation, batch halves the bill.
Results “can be returned in any order,” so the unique custom_id is the only sanctioned join key; relying on positional order silently mismatches outputs to inputs, and nothing in the response flags it.
100,000 requests or 256 MB, whichever is reached first — an HTTP 413 on creation reports the 256 MB payload limit.
No — succeeded is a batch-level outcome that says the request ran; you must still inspect the message’s own stop_reason, because a refusal ("refusal") or truncation ("max_tokens") arrives as succeeded.
errored, canceled, and expired are not billed; the trap is a succeeded refusal — it returns a 200, you pay for it, and it may not match your schema.

The cost-latency trade

The Message Batches API exists for one trade: give up immediacy, get half off. “The Message Batches API is a powerful, cost-effective way to asynchronously process large volumes of Messages requests. This approach is well-suited to tasks that do not require immediate responses, with most batches finishing in less than 1 hour while reducing costs by 50% and increasing throughput.” [Official] Batch processing (Message Batches API) · Anthropic The discount is a flat 50% on both input and output across all tiers, and it stacks with prompt-caching discounts. The cost of the discount is a service-level agreement measured in hours, not milliseconds: most batches finish within an hour, but the guaranteed window is 24 hours, and a batch that does not complete within it expires. [Official] Batch processing (Message Batches API) · Anthropic Results stay retrievable for 29 days after creation.
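To make the trade concrete, here is a minimal sketch of the arithmetic. The per-million-token prices are placeholders chosen for illustration, not published rates, and `job_cost` is a hypothetical helper; only the flat 50% factor comes from the text above. Prompt-caching discounts, which stack with the batch discount, are left out.

```python
# Placeholder prices in USD per million tokens (illustrative, NOT real rates).
INPUT_PRICE_PER_MTOK = 3.00
OUTPUT_PRICE_PER_MTOK = 15.00
BATCH_DISCOUNT = 0.50  # flat 50% on both input and output, per the text above


def job_cost(n_requests, input_tokens_each, output_tokens_each, batch=False):
    """Estimate the USD cost of n_requests identically sized requests."""
    input_cost = n_requests * input_tokens_each / 1_000_000 * INPUT_PRICE_PER_MTOK
    output_cost = n_requests * output_tokens_each / 1_000_000 * OUTPUT_PRICE_PER_MTOK
    total = input_cost + output_cost
    return total * (1 - BATCH_DISCOUNT) if batch else total


# 80,000 tickets, each with roughly 1,000 input tokens and 100 output tokens.
realtime = job_cost(80_000, 1_000, 100)
batched = job_cost(80_000, 1_000, 100, batch=True)
print(f"real-time ${realtime:.2f}  batch ${batched:.2f}")
```

Whatever the real prices are, the ratio is what matters: the batched figure is always half the real-time one, so the decision reduces to whether anyone is waiting on the answer.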

The custom_id contract

A batch is a set, not a sequence, and that has one non-negotiable consequence: “Batch results can be returned in any order, and may not match the ordering of requests when the batch was created. … To correctly match results with their corresponding requests, always use the custom_id field.” [Official] Batch processing (Message Batches API) · Anthropic Every request carries a unique custom_id (1–64 characters, alphanumeric plus - and _), and that id is the only thread connecting an output back to the input that produced it.
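Because a malformed or repeated id undermines the join, it is worth checking ids before submission. The following is a sketch of such a pre-flight check; `check_custom_ids` is a hypothetical helper, and the pattern simply encodes the 1–64 character rule stated above.

```python
import re

# 1-64 characters drawn from letters, digits, hyphen, and underscore.
CUSTOM_ID_PATTERN = re.compile(r"[A-Za-z0-9_-]{1,64}")


def check_custom_ids(ids):
    """Return (malformed, duplicated); two empty lists mean the ids are safe."""
    malformed = [i for i in ids if not CUSTOM_ID_PATTERN.fullmatch(i)]
    seen, duplicated = set(), []
    for i in ids:
        if i in seen and i not in duplicated:
            duplicated.append(i)  # report each repeated id once
        seen.add(i)
    return malformed, duplicated


malformed, duplicated = check_custom_ids(
    ["ticket-1", "ticket-2", "ticket-2", "ticket 3", "x" * 65]
)
print("malformed:", malformed)
print("duplicated:", duplicated)
```

Running the check locally catches a bad id before it reaches the API, and a duplicate before it makes two results indistinguishable.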

The batch envelope: what fits and what it can’t do

A batch is bounded by size and by shape. Size: “A Message Batch is limited to either 100,000 Message requests or 256 MB in size, whichever is reached first.” [Official] Batch processing (Message Batches API) · Anthropic Exceed the payload limit and the create call returns HTTP 413; the fix is to split a huge dataset across multiple batches. Shape: a batch supports all Messages API features including beta features, “however, streaming is not supported for batch requests,” [Official] Batch processing (Message Batches API) · Anthropic and each request is single-shot: there is no follow-up turn inside a batch, so multi-turn tool round-trips do not work. Structured outputs (D4.3), by contrast, compose cleanly: a batched request can carry output_config.format and you get schema-valid JSON at 50% off. [Official] Structured outputs · Anthropic
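Splitting a dataset to respect both limits can be sketched as a greedy packer. Two assumptions are baked in and should be checked against the current documentation: that "256 MB" is treated as 256 MiB, and that the serialized JSON of each request is a fair proxy for how the API measures payload size. `split_into_batches` is a hypothetical helper, and leaving some headroom below the byte limit is prudent.

```python
import json

MAX_REQUESTS = 100_000
MAX_BYTES = 256 * 1024 * 1024  # assumption: "256 MB" read as 256 MiB


def split_into_batches(requests, max_requests=MAX_REQUESTS, max_bytes=MAX_BYTES):
    """Greedily pack requests into batches that respect both limits."""
    batches, current, current_bytes = [], [], 0
    for req in requests:
        size = len(json.dumps(req).encode("utf-8"))
        if size > max_bytes:
            raise ValueError(f"request {req.get('custom_id')!r} exceeds the size limit alone")
        # Start a new batch when adding this request would break either limit.
        if current and (len(current) >= max_requests or current_bytes + size > max_bytes):
            batches.append(current)
            current, current_bytes = [], 0
        current.append(req)
        current_bytes += size
    if current:
        batches.append(current)
    return batches


# Small limits so the behaviour is visible without a large dataset.
demo = [{"custom_id": f"ticket-{i}", "params": {"note": "x" * 50}} for i in range(10)]
chunks = split_into_batches(demo, max_requests=4, max_bytes=10_000)
print([len(c) for c in chunks])
```

Each chunk is then submitted as its own batch; because every request keeps its custom_id, results from several batches can still be joined into one record set.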

Billing, result types, and the lifecycle

You pay only for requests that run to completion: a result is succeeded, errored, canceled, or expired, and “you are not billed for errored, canceled, or expired requests.” [Official] Batch processing (Message Batches API) · Anthropic For unusually long generations there is an opt-in: the output-300k-2026-03-24 beta header “raises the max_tokens cap to 300,000 for batch requests using Claude Opus 4.8, Claude Opus 4.7, Claude Opus 4.6, or Claude Sonnet 4.6.” It is batch-only, and a single 300k generation can itself take over an hour, so submit it with the 24-hour window in mind. [Official] Batch processing (Message Batches API) · Anthropic
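The billing rule is simple enough to express as a tally. This sketch takes a list of result-type strings (however you collected them) and splits the count into billed and unbilled; `billing_summary` is a hypothetical helper, not part of any SDK.

```python
from collections import Counter

# Result types that the quoted billing rule says are not charged.
UNBILLED_TYPES = {"errored", "canceled", "expired"}


def billing_summary(result_types):
    """Count results by type and split them into billed and unbilled totals."""
    counts = Counter(result_types)
    unbilled = sum(n for t, n in counts.items() if t in UNBILLED_TYPES)
    billed = counts.get("succeeded", 0)
    return {"counts": dict(counts), "billed": billed, "unbilled": unbilled}


summary = billing_summary(["succeeded"] * 7 + ["errored"] * 2 + ["expired"])
print(summary)
```

Note that the billed count here is a count of succeeded results, which says what you paid for and nothing about whether each answer is usable.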

A succeeded result is not automatically a usable one

The four result types (succeeded, errored, canceled, expired) are batch-level outcomes: they tell you the request ran, not that its answer is good. A succeeded result “includes the message result,” [Official] Batch processing (Message Batches API) · Anthropic and that message carries its own stop_reason. Two values still bite: a refusal (stop_reason: "refusal") returns a 200, is billed, and may not match your schema; a truncation (stop_reason: "max_tokens") is incomplete output. [Official] Structured outputs · Anthropic Note the cost asymmetry: you are not billed for errored/canceled/expired, but a succeeded refusal you do pay for. So per-result handling must inspect each succeeded message’s stop_reason, not stop at the result type.

Worked example: the lifecycle, with the check most callers skip

Classifying 80,000 tickets overnight. Two layers of result-checking, not one:

import time

import anthropic

client = anthropic.Anthropic()   # reads ANTHROPIC_API_KEY from the environment
records = {}                     # custom_id -> parsed classification

# tickets, handle_failure, handle_unusable, and parse are supplied by the caller.

# 1. Create -- each request a unique custom_id (the only join key).
batch = client.messages.batches.create(requests=[
    {"custom_id": f"ticket-{t.id}", "params": {...}} for t in tickets])

# 2. Poll until ended (most < 1h; SLA 24h, then expiry).
while client.messages.batches.retrieve(batch.id).processing_status != "ended":
    time.sleep(60)

# 3. Stream JSONL -- order is NOT guaranteed.
for r in client.messages.batches.results(batch.id):
    # First guard: batch-level result type.
    if r.result.type != "succeeded":
        handle_failure(r.custom_id, r.result.type)        # errored/canceled/expired -- unbilled
        continue
    # Second guard: message-level stop_reason inside a succeeded result.
    msg = r.result.message
    if msg.stop_reason in ("refusal", "max_tokens"):
        handle_unusable(r.custom_id, msg.stop_reason)      # succeeded but NOT usable -- and billed
        continue
    records[r.custom_id] = parse(msg)                       # 4. join by custom_id

The structure is the lesson. The first guard is the batch-level result type — succeeded versus the three unbilled failures. The second, the one most pipelines omit, is the message-level stop_reason inside a succeeded result: a refusal or a truncation reaches you as succeeded yet carries no answer you can use — and you paid for the refusal. Skip it and you silently ingest refused/truncated outputs as if they were classifications. And throughout, custom_id is the only correct join key, because the result stream is unordered.
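The two guards and the join can be exercised offline, without an API key. In this sketch results are plain dicts shaped like one line of the results JSONL. The shape and the simplified string content are assumptions for illustration, and `reconcile` is a hypothetical helper. It also flags any submitted id that never appears in the stream.

```python
def reconcile(submitted_ids, results):
    """Join results to inputs by custom_id and list what needs another look.

    Each result is assumed to look like:
    {"custom_id": ..., "result": {"type": ..., "message": {"stop_reason": ...}}}
    """
    usable, needs_attention = {}, {}
    for r in results:
        cid, result = r["custom_id"], r["result"]
        if result["type"] != "succeeded":            # guard 1: result type
            needs_attention[cid] = result["type"]
            continue
        stop_reason = result["message"]["stop_reason"]
        if stop_reason in ("refusal", "max_tokens"):  # guard 2: stop_reason
            needs_attention[cid] = stop_reason
            continue
        usable[cid] = result["message"]
    # Anything submitted but absent from the stream is unaccounted for.
    for cid in submitted_ids:
        if cid not in usable and cid not in needs_attention:
            needs_attention[cid] = "missing"
    return usable, needs_attention


submitted = ["ticket-1", "ticket-2", "ticket-3", "ticket-4"]
# Deliberately out of submission order, as a real result stream may be.
results = [
    {"custom_id": "ticket-3", "result": {"type": "succeeded",
        "message": {"stop_reason": "end_turn", "content": "billing"}}},
    {"custom_id": "ticket-1", "result": {"type": "succeeded",
        "message": {"stop_reason": "refusal", "content": ""}}},
    {"custom_id": "ticket-2", "result": {"type": "expired"}},
]
usable, needs_attention = reconcile(submitted, results)
print(sorted(usable))
print(needs_attention)
```

Keying both dictionaries by custom_id means the shuffled order of the stream has no effect on the outcome, which is the property a positional join cannot offer.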

Practice

Exercise solutions

Solution

B. The job is large, offline, and cost-sensitive with no one waiting, which is the exact profile batch is built for: 50% off, and 80,000 requests sit within the 100,000-request limit. Matching by custom_id after the batch ends is the required pattern because results return unordered. A works but forfeits the 50% discount and adds rate-limit and orchestration overhead for latency nobody needs. C is impossible: streaming is not supported for batch requests. D collapses 80,000 independent classifications into one prompt, which blows past context limits and produces a single entangled response with no per-ticket structure.

Solution

custom_id is mandatory because batch results “can be returned in any order” — a batch is a set, not a sequence, so there is no positional correspondence between the request list and the result stream to fall back on. The unique custom_id is the only thread joining an output back to the input that produced it. If a caller instead assumes submission order, the specific failure is a silent mis-join: result n is attributed to request n when it actually answers some other request, so records carry the wrong data and nothing in the response flags it. That is the most dangerous failure class — one that corrupts data without surfacing an error.

Solution

The two limits are 100,000 requests or 256 MB in size, whichever is reached first; the 256 MB payload limit is the one an HTTP 413 reports on creation (the fix is to split the dataset into multiple batches). A Messages API capability that does not work inside a batch: streaming (explicitly unsupported), or equally a multi-turn tool loop — each batched request is single-shot, with no tool_result round-trip, because a batch processes each request as one independent user→assistant turn with no follow-up.

Exam essentials

The trade — batch is async and 50% off (input and output, all tiers, stacks with caching); most finish under an hour, the SLA is 24 hours, then the batch expires; results retained 29 days. Choose it by latency tolerance alone.
custom_id contract — results return in any order; match by the unique custom_id (1–64 chars, alphanumeric + - + _). Never rely on positional order; never reuse an id.
Envelope — 100,000 requests or 256 MB per batch (HTTP 413 over payload); streaming unsupported; each request is single-shot (no multi-turn tool loop). Structured outputs compose (schema-valid at 50% off).
Billing + beta — billed only for succeeded; errored/canceled/expired are free; the output-300k-2026-03-24 beta raises max_tokens to 300,000 on batch for Opus 4.8/4.7/4.6 and Sonnet 4.6.
Succeeded ≠ usable — a succeeded result still carries a per-message stop_reason; a refusal ("refusal", 200, billed, may not match schema) or a truncation ("max_tokens", incomplete) reaches you as succeeded. Check each succeeded message’s stop_reason, not just the result type.
Lifecycle — POST /v1/messages/batches → poll until ended → stream results_url JSONL → match by custom_id → optional DELETE before the 29-day window.