Part IV opens the domain the exam weights at 20% — getting a model to produce what you actually need. The first lever is the cheapest and the most overlooked: the prompt’s own precision. Everything later in this Part (few-shot, structured outputs, validation loops) is an escalation from this baseline. The principle here outlasts every model version — newer models make it more true, not less — which is why this chapter is a stable principle, not a feature surface.
Specify the output format; do not leave it to inference
The single most reliable quality lever is to state the output contract explicitly. “Precisely define your desired output format using JSON, XML, or custom templates so that Claude understands every output formatting element you require.”
[Official]
Increase output consistency · AnthropicT1-official original A vague instruction (“summarize this”) leaves the shape — length, fields, ordering, what to do with missing data — for the model to guess, and a guess varies from run to run. An explicit instruction (“return JSON with keys sentiment, key_issues (list), and action_items (list of objects with team and task)”) removes the variance at its source.
The model will not infer what you did not ask for
Modern models follow instructions more literally, which makes implicit expectations a liability. “Claude Opus 4.8 interprets prompts literally and explicitly, particularly at lower effort levels. It does not silently generalize an instruction from one item to another, and it does not infer requests you didn’t make.” [Official] Prompting best practices · AnthropicT1-official original The upside is precision and less thrash; the cost is that a requirement you held in your head but never wrote down will simply not be honored. The fix is not a cleverer prompt that the model “figures out” — it is stating the requirement.
Tell the model what to do, not what to avoid
When you are steering format or tone, a positive instruction outperforms a prohibition. The docs are explicit that demonstrating the wanted behavior beats forbidding the unwanted one: “Positive examples showing how Claude can communicate with the appropriate level of concision tend to be more effective than negative examples or instructions that tell the model what not to do.” [Official] Prompting best practices: Use examples effectively (multishot / few-shot) · AnthropicT1-official original “Respond in smoothly flowing prose” steers better than “do not use markdown lists,” because a prohibition names a forbidden region without locating the target inside the (still vast) permitted one. A positive instruction points straight at the destination. The same logic applies to eliminating preambles: state “respond directly without preamble” rather than enumerating the openings you dislike. [Official] Prompting best practices · AnthropicT1-official original
The escalation ladder: instruction, then examples, then a hard schema
Explicit instruction is the first rung, not the only one. The documented hierarchy is to ask plainly first and escalate only when you need a stronger guarantee: “Try simply asking the model to conform to your output structure first, as newer models can reliably match complex schemas when told to, especially if implemented with retries. For classification tasks, use either tools with an enum field containing your valid labels or structured outputs.” [Official] Prompting best practices · AnthropicT1-official original Plain instruction handles most cases; a few-shot example (the next chapter) disambiguates edge cases; a hard schema (D4.3) makes a shape unrepresentable to violate. Each rung costs more — context, latency, setup — so you climb only as far as the stakes require.
The hands-on craft of writing these prompts — the iteration rhythm, the worked before-and-after examples — is the Use book’s territory; its prompt-engineering chapter is the use-side companion to this exam-angle treatment (forthcoming in the handbook).
Practice
Exercise solutions
C. The inconsistency is latent in the prompt: “summarize” leaves length, fields, and sentiment-handling unstated, so the model resolves them differently each run. Specifying the exact contract — fixed fields, each with a type and a length bound — removes those degrees of freedom and is what a downstream parser needs. A (temperature) reduces token-level randomness but does nothing about an underspecified shape; a deterministic model still has to invent a structure you never gave it. B is a negative, vague instruction — “too much” is undefined and “be consistent” names the goal without specifying the target. D is the exact assumption modern literal-following models invalidate: a bigger model will not infer a contract you didn’t write, and may follow your vague prompt more faithfully, not less.
A positive rewrite: “Respond directly with the answer in flowing prose.” That single instruction covers both prohibitions — “directly” eliminates the preamble, “flowing prose” rules out headers — by naming the target instead of the forbidden regions. It steers more reliably because a prohibition (“don’t use headers,” “no preamble”) shrinks the output space without locating the destination inside the still-vast permitted region, whereas the positive form aims straight at the one shape you meant.
The three rungs, cheapest first: (1) explicit instruction — name the four labels in the prompt and ask the model to return exactly one; (2) few-shot examples — demonstrate the labeling on ambiguous inputs; (3) structured outputs / strict tools — constrain decoding so only an in-set label can be emitted. Here you should climb to rung 3: the premise is that a malformed or out-of-set label crashes the pipeline, so you need the shape to be unviolatable, not merely likely-correct. An enum-constrained tool or structured output makes an out-of-set value unrepresentable — the only rung that turns “should be valid” into “cannot be invalid,” which is what a crash-on-violation contract demands.
Exam essentials
- Consistency lives in the specification — if two runs disagree, a degree of freedom was left unstated; pin the format (fields, types, lengths, missing-data handling) explicitly.
- Modern models follow instructions literally — Opus 4.8 “does not infer requests you didn’t make,” so an unstated requirement is an unmet one; state it rather than expecting the model to read intent.
- Positive beats negative — “respond in flowing prose” / “answer in one sentence” steers better than “don’t use lists” / “don’t be verbose”; aim at the target instead of ruling out one failure.
- Escalation ladder — explicit instruction → few-shot examples (D4.2) → structured outputs / strict tools (D4.3); climb only as far as the stakes require, since each rung costs more.
- Why stable-principle — “be explicit about what you want” survives every model version; newer, more-literal models make it more load-bearing, not less.