Tool-use schemas and the JSON-adherence problem

By C.W. Jameson · Published 19 May 2026 · Last reviewed 19 May 2026

Tool use is the difference between a chatbot and a runtime. The frontier models all support it, but their failure rates differ by as much as an order of magnitude. The cheapest way to measure this is a forced-format test: give the model a schema, give it a prompt that should produce one tool call, and run it a thousand times.
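A forced-format test can be sketched in a few lines. This is a minimal sketch, not a reference harness: `call_model` stands in for whatever API client is under test, `make_stub_model` is a hypothetical fake used only so the example runs offline, and the check is limited to "parses as JSON and has the required fields with the right types".

```python
import json
from typing import Callable


def is_valid_call(raw: str, required: dict[str, type]) -> bool:
    """True if raw parses as a JSON object holding every required field
    with the expected Python type."""
    try:
        args = json.loads(raw)
    except json.JSONDecodeError:
        return False
    if not isinstance(args, dict):
        return False
    return all(isinstance(args.get(key), typ) for key, typ in required.items())


def measure_adherence(call_model: Callable[[str], str], prompt: str,
                      required: dict[str, type], runs: int = 1000) -> float:
    """Run the same prompt `runs` times and return the fraction that pass."""
    passed = sum(is_valid_call(call_model(prompt), required) for _ in range(runs))
    return passed / runs


def make_stub_model(fail_every: int) -> Callable[[str], str]:
    """Hypothetical stand-in for a real model: every fail_every-th call
    returns prose instead of JSON arguments."""
    count = 0

    def call(prompt: str) -> str:
        nonlocal count
        count += 1
        if count % fail_every == 0:
            return "Sure! Here is the weather."
        return json.dumps({"city": "Oslo", "unit": "celsius"})

    return call


rate = measure_adherence(make_stub_model(20), "Weather in Oslo?",
                         {"city": str, "unit": str})
print(f"adherence: {rate:.1%}")  # adherence: 95.0%
```

Swapping the stub for a real client is the only change needed to run this against a live model; the pass criterion can be tightened to a full schema validator if one is available.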

Schema declarations

OpenAI and Anthropic both accept JSON Schema in tool definitions. The schemas are interoperable in form but not in semantics: a required field that Anthropic enforces strictly may be treated as optional by OpenAI given the same JSON.
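To make "interoperable in form" concrete, the sketch below wraps one JSON Schema in the two envelope shapes. The field names (`input_schema` for Anthropic, `type`/`function`/`parameters` for OpenAI chat completions) follow the publicly documented request formats as understood at the time of writing and should be checked against current documentation; the `get_weather` tool and the helper names are hypothetical.

```python
# One shared JSON Schema describing the tool's arguments.
weather_schema = {
    "type": "object",
    "properties": {
        "city": {"type": "string"},
        "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
    },
    "required": ["city", "unit"],
}


def to_anthropic_tool(name: str, description: str, schema: dict) -> dict:
    """Wrap the schema in the Anthropic-style tool envelope (assumed shape)."""
    return {"name": name, "description": description, "input_schema": schema}


def to_openai_tool(name: str, description: str, schema: dict) -> dict:
    """Wrap the schema in the OpenAI chat-completions tool envelope (assumed shape)."""
    return {
        "type": "function",
        "function": {"name": name, "description": description, "parameters": schema},
    }


anthropic_tool = to_anthropic_tool("get_weather", "Look up current weather.", weather_schema)
openai_tool = to_openai_tool("get_weather", "Look up current weather.", weather_schema)
```

The schema object is identical in both envelopes, which is why any difference in how `required` is honoured has to be measured rather than read off the definition.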

Adherence rates

Claude Sonnet 4.6 produces well-formed JSON on the first attempt more than 99% of the time. GPT-4o is closer to 95%. Open-weights models vary widely; some need a structured-output library wrapped around them to reach 90%.
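The simplest thing a wrapper can do is parse the output and retry on failure. The sketch below shows only that idea; real structured-output libraries typically do more (for example, constraining decoding), which is not shown here. `call_model` and the flaky stub are hypothetical.

```python
import json
from typing import Callable, Optional


def call_with_retry(call_model: Callable[[str], str], prompt: str,
                    max_attempts: int = 3) -> Optional[dict]:
    """Return the first output that parses as a JSON object, or None if
    every attempt fails."""
    for _ in range(max_attempts):
        raw = call_model(prompt)
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError:
            continue  # malformed output: try again
        if isinstance(parsed, dict):
            return parsed
    return None


def make_flaky_model() -> Callable[[str], str]:
    """Hypothetical stub: the first call returns prose, later calls return JSON."""
    calls = 0

    def call(prompt: str) -> str:
        nonlocal calls
        calls += 1
        return "Let me think about that." if calls == 1 else '{"city": "Oslo"}'

    return call


result = call_with_retry(make_flaky_model(), "Weather in Oslo?")
print(result)  # {'city': 'Oslo'}
```

Retries raise the effective adherence rate at the cost of latency and tokens, so the first-attempt rate is still the number worth reporting.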

Hallucinated fields

Models will sometimes invent fields not in the schema, particularly when the prompt suggests one. Strict-mode flags help; explicit schemas help more.
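Invented fields are easy to detect after the fact by comparing the returned argument names with the schema's declared properties. A minimal sketch, assuming a flat object schema; `unexpected_fields` is a hypothetical helper, while `additionalProperties` is the standard JSON Schema keyword for rejecting undeclared fields.

```python
def unexpected_fields(args: dict, schema: dict) -> list[str]:
    """Names present in the tool-call arguments but not declared in the schema."""
    declared = set(schema.get("properties", {}))
    return sorted(set(args) - declared)


schema = {
    "type": "object",
    "properties": {"city": {"type": "string"}, "unit": {"type": "string"}},
    "required": ["city", "unit"],
    "additionalProperties": False,  # explicit: no undeclared fields allowed
}

# A prompt such as "weather in Oslo for the next 3 days" may tempt the model
# to add a field the schema never declared.
args = {"city": "Oslo", "unit": "celsius", "forecast_days": 3}
print(unexpected_fields(args, schema))  # ['forecast_days']
```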

Parallel tool calls

Claude and GPT both support emitting multiple tool calls in a single turn, and latency-sensitive harnesses depend on this. Open-weights models often emit calls sequentially even when parallel calls are allowed.
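The latency benefit only materialises if the harness also executes the calls concurrently. A minimal sketch using `asyncio.gather`, assuming the model's calls have already been parsed into name/arguments pairs; the two tools and the `TOOLS` registry are hypothetical, and the sleeps stand in for network I/O.

```python
import asyncio


async def get_weather(city: str) -> str:
    await asyncio.sleep(0.1)  # stand-in for a network request
    return f"weather for {city}"


async def get_time(city: str) -> str:
    await asyncio.sleep(0.1)  # stand-in for a network request
    return f"time in {city}"


# Hypothetical registry mapping tool names to implementations.
TOOLS = {"get_weather": get_weather, "get_time": get_time}


async def run_tool_calls(calls: list[dict]) -> list[str]:
    """Execute every call from one model turn concurrently.
    Results come back in the same order as the calls."""
    return await asyncio.gather(
        *(TOOLS[call["name"]](**call["arguments"]) for call in calls)
    )


calls = [
    {"name": "get_weather", "arguments": {"city": "Oslo"}},
    {"name": "get_time", "arguments": {"city": "Oslo"}},
]
results = asyncio.run(run_tool_calls(calls))
print(results)  # ['weather for Oslo', 'time in Oslo']
```

With two calls the wall-clock time is roughly that of the slowest tool rather than the sum, which is the saving a sequential-only model gives up.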

Tells

Marker	Meaning
Model returns prose with embedded JSON instead of structured tool call	Either tool-use is misconfigured, or the model is small enough not to support it natively.
Schema-required field arrives as null repeatedly	The prompt does not ask for it unambiguously.
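The first tell in the table can be checked mechanically on the text portion of a response. This is a sketch under the assumption that the harness has the raw text available; `classify_output` and its three labels are hypothetical names.

```python
import json


def classify_output(text: str) -> str:
    """Label a text response as 'pure_json', 'embedded_json', or 'prose'."""
    stripped = text.strip()
    try:
        if isinstance(json.loads(stripped), dict):
            return "pure_json"
    except json.JSONDecodeError:
        pass

    # Try to decode a JSON object starting at each '{' in the text.
    decoder = json.JSONDecoder()
    start = stripped.find("{")
    while start != -1:
        try:
            obj, _ = decoder.raw_decode(stripped, start)
            if isinstance(obj, dict):
                return "embedded_json"
        except json.JSONDecodeError:
            pass
        start = stripped.find("{", start + 1)
    return "prose"


print(classify_output('Sure! Here you go: {"city": "Oslo"} Hope that helps.'))
# embedded_json
```

A high share of `embedded_json` results across a test run points at the configuration or model-size causes listed in the table rather than at the prompt.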

Frequently asked

Does temperature affect tool-use adherence?

Yes. Set it to 0 for production tool use unless you have a reason to do otherwise.

Can I trust open-weights tool-use without a wrapper?

Increasingly yes for Llama 4 and Qwen 3, but a wrapper library is still safer.
