Field identification
Tool-use schemas and the JSON-adherence problem
Every frontier model claims tool-use support. Adherence-rate on schemas tells a different story.
By C.W. Jameson · Published 19 May 2026 · Last reviewed 19 May 2026
Tool-use is the difference between a chatbot and a runtime. The frontier models all support it; their adherence-rates differ by an order of magnitude. The cheapest way to measure is a forced-format test — give the model a schema, give it a prompt that should produce one tool call, and run it a thousand times.
Schema declarations
OpenAI and Anthropic both support JSON schema for tool definitions. The schemas are interoperable in form but not semantics. A required field that Anthropic enforces strictly may be tolerated as optional by OpenAI on the same JSON.
Adherence rates
Claude Sonnet 4.6 produces well-formed JSON on first attempt at >99%. GPT-4o is closer to 95%. Open-weights vary widely; some need a structured-output library wrapped around them to reach 90%.
Hallucinated fields
Models will sometimes invent fields not in the schema, particularly when the prompt suggests one. Strict-mode flags help; explicit schemas help more.
Parallel tool calls
Claude and GPT both support emitting multiple tool calls in a single turn. Latency-sensitive harnesses depend on this. Open-weights often emit calls sequentially even when parallel is allowed.
Tells
| Marker | Meaning |
|---|---|
| Model returns prose with embedded JSON instead of structured tool call | Either tool-use is misconfigured, or the model is small enough not to support it natively. |
| Schema-required field arrives as null repeatedly | Prompt is not unambiguously asking for it. |
Frequently asked
Does temperature affect tool-use adherence?
Yes — set to 0 for production tool-use unless you have a reason otherwise.
Can I trust open-weights tool-use without a wrapper?
Increasingly yes for Llama 4 and Qwen 3, but a wrapper library is still safer.
From the Almanac shop
Model Tells — Flashcard Deck
Identify any frontier model from a paragraph of output. 60 cards.
$14 — Coming soon