Field identification
Refusals: what models won't do and why
Each frontier family is trained to refuse differently. A workload that one model refuses, another will attempt; a workload that one attempts, another will quietly hallucinate.
By C.W. Jameson · Published 19 May 2026 · Last reviewed 19 May 2026
Refusals are the most operationally consequential difference between frontier models. A model that refuses 5% of legitimate requests is not interchangeable with one that refuses 0.5%: at the same volume it turns away ten times as many valid requests. Public data on refusal rates is poor; most operators discover the differences by hitting them.
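To make the 5% versus 0.5% gap concrete, and to show why a small test run can mislead, here is a minimal sketch. The function names `expected_refusals` and `wilson_interval` are illustrative, and the request volume is a made-up figure, not measured data. The Wilson score interval is one common way to put error bars on a refusal rate observed over a sample; other intervals would serve equally well.

```python
import math


def expected_refusals(requests: int, refusal_rate: float) -> float:
    """Expected number of legitimate requests refused at a given rate."""
    return requests * refusal_rate


def wilson_interval(refused: int, total: int, z: float = 1.96) -> tuple:
    """Wilson score interval for an observed refusal rate.

    z = 1.96 corresponds to roughly 95% confidence.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    p = refused / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return (max(0.0, centre - half), min(1.0, centre + half))


if __name__ == "__main__":
    volume = 10_000  # hypothetical daily volume of legitimate requests
    for rate in (0.05, 0.005):
        print(f"rate {rate:.1%}: about {expected_refusals(volume, rate):.0f} refused")

    # Five refusals in a 100-prompt trial is a wide estimate, not a firm 5%.
    low, high = wilson_interval(refused=5, total=100)
    print(f"5/100 observed: plausible rate between {low:.1%} and {high:.1%}")
```

The interval on a 100-prompt trial is wide enough that two models with quite different true rates can look alike, which is one reason operators tend to find the differences only in production.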
Anthropic constitutional AI
Claude refuses with specific alternatives. 'I can help with X but not Y' is a Claude-family signature. Refusal training is more conservative on personally identifying topics and more permissive on technical depth.
OpenAI RLHF
The GPT family refuses with vaguer language and has a higher refusal rate on creative-writing prompts that touch sensitive themes. Tool use can sometimes bypass conversational refusals.
Gemini
Refusals tend to come as warnings followed by partial answers rather than full declines. Multimodal inputs occasionally bypass text-only refusal training.
Grok
Lowest refusal rate of the four frontier families. Marketed explicitly on this.
Open-weights
Refusal training is brittle. Jailbreaks succeed at higher rates. For workloads that need permissiveness, this is a feature; for workloads that need predictable safety, it is a bug.
Tells
| Marker | Meaning |
|---|---|
| Refusal with specific alternative ('I can do X but not Y') | Claude family. |
| Refusal with disclaimers wrapping a partial answer | Gemini family. |
| Almost no refusals at all | Grok or an uncensored open-weights derivative. |
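The tells in the table can be roughed out as a keyword heuristic. This is a sketch only: the regular expressions and the label names are assumptions made for illustration, not validated detectors, and they will misfire on paraphrases and on curly apostrophes. The third tell (almost no refusals) is a property of many responses taken together, so it shows up in the tally rather than in any single classification.

```python
import re
from collections import Counter
from typing import Iterable

# Hypothetical patterns; tune against your own transcripts before trusting them.
ALTERNATIVE = re.compile(
    r"\bI can (?:help with|do)\b.*\bbut (?:not|I can(?:'|no)t)\b", re.I | re.S
)
DISCLAIMER = re.compile(
    r"\b(?:please note|it's important to|be aware|disclaimer|warning)\b", re.I
)
REFUSAL = re.compile(
    r"\b(?:I can(?:'|no)t|I won't|I'm unable to|I am unable to)\b", re.I
)


def classify(response: str) -> str:
    """Label one response with the refusal style it most resembles."""
    if ALTERNATIVE.search(response):
        return "specific_alternative"  # 'I can do X but not Y' tell
    if DISCLAIMER.search(response):
        return "disclaimer_wrapped"    # warning around a partial answer
    if REFUSAL.search(response):
        return "plain_refusal"         # a decline with no distinctive shape
    return "no_refusal"


def tally(responses: Iterable[str]) -> Counter:
    """Count refusal styles across a batch of responses."""
    return Counter(classify(r) for r in responses)


if __name__ == "__main__":
    sample = [
        "I can help with summarizing the report but not with finding her address.",
        "Warning: this topic is sensitive. Here is a general overview.",
        "I can't assist with that.",
        "Sure, here is the script.",
    ]
    print(tally(sample))
```

A batch dominated by `no_refusal` across prompts that other models decline is the closest this sketch gets to the third row of the table.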
Frequently asked
Does the API refuse less than the consumer product?
Generally yes; system prompts on consumer products add a refusal layer on top of the API.