Refusals: what models won't do and why

By C.W. Jameson · Published 19 May 2026 · Last reviewed 19 May 2026

Refusals are the most operationally consequential difference between frontier models. A model that refuses 5% of legitimate requests is not interchangeable with one that refuses 0.5%. Public data on refusal rates is poor; most operators discover the differences by hitting them.
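To make the 5% versus 0.5% gap concrete, here is a minimal sketch of the arithmetic. The daily volume of 10,000 legitimate requests and both function names are assumptions chosen for illustration, not figures from any vendor.

```python
def expected_refusals(legit_requests: int, refusal_rate: float) -> float:
    """Expected number of legitimate requests refused at a given rate."""
    return legit_requests * refusal_rate


def observed_refusal_rate(refused_flags: list[bool]) -> float:
    """Share of sampled legitimate requests that were refused.

    Each flag is True if a human reviewer judged the response a refusal.
    """
    if not refused_flags:
        raise ValueError("need at least one sampled request")
    return sum(refused_flags) / len(refused_flags)


# Hypothetical volume, chosen only for illustration.
DAILY_LEGIT_REQUESTS = 10_000

high = expected_refusals(DAILY_LEGIT_REQUESTS, 0.05)   # 5% refusal rate
low = expected_refusals(DAILY_LEGIT_REQUESTS, 0.005)   # 0.5% refusal rate

print(f"5% rate:   {high:.0f} refused per day")
print(f"0.5% rate: {low:.0f} refused per day")
print(f"ratio:     {high / low:.0f}x")
```

Since published numbers are scarce, `observed_refusal_rate` is the part an operator would likely run on their own traffic: label a sample of legitimate requests by hand and compare models on the same sample.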

Anthropic constitutional AI

Claude refuses with specific alternatives. 'I can help with X but not Y' is a Claude-family signature. Refusal training is more conservative on personally identifying topics and more permissive on technical depth.

OpenAI RLHF

The GPT family refuses with vaguer language. Refusal rates are higher on creative-writing prompts that touch sensitive themes. Tool use can sometimes bypass conversational refusals.

Gemini

Refusals tend to come as warnings followed by partial answers rather than full declines. Multimodal inputs occasionally bypass text-only refusal training.

Grok

Grok has the lowest refusal rate of the four frontier families and is marketed explicitly on that basis.

Open-weights

Refusal training is brittle. Jailbreaks succeed at higher rates. For workloads that need permissiveness, this is a feature; for workloads that need predictable safety, it is a bug.

Tells

Marker	Meaning
Refusal with specific alternative ('I can do X but not Y')	Claude family.
Refusal with disclaimers wrapping a partial answer	Gemini family.
Almost no refusals at all	Grok or an uncensored open-weights derivative.
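The table above can be turned into a rough heuristic. The sketch below is illustrative only: the regular expressions, the function names, and the 1% threshold are all assumptions, not validated detectors, and real refusals vary far more than these patterns capture.

```python
import re

# Hypothetical patterns for the first two markers; not validated on real output.
ALTERNATIVE = re.compile(
    r"\bI can\b.*\bbut (?:not|I can't|I cannot)\b", re.IGNORECASE | re.DOTALL
)
DISCLAIMER = re.compile(
    r"\b(?:please note|important to note|disclaimer|warning)\b", re.IGNORECASE
)
DECLINE = re.compile(r"\bI (?:can't|cannot|won't|am unable to)\b", re.IGNORECASE)


def guess_family_from_text(response: str) -> str:
    """Map a single response to a family using the table's first two markers."""
    if ALTERNATIVE.search(response):
        # Refusal with a specific alternative ('I can do X but not Y').
        return "Claude family"
    if DISCLAIMER.search(response) and not DECLINE.search(response):
        # Disclaimers wrapping a partial answer rather than a full decline.
        return "Gemini family"
    return "unknown"


def guess_family_from_rate(refused: int, total: int, threshold: float = 0.01) -> str:
    """Apply the third marker, which depends on a rate rather than on wording.

    The 1% default threshold for 'almost no refusals' is an assumption.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if refused / total < threshold:
        return "Grok or an uncensored open-weights derivative"
    return "unknown"
```

For example, `guess_family_from_text("I can help with summarizing the report but not with locating the author.")` returns `"Claude family"` under these patterns. Treat any single match as weak evidence; the tells are tendencies, not fingerprints.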

Frequently asked

Does the API refuse less than the consumer product?

Generally yes; system prompts on consumer products add a refusal layer above the API.
