Extended thinking: when to spend reasoning tokens and when not

By C.W. Jameson · Published 19 May 2026 · Last reviewed 19 May 2026

Extended thinking, deep thought, reasoning effort: every frontier lab now ships a knob that trades latency and cost for capability. Used well, it solves problems no amount of prompt engineering would. Used poorly, it triples your bill on tasks that didn't need it. The discipline is in knowing which is which.

When to enable

Hard math, multi-step planning, novel algorithm derivation, debugging when the obvious cause is wrong. Anything where the first answer is consistently wrong and the second answer is consistently right.

When to skip

Classification, summarisation, well-trodden boilerplate, anything a smaller model already does fine. Reasoning effort doesn't help these tasks and quietly bills you for invisible tokens.

Anthropic extended thinking

Off by default. Enable with the extended-thinking parameter and a budget. Visible thinking-blocks in the response.

OpenAI reasoning-effort

Three tiers: low, medium, high. High can spend 10-20× the visible output in reasoning tokens. Tier-low is closer to the older models.

Tells

Marker	Meaning
Bill rose 5-10× without obvious volume change	Reasoning effort accidentally left high.
First attempts wrong, second attempts right on same prompt	Candidate for enabling extended thinking.

Frequently asked

Can I see what the model was thinking?

Anthropic exposes thinking blocks. OpenAI reveals only summaries. Both treat raw reasoning as sensitive.

Does extended thinking improve every task?

No. Below a complexity floor, it adds latency without changing answers.

From the Almanac shop

Model Tells — Flashcard Deck

Identify any frontier model from a paragraph of output. 60 cards.

$14 — Coming soon

← All identification topics