Field identification
Extended thinking: when to spend reasoning tokens and when not
Reasoning effort is a knob, not a switch. The cost-to-correctness curve is real and operators should sketch it before turning it up.
By C.W. Jameson · Published 19 May 2026 · Last reviewed 19 May 2026
Extended thinking, deep thought, reasoning effort: every frontier lab now ships a knob that trades latency and cost for capability. Used well, it solves problems no amount of prompt engineering would. Used poorly, it triples your bill on tasks that didn't need it. The discipline is in knowing which is which.
When to enable
Hard math, multi-step planning, novel algorithm derivation, debugging when the obvious cause is wrong. Anything where the first answer is consistently wrong and the second answer is consistently right.
When to skip
Classification, summarisation, well-trodden boilerplate, anything a smaller model already does fine. Reasoning effort doesn't help these tasks and quietly bills you for invisible tokens.
Anthropic extended thinking
Off by default. Enable with the extended-thinking parameter and a budget. Visible thinking-blocks in the response.
OpenAI reasoning-effort
Three tiers: low, medium, high. High can spend 10-20× the visible output in reasoning tokens. Tier-low is closer to the older models.
Tells
| Marker | Meaning |
|---|---|
| Bill rose 5-10× without obvious volume change | Reasoning effort accidentally left high. |
| First attempts wrong, second attempts right on same prompt | Candidate for enabling extended thinking. |
Frequently asked
Can I see what the model was thinking?
Anthropic exposes thinking blocks. OpenAI reveals only summaries. Both treat raw reasoning as sensitive.
Does extended thinking improve every task?
No. Below a complexity floor, it adds latency without changing answers.
From the Almanac shop
Model Tells — Flashcard Deck
Identify any frontier model from a paragraph of output. 60 cards.
$14 — Coming soon