Field identification
Pricing models and how to forecast a bill
Per-token pricing is the simple part. Reasoning tokens, cached prefixes, and tool-call billing are where surprises live.
By C.W. Jameson · Published 19 May 2026 · Last reviewed 19 May 2026
The published pricing on a model card is rarely what an operator pays. Reasoning tokens, cached-prefix discounts, batch-API discounts, output-token surcharges, image-token rates, and PDF-page rates all stack. A correctly-engineered prompt can be five times cheaper than a naively-engineered one against the same model.
Base token pricing
Every frontier model publishes input and output rates per million tokens. Output is typically 4-5× input. This is the headline number.
Reasoning tokens
OpenAI o-series and GPT-5 bill reasoning tokens separately, at output rates. A high-reasoning-effort prompt can spend 10-20× the visible output in invisible reasoning. Budget for it explicitly.
Prompt caching
Anthropic offers cache reads at 10% of normal input rate when a prompt prefix is reused within 5 minutes. The 5-minute TTL drives architecture decisions: a long-running agent that idles past 5 minutes pays full price next turn.
Batch API
Both Anthropic and OpenAI offer ~50% discounts on batched workloads with a 24-hour SLA. Classification and enrichment workloads should default to batch unless realtime is required.
Image and PDF rates
Images are billed as token equivalents, varying by resolution. PDFs are billed per page. A 200-page PDF analysis can cost more than the operator expects by an order of magnitude.
Tells
| Marker | Meaning |
|---|---|
| Bill jumped 10× without obvious volume change | Reasoning tokens enabled, or cache TTL was missed. |
| Long sessions cost more after a pause | Cache expiry; either keep the session warm or accept the cache miss. |
Frequently asked
Does prompt caching work across providers?
No. Each provider's cache is local. Multi-provider routing forfeits cache savings.
When is batch API the wrong choice?
Anything user-facing, anything below a 24-hour SLA, anything where prompt errors need fast iteration.
From the Almanac shop
Model Tells — Flashcard Deck
Identify any frontier model from a paragraph of output. 60 cards.
$14 — Coming soon