The Domesday Book of KWJ · AI

Field identification

Pricing models and how to forecast a bill

Per-token pricing is the simple part. Reasoning tokens, cached prefixes, and tool-call billing are where surprises live.

By C.W. Jameson · Published 19 May 2026 · Last reviewed 19 May 2026

The published pricing on a model card is rarely what an operator pays. Reasoning tokens, cached-prefix discounts, batch-API discounts, output-token surcharges, image-token rates, and PDF-page rates all stack on top of the base rate. Against the same model, a well-engineered prompt can cost a fifth of what a naively engineered one does.

Base token pricing

Every frontier provider publishes input and output rates per million tokens for each model. Output typically costs 4-5× as much as input per token, and the ratio is higher on some models. This is the headline number.
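A minimal sketch of the headline calculation, assuming illustrative rates of $3 per million input tokens and $15 per million output tokens (not any specific model's list price). `Rates` and `base_cost` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Hypothetical list prices in USD per million tokens."""
    input_per_m: float
    output_per_m: float


def base_cost(rates: Rates, input_tokens: int, output_tokens: int) -> float:
    """Headline cost of one request, before reasoning, caching, or batch effects."""
    return (input_tokens * rates.input_per_m + output_tokens * rates.output_per_m) / 1_000_000


rates = Rates(input_per_m=3.0, output_per_m=15.0)  # illustrative: output at 5x input
cost = base_cost(rates, input_tokens=2_000, output_tokens=500)
print(f"${cost:.4f}")
```

With these rates, a 2,000-token prompt and a 500-token answer come to $0.0135, and more than half of that is output even though the answer is a quarter the size of the prompt.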

Reasoning tokens

OpenAI's o-series and GPT-5 models report reasoning tokens separately in usage data but bill them at the output rate. A prompt run at high reasoning effort can spend 10-20× the visible output on invisible reasoning. Budget for it explicitly.
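Because reasoning is billed at the output rate, a forecast needs an assumed ratio of hidden reasoning tokens to visible tokens. The sketch below treats that ratio as an input (`reasoning_multiplier`, a hypothetical parameter) rather than a constant; in practice it should come from measured usage, which OpenAI reports as `completion_tokens_details.reasoning_tokens`. The $10-per-million output rate is illustrative.

```python
def output_cost(visible_tokens: int, reasoning_multiplier: float, output_per_m: float) -> float:
    """Output-side cost when hidden reasoning is billed at the output rate.

    reasoning_multiplier is an assumed ratio of reasoning tokens to visible
    tokens; the article cites 10-20x at high effort, but measure your own.
    """
    reasoning_tokens = visible_tokens * reasoning_multiplier
    return (visible_tokens + reasoning_tokens) * output_per_m / 1_000_000


VISIBLE = 400        # tokens the user actually sees
OUTPUT_RATE = 10.0   # hypothetical USD per million output tokens

for multiplier in (0, 10, 20):
    print(multiplier, round(output_cost(VISIBLE, multiplier, OUTPUT_RATE), 4))
```

At a multiplier of 10 or 20, the same 400 visible tokens are billed as 4,400 or 8,400 output tokens, 11 or 21 times the naive estimate.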

Prompt caching

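Anthropic bills cache reads at 10% of the normal input rate when a prompt prefix is reused within 5 minutes; each cache hit resets the timer, and writing a prefix to the cache costs 25% more than normal input. The 5-minute TTL drives architecture decisions: a long-running agent that idles past 5 minutes misses the cache on its next turn and pays to write the prefix again, at more than the full input price.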
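A minimal simulation of one reused prefix under a 5-minute TTL, assuming reads at 10% of the base input rate, writes at 1.25×, and that every use of the prefix resets the timer. The function name, the 50,000-token prefix, and the $3-per-million input rate are hypothetical, and treating a gap of exactly 5 minutes as a miss is an assumption.

```python
CACHE_TTL_MINUTES = 5.0
CACHE_READ = 0.10   # reads at 10% of the base input rate
CACHE_WRITE = 1.25  # assumed write premium; check the current price sheet


def prefix_cost(turn_minutes: list[float], prefix_tokens: int, input_per_m: float) -> float:
    """Input cost of one shared prefix across turns sent at the given minutes.

    A turn within the TTL of the previous use is a cache read; any longer gap
    (or the first turn) is a cache write. Every use resets the timer.
    """
    total = 0.0
    last_use = None
    for t in turn_minutes:
        hit = last_use is not None and (t - last_use) < CACHE_TTL_MINUTES
        rate_multiplier = CACHE_READ if hit else CACHE_WRITE
        total += prefix_tokens * input_per_m * rate_multiplier / 1_000_000
        last_use = t
    return total


warm = [0, 1, 2, 3, 4]     # a turn every minute
idle = [0, 1, 10, 11, 20]  # same five turns, two 9-minute pauses
print(round(prefix_cost(warm, 50_000, 3.0), 4))
print(round(prefix_cost(idle, 50_000, 3.0), 4))
```

Both schedules make the same five calls, but the one with two pauses pays for three cache writes instead of one: $0.5925 against $0.2475 for the warm schedule, more than double. The sketch covers only the cached prefix, not the uncached tail of each prompt or the output.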

Batch API

Both Anthropic and OpenAI offer roughly 50% discounts on batched workloads in exchange for a 24-hour completion window. Classification and enrichment workloads should default to batch unless real-time results are required.
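A worked comparison for a hypothetical classification job: one million requests of 800 input and 20 output tokens at illustrative $3/$15 rates, with the discount modeled as a flat 50%. `should_batch` encodes the article's rule of thumb together with the exclusions listed in the FAQ; both functions are illustrative, not provider APIs.

```python
BATCH_DISCOUNT = 0.50  # roughly 50% per the article; confirm per provider and model


def job_cost(requests: int, input_tokens: int, output_tokens: int,
             input_per_m: float, output_per_m: float, batch: bool = False) -> float:
    """Total cost in USD of a job of identical requests, optionally batched."""
    per_request = (input_tokens * input_per_m + output_tokens * output_per_m) / 1_000_000
    total = requests * per_request
    return total * (1 - BATCH_DISCOUNT) if batch else total


def should_batch(user_facing: bool, deadline_hours: float, iterating_on_prompt: bool) -> bool:
    """Batch unless results are user-facing, needed within 24 hours, or the prompt is still changing."""
    return not user_facing and deadline_hours >= 24 and not iterating_on_prompt


realtime = job_cost(1_000_000, 800, 20, 3.0, 15.0)
batched = job_cost(1_000_000, 800, 20, 3.0, 15.0, batch=True)
print(round(realtime, 2), round(batched, 2))
print(should_batch(user_facing=False, deadline_hours=48, iterating_on_prompt=False))
```

Here the job costs $2,700 in real time and $1,350 batched; the saving scales linearly with volume, which is why bulk classification and enrichment are the natural candidates.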

Image and PDF rates

Images are billed as token equivalents, and the count grows with resolution. PDFs are effectively billed per page. A 200-page PDF analysis can cost an order of magnitude more than the operator expects.
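A rough estimator for document-heavy requests. The image rule, roughly width × height / 750 tokens, is the approximation Anthropic documents for images that are not resized (other providers use tile-based formulas), and 2,500 tokens per PDF page is an assumed midpoint, not a published rate; measure your own documents. Function names and rates are hypothetical.

```python
import math


def image_tokens(width_px: int, height_px: int) -> int:
    """Approximate tokens for one image using an assumed width*height/750 rule.

    Ignores any provider-side downscaling of large images.
    """
    return math.ceil(width_px * height_px / 750)


def pdf_input_cost(pages: int, tokens_per_page: int, input_per_m: float, questions: int = 1) -> float:
    """Input cost when the whole PDF is re-sent with each question and nothing is cached."""
    return pages * tokens_per_page * questions * input_per_m / 1_000_000


print(image_tokens(1000, 1000))
print(pdf_input_cost(pages=200, tokens_per_page=2_500, input_per_m=3.0))
print(pdf_input_cost(pages=200, tokens_per_page=2_500, input_per_m=3.0, questions=20))
```

Under those assumptions a 200-page PDF is about 500,000 input tokens, or $1.50 per question at $3 per million; asking 20 questions without caching the document costs $30.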

Tells

| Marker | Meaning |
| --- | --- |
| Bill jumped 10× without obvious volume change | Reasoning tokens enabled, or cache TTL was missed. |
| Long sessions cost more after a pause | Cache expiry; either keep the session warm or accept the cache miss. |
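The tells in the table can be checked mechanically by normalizing cost per request, so a genuine volume change is not mistaken for a pricing surprise. The `Period` fields, the 2× cost trigger, and the 0.25 threshold are hypothetical; map them to whatever your provider's usage export actually contains.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Period:
    """Hypothetical aggregate usage for one billing period."""
    requests: int
    visible_output_tokens: int
    reasoning_tokens: int
    cache_read_tokens: int       # input tokens served from the cache
    cacheable_input_tokens: int  # input tokens that sat in a reusable prefix
    cost_usd: float


def reasoning_share(p: Period) -> float:
    """Fraction of output-rate tokens that were hidden reasoning."""
    total = p.visible_output_tokens + p.reasoning_tokens
    return p.reasoning_tokens / total if total else 0.0


def cache_hit_rate(p: Period) -> float:
    """Fraction of cacheable prefix tokens actually read from the cache."""
    return p.cache_read_tokens / p.cacheable_input_tokens if p.cacheable_input_tokens else 0.0


def diagnose(before: Period, after: Period, threshold: float = 0.25) -> list[str]:
    """Name the likely tell when cost per request at least doubles."""
    cost_per_request_before = before.cost_usd / before.requests
    cost_per_request_after = after.cost_usd / after.requests
    if cost_per_request_after < 2 * cost_per_request_before:
        return []  # cost tracks volume; nothing to explain
    findings = []
    if reasoning_share(after) - reasoning_share(before) > threshold:
        findings.append("reasoning tokens enabled")
    if cache_hit_rate(before) - cache_hit_rate(after) > threshold:
        findings.append("cache TTL missed")
    return findings or ["unexplained: inspect raw usage"]


before = Period(10_000, 4_000_000, 0, 90_000_000, 100_000_000, 200.0)
after = Period(10_000, 4_000_000, 60_000_000, 90_000_000, 100_000_000, 2_000.0)
print(diagnose(before, after))
```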

Frequently asked

Does prompt caching work across providers?

No. Each provider's cache is local to that provider, so a prefix cached with one is a full-price miss at another. Multi-provider routing forfeits cache savings.

When is batch API the wrong choice?

Anything user-facing, anything that needs results in under 24 hours, and anything where prompt errors need fast iteration.
