kwj.ai · acquisition inquiries from >$999view prospectus →
The Domesday Book ofKWJ · AI

Cost · 9 min

Architecting prompts for 10× cost reduction with caching

Anthropic's prompt cache saves 90% on cache hits. The architecture that earns those hits looks deliberate.

By C.W. Jameson · Published 19 May 2026 · Last reviewed 19 May 2026

Prompt caching is the single highest-impact cost optimisation in production LLM workloads. The architectural pattern is simple but easy to violate by accident. This guide is the version of that pattern that operators teach each other.

What gets cached

Anthropic caches the longest prefix of a prompt that ends at a cache_control breakpoint. The cache lives 5 minutes. Reads against a cache cost 10% of input rate; writes cost 25% above input rate. Breakeven is roughly two reads per write.

Where to put the breakpoint

System prompt, tool definitions, large documents, and example transcripts go above the breakpoint. The current user turn and any dynamic state — timestamps, session IDs, transient flags — go below.

The most common mistake is putting a timestamp or session ID in the system prompt for 'context'. Each call then invalidates the cache, and the operator pays full input rate for content that should have been 10%.

Multi-breakpoint structure

You can declare up to four breakpoints. The right structure for long-document workloads is: system → tools → static documents (breakpoint) → dynamic context (breakpoint) → user turn.

Cache misses in production

Watch the cache_read_input_tokens and cache_creation_input_tokens fields in the API response. A healthy long-running agent has cache_read 80% of input tokens. Below 50% means the cache architecture is fighting itself.

Frequently asked

Does the cache survive across requests?

Yes, within the 5-minute TTL window. Across processes, across machines, as long as the prefix is byte-identical.

Can I cache across providers?

No.

Partner offer

Anthropic's Claude family is the model lineage most operators end up on for serious agent work. The free tier remains useful.

Try Claude →

Affiliate link — see disclosure.

From the Almanac shop

The Operator's Compendium

Every agent harness, every routing pattern, every cost trick. 90-page PDF.

$29Coming soon

All guides