Volume IV · The Guides
Operator guides
Written for the engineer or analyst who has to choose, deploy, debug, or budget a model. Each piece is written from the perspective of someone who has actually done the work.
How to pick an LLM for your workload
A decision tree that takes thirty minutes to walk and saves six months of switching costs.
Selection
Architecting prompts for 10× cost reduction with caching
Anthropic's prompt cache cuts the input-token cost of a cache hit by 90%. The architecture that earns those hits looks deliberate.
Cost
Building an agent loop you can read in a single sitting
Every operator eventually writes their own harness. Here is the shape of a maintainable one.
Engineering
Model routing: running cheap when you can, expensive when you must
Routing turns a $20/day workload into a $4/day workload without losing capability.
Cost
Evaluating an agent the way operators actually do
Capability benchmarks measure capability. Operators want to measure deployability. The two are different.
Operations
Self-hosting open-weights models: when it pays and when it doesn't
Self-hosting Llama or Qwen makes sense at the scale where you stop counting dollars and start counting hours.
Operations
Prompt engineering vs fine-tuning: the breakeven
Most prompt-engineering problems are not fine-tuning problems. The reverse is also true.
Engineering
Structured outputs and the JSON-adherence problem
Getting reliable JSON out of an LLM is now table stakes. The pitfalls are real.
Engineering
Computer-use agents: current reliability and where to use them
Anthropic's Computer Use and OpenAI's Operator both work. Both are slower and less reliable than purpose-built tools. Pick deliberately.
Agents
Claude Code vs Cursor: an operator's comparison
Two coding agents, two philosophies. Pick by the kind of session, not the kind of operator.
Tools