The Domesday Book of KWJ · AI

Meta · The Agentic Era

Llama 4

Meta's fourth Llama generation. Three sizes, all-MoE, the open-weights default for serious self-hosters.

By C.W. Jameson · Published 19 May 2026 · Last reviewed 19 May 2026

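Llama 4 is the model most operators end up self-hosting when they need privacy, cost control, or fine-tuning. Because it is a mixture-of-experts (MoE) model, only a fraction of its parameters is active for any given token, so per-token compute is far lower than the total parameter count suggests. Every expert still has to sit in GPU memory, so the VRAM footprint tracks total parameters; with reduced-precision weights, serious inference setups run the 400B-class model on a single 8×H100 node.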
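A back-of-envelope sketch of the weight memory involved. In an MoE model every expert's weights must be resident, so memory scales with total parameters while per-token compute scales with the active ones. The figures below (400B total, 17B active per token, 80 GB per H100) are assumptions for illustration, not official specs, and only weights are counted.

```python
# Back-of-envelope weight memory for serving a ~400B-parameter MoE model.
# Assumed figures (illustrative, not official specs): 400B total parameters,
# 17B active per token, 80 GB per H100. Only weights are counted; KV cache,
# activations and runtime overhead would need extra headroom.

GB = 1e9  # decimal gigabytes, as GPU memory is usually quoted

TOTAL_PARAMS = 400e9   # all experts must be resident in GPU memory
ACTIVE_PARAMS = 17e9   # parameters used per token (assumed)
NODE_GB = 8 * 80       # one 8×H100 node, 80 GB per GPU (assumed)


def weight_gb(params: float, bytes_per_param: float) -> float:
    """GB needed just to hold the weights at a given precision."""
    return params * bytes_per_param / GB


def fits_on_node(params: float, bytes_per_param: float, node_gb: float = NODE_GB) -> bool:
    """True if the weights alone fit in the node's combined GPU memory."""
    return weight_gb(params, bytes_per_param) < node_gb


for name, bpp in [("BF16", 2.0), ("FP8", 1.0), ("INT4", 0.5)]:
    need = weight_gb(TOTAL_PARAMS, bpp)
    verdict = "fits" if fits_on_node(TOTAL_PARAMS, bpp) else "does not fit"
    print(f"{name}: {need:.0f} GB of weights, {verdict} in {NODE_GB} GB")

# Per-token compute scales with the active parameters, not the total.
print(f"Active share of parameters per token: {ACTIVE_PARAMS / TOTAL_PARAMS:.2%}")
```

Under these assumptions, BF16 weights (800 GB) overflow a 640 GB node, while FP8 weights (400 GB) fit and leave roughly 240 GB for KV cache and runtime overhead.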

Field signature

Sometimes leaks the system prompt verbatim on jailbreak attempts.

Specifications

Released: 2025
Context window: 128,000 to 1,000,000 tokens, depending on variant
Pricing: Free if self-hosted
Modalities: text · image
License: Llama 4 Community License (commercial use permitted below 700M monthly active users)
Era: The Agentic Era

Strengths

  • Open weights
  • Fine-tuning supported
  • Wide ecosystem

Weaknesses

  • Refusal training brittle compared to frontier closed models
  • Tool-use trails Claude and GPT in reliability

Authentication markers

The fingerprints by which Llama 4 can be identified from its output alone.

Tell: Tokenizer-specific quirks on Korean and Arabic.
Meaning: Llama-family tokenizer.

Notable works

  • Backbone for thousands of fine-tunes on Hugging Face

Market position

Free when self-hosted; roughly $0.20 to $3 per million tokens on hosted inference services.
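To turn the quoted per-million-token range into a monthly budget, a minimal sketch. The 50M tokens/day workload and the 30-day month are hypothetical, and the flat rate is a simplification: real services often price input and output tokens separately.

```python
# Monthly cost at a flat per-million-token price. The 50M tokens/day workload
# is hypothetical; real services often price input and output tokens
# separately, which this sketch ignores.

def monthly_cost_usd(tokens_per_day: float, usd_per_million: float, days: int = 30) -> float:
    """Cost of `days` days of traffic at a flat per-million-token rate."""
    return tokens_per_day * days / 1_000_000 * usd_per_million


TOKENS_PER_DAY = 50_000_000  # hypothetical workload

low = monthly_cost_usd(TOKENS_PER_DAY, 0.20)   # bottom of the quoted range
high = monthly_cost_usd(TOKENS_PER_DAY, 3.00)  # top of the quoted range
print(f"Hosted: roughly ${low:,.0f} to ${high:,.0f} per month")
```

At this hypothetical volume the quoted range works out to about $300 to $4,500 a month, a 15× spread driven entirely by the per-token price.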


