Meta · The Agentic Era
Llama 4
Meta's fourth Llama generation. Three sizes, all-MoE, the open-weights default for serious self-hosters.
By C.W. Jameson · Published 19 May 2026 · Last reviewed 19 May 2026
Llama 4 is the model most operators end up self-hosting when they need privacy, cost control, or fine-tuning. The MoE architecture means per-token compute is lower than the total parameter count suggests, because only a subset of experts is active for each token. VRAM footprint still tracks total parameters, since every expert must stay resident in memory. Serious inference setups run the 400B-class model on a single 8×H100 node, typically with quantized weights.
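A rough way to sanity-check the 8×H100 figure is to work out the weight memory directly. The sketch below is a back-of-envelope estimate, not a sizing tool. It assumes exactly 400B total parameters with every expert held in memory, 80 GB per H100, and a hypothetical 80% headroom factor that leaves room for KV cache and activations; the function names are illustrative.

```python
# Back-of-envelope weight memory for a 400B-class MoE model.
# Every figure here is an illustrative assumption, not a measurement.

GB = 1e9  # decimal gigabytes


def weight_memory_gb(total_params: float, bytes_per_param: float) -> float:
    """Memory needed just to hold the weights, in GB."""
    return total_params * bytes_per_param / GB


def fits(total_params: float, bytes_per_param: float,
         gpus: int = 8, gb_per_gpu: float = 80.0,
         headroom: float = 0.8) -> bool:
    """True if the weights fit in the share of VRAM left after reserving
    space for KV cache and activations (headroom is a guess)."""
    budget_gb = gpus * gb_per_gpu * headroom
    return weight_memory_gb(total_params, bytes_per_param) <= budget_gb


TOTAL_PARAMS = 400e9  # "400B-class", counting every expert

if __name__ == "__main__":
    for name, nbytes in [("bf16", 2.0), ("fp8", 1.0), ("int4", 0.5)]:
        needed = weight_memory_gb(TOTAL_PARAMS, nbytes)
        print(f"{name}: {needed:.0f} GB of weights, fits on 8x80GB: "
              f"{fits(TOTAL_PARAMS, nbytes)}")
```

Under these assumptions the 16-bit weights alone exceed the node's memory, while 8-bit and 4-bit weights fit. Real deployments also depend on context length, batch size, and the serving stack.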
Field signature
Sometimes leaks the system prompt verbatim on jailbreak attempts.
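Operators who want to watch for this signature can compare each response against the system prompt. The following is a minimal sketch rather than a hardened detector: it flags a response when it shares a long run of consecutive words with the system prompt. The function names and the `min_run` threshold of 8 words are assumptions chosen for illustration.

```python
import re


def _words(text: str) -> list[str]:
    """Lowercase word tokens, ignoring punctuation."""
    return re.findall(r"\w+", text.lower())


def longest_shared_run(system_prompt: str, output: str) -> int:
    """Length, in words, of the longest run of consecutive words that
    appears in both the system prompt and the output."""
    a, b = _words(system_prompt), _words(output)
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1  # extend the matching run
                best = max(best, cur[j])
        prev = cur
    return best


def leaks_system_prompt(system_prompt: str, output: str,
                        min_run: int = 8) -> bool:
    """Flag outputs that repeat at least min_run consecutive prompt words.
    The threshold is a guess; tune it against real traffic."""
    return longest_shared_run(system_prompt, output) >= min_run
```

A word-level match tolerates changes in casing and punctuation but will miss paraphrased or translated leaks, so it is best treated as a cheap first filter.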
Specifications
| Released | 2025 |
|---|---|
| Context window | 128,000 - 1,000,000 tokens depending on variant |
| Pricing | Weights free to download; self-hosting still costs hardware and power |
| Modalities | text · image |
| License | Llama 4 Community License (commercial OK below 700M MAU) |
| Era | The Agentic Era |
Strengths
- Open weights
- Fine-tuning supported
- Wide ecosystem
Weaknesses
- Refusal training brittle compared to frontier closed models
- Tool-use trails Claude and GPT in reliability
Authentication markers
The fingerprints by which Llama 4 can be identified from its output alone.
| Tell | Meaning |
|---|---|
| Tokenizer-specific quirks on Korean and Arabic. | Llama-family tokenizer. |
Notable works
- Backbone for thousands of fine-tunes on Hugging Face
Market position
Free self-hosted; ~$0.20 - $3 per million tokens on inference services
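To see what that hosted price range means for a given workload, the arithmetic can be sketched as follows. The daily volume of 50 million tokens is a hypothetical workload, and the helper name is illustrative; only the $0.20 and $3 per-million-token bounds come from the range above.

```python
def monthly_cost_usd(tokens_per_day: float, usd_per_million: float,
                     days: int = 30) -> float:
    """Hosted-inference spend for a steady daily token volume."""
    return tokens_per_day * days / 1_000_000 * usd_per_million


if __name__ == "__main__":
    TOKENS_PER_DAY = 50_000_000  # hypothetical workload
    low = monthly_cost_usd(TOKENS_PER_DAY, 0.20)
    high = monthly_cost_usd(TOKENS_PER_DAY, 3.00)
    print(f"Estimated monthly spend: ${low:,.0f} to ${high:,.0f}")
```

Comparing this range against the monthly cost of owned or rented GPUs gives a first-order view of when self-hosting starts to pay off.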