The Domesday Book of KWJ · AI

Llama 4

Meta's fourth Llama generation. Three sizes, all-MoE, the open-weights default for serious self-hosters.

By C.W. Jameson · Published 19 May 2026 · Last reviewed 19 May 2026

Llama 4 is the model most operators end up self-hosting when they need privacy, cost control, or fine-tuning. Because it is a mixture-of-experts (MoE) design, only a fraction of the parameters is active for each token, so per-token compute is far lower than the total parameter count suggests. The full set of expert weights still has to sit in VRAM, which is why serious inference setups run the 400B-class model on a single 8×H100 node, usually with quantized weights.
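A back-of-envelope check makes the 8×H100 sizing concrete. The sketch below estimates weight memory only, and it rests on stated assumptions rather than measurements: 80 GB per H100, 20% of VRAM held back for KV cache and activations, and every expert resident in memory (the usual case when serving an MoE model). The function names are illustrative, not part of any library.

```python
# Back-of-envelope weight-memory check for an MoE model on one GPU node.
# All figures are illustrative assumptions, not measured values.

H100_VRAM_GB = 80   # assumed: 80 GB H100 variant
HEADROOM = 0.8      # assumed: keep 20% of VRAM for KV cache and activations


def weight_memory_gb(total_params_billions: float, bytes_per_param: float) -> float:
    """Memory needed for the weights alone, in GB (1 GB = 1e9 bytes).

    Uses TOTAL parameters: every expert must be loaded, even though
    only a few are active for any given token.
    """
    return total_params_billions * bytes_per_param


def fits_on_node(total_params_billions: float, bytes_per_param: float,
                 num_gpus: int = 8, gpu_vram_gb: float = H100_VRAM_GB) -> bool:
    """True if the weights fit within the node's VRAM budget after headroom."""
    budget_gb = num_gpus * gpu_vram_gb * HEADROOM
    return weight_memory_gb(total_params_billions, bytes_per_param) <= budget_gb


if __name__ == "__main__":
    for label, bytes_per_param in [("16-bit", 2.0), ("8-bit", 1.0), ("4-bit", 0.5)]:
        need = weight_memory_gb(400, bytes_per_param)
        ok = fits_on_node(400, bytes_per_param)
        print(f"400B @ {label}: {need:.0f} GB of weights, fits on 8xH100: {ok}")
```

Under these assumptions a 400B-class model does not fit at 16-bit precision (800 GB of weights against a 512 GB budget) but does fit at 8-bit, which is why quantization matters for single-node serving.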

Field signature

Sometimes leaks the system prompt verbatim on jailbreak attempts.

Specifications

Released	2025
Context window	128,000 to 1,000,000 tokens, depending on variant
Pricing	Free if self-hosted
Modalities	text · image
License	Llama 4 Community License (commercial OK below 700M MAU)
Era	The Agentic Era

Strengths

Open weights
Fine-tuning supported
Wide ecosystem

Weaknesses

Refusal training is brittle compared with frontier closed models
Tool use trails Claude and GPT in reliability

Authentication markers

The fingerprints by which Llama 4 can be identified from its output alone.

Tell	Meaning
Tokenizer-specific quirks on Korean and Arabic.	Llama-family tokenizer.

Notable works

Backbone for thousands of fine-tunes on Hugging Face

Market position

Free to self-host (the weights cost nothing; hardware is extra); roughly $0.20 to $3.00 per million tokens on hosted inference services
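To turn the per-million-token range into a monthly budget, a minimal sketch follows. The price bounds come from the figures quoted above; the monthly token volume is a hypothetical input, and the function names are illustrative.

```python
# Rough monthly spend on a hosted inference service, using the
# per-million-token price range quoted in this entry.

LOW_PRICE_USD = 0.20    # low end of the quoted range, per million tokens
HIGH_PRICE_USD = 3.00   # high end of the quoted range, per million tokens


def hosted_cost_usd(tokens: int, price_per_million_usd: float) -> float:
    """Cost in USD for a given token count at a per-million-token price."""
    return round(tokens / 1_000_000 * price_per_million_usd, 2)


def monthly_cost_range(tokens_per_month: int) -> tuple[float, float]:
    """(cheapest, most expensive) monthly cost across the quoted range."""
    return (
        hosted_cost_usd(tokens_per_month, LOW_PRICE_USD),
        hosted_cost_usd(tokens_per_month, HIGH_PRICE_USD),
    )


if __name__ == "__main__":
    # Hypothetical workload: 500 million tokens per month.
    low, high = monthly_cost_range(500_000_000)
    print(f"500M tokens/month: ${low:,.2f} to ${high:,.2f}")
```

The 15× spread between the two bounds is the main point: at high volume the choice of provider can matter as much as the decision to self-host.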
