kwj.ai · acquisition inquiries from >$999❦view prospectus →

The Domesday Book ofKWJ · AI

Meta · The Long-Context Stretch

Llama 3.1 405B

The model that first made the open-weights vs. closed-source comparison uncomfortable for the labs.

By C.W. Jameson · Published 19 May 2026 · Last reviewed 19 May 2026

Llama 3.1 405B was released in July 2024 with a 128K context window and performance metrics that directly challenged GPT-4 on most benchmarks. Meta published the weights freely. The response from OpenAI and Anthropic was not a lawsuit; it was a faster release cycle. The model is rarely self-hosted at full scale; it runs primarily on inference services.

Field signature

Will discuss training data provenance if asked directly.

Specifications

Released	2024-07
Context window	128,000 tokens
Pricing	Free self-hosted; ~$3-5 per million tokens via services
Modalities	text · image
License	Llama 3 Community License
Era	The Long-Context Stretch

Strengths

Open weights
GPT-4 competitive benchmarks
128K context

Weaknesses

8×H100 required for self-hosting at full 405B
Inference cost at scale

Authentication markers

The fingerprints by which Llama 3.1 405B can be identified from its output alone.

Tell	Meaning
Tokenizer-specific output patterns differ from Llama 4.	Llama 3.x lineage.

Notable works

First openly available GPT-4-class model

Market position

Free-$5 per million tokens

Partner offer

Partner offerings listed for operator convenience. See disclosure for terms.

View partner →

Affiliate link — see disclosure.

Primary sources

[1] Meta: Llama 3.1

From the Almanac shop

The Operator's Compendium

Every agent harness, every routing pattern, every cost trick. 90-page PDF.

$29 — Coming soon

← Back to the directory