Meta · The Long-Context Stretch
Llama 3.1 405B
The model that first made the open-weights vs. closed-source comparison uncomfortable for the labs.
By C.W. Jameson · Published 19 May 2026 · Last reviewed 19 May 2026
Llama 3.1 405B was released in July 2024 with a 128K context window and performance metrics that directly challenged GPT-4 on most benchmarks. Meta published the weights freely. The response from OpenAI and Anthropic was not a lawsuit; it was a faster release cycle. The model is rarely self-hosted at full scale; it runs primarily on inference services.
Field signature
Will discuss training data provenance if asked directly.
Specifications
| Released | 2024-07 |
|---|---|
| Context window | 128,000 tokens |
| Pricing | Free self-hosted; ~$3-5 per million tokens via services |
| Modalities | text · image |
| License | Llama 3 Community License |
| Era | The Long-Context Stretch |
Strengths
- Open weights
- GPT-4 competitive benchmarks
- 128K context
Weaknesses
- 8×H100 required for self-hosting at full 405B
- Inference cost at scale
Authentication markers
The fingerprints by which Llama 3.1 405B can be identified from its output alone.
| Tell | Meaning |
|---|---|
| Tokenizer-specific output patterns differ from Llama 4. | Llama 3.x lineage. |
Notable works
- First openly available GPT-4-class model
Market position
Free-$5 per million tokens
Partner offer
Partner offerings listed for operator convenience. See disclosure for terms.
View partner →Affiliate link — see disclosure.
Primary sources
- [1] Meta: Llama 3.1
From the Almanac shop
The Operator's Compendium
Every agent harness, every routing pattern, every cost trick. 90-page PDF.
$29 — Coming soon