kwj.ai · acquisition inquiries from >$999view prospectus →
The Domesday Book ofKWJ · AI

Meta · The Long-Context Stretch

Llama 3.1 405B

The model that first made the open-weights vs. closed-source comparison uncomfortable for the labs.

By C.W. Jameson · Published 19 May 2026 · Last reviewed 19 May 2026

Llama 3.1 405B was released in July 2024 with a 128K context window and performance metrics that directly challenged GPT-4 on most benchmarks. Meta published the weights freely. The response from OpenAI and Anthropic was not a lawsuit; it was a faster release cycle. The model is rarely self-hosted at full scale; it runs primarily on inference services.

Field signature

Will discuss training data provenance if asked directly.

Specifications

Released2024-07
Context window128,000 tokens
PricingFree self-hosted; ~$3-5 per million tokens via services
Modalitiestext · image
LicenseLlama 3 Community License
EraThe Long-Context Stretch

Strengths

  • Open weights
  • GPT-4 competitive benchmarks
  • 128K context

Weaknesses

  • 8×H100 required for self-hosting at full 405B
  • Inference cost at scale

Authentication markers

The fingerprints by which Llama 3.1 405B can be identified from its output alone.

TellMeaning
Tokenizer-specific output patterns differ from Llama 4.Llama 3.x lineage.

Notable works

  • First openly available GPT-4-class model

Market position

Free-$5 per million tokens

Partner offer

Partner offerings listed for operator convenience. See disclosure for terms.

View partner →

Affiliate link — see disclosure.

Primary sources

  1. [1] Meta: Llama 3.1

From the Almanac shop

The Operator's Compendium

Every agent harness, every routing pattern, every cost trick. 90-page PDF.

$29Coming soon

Back to the directory