Groq

infrastructure

US · Founded 2016

LPU-based inference. Fastest token generation for open models.

Groq builds Language Processing Units (LPUs), custom silicon designed specifically for transformer inference. The result is token generation roughly 3–10× faster than GPU-based inference on the same models. Groq suits operators who need real-time, interactive responses; operators who prioritize cost over latency generally choose other providers.
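To make the speed claim concrete, here is a rough sketch of how a 3–10× throughput gain changes the time to generate one response. The baseline of 50 tokens per second and the 500-token response are hypothetical values chosen for illustration, not measurements of any provider, and the calculation ignores network and queueing delay.

```python
def generation_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Time to stream a response, counting token generation only."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return output_tokens / tokens_per_second


# Hypothetical figures for illustration; not benchmark results.
BASELINE_TPS = 50.0      # assumed GPU-based throughput, tokens per second
RESPONSE_TOKENS = 500    # assumed response length

baseline = generation_seconds(RESPONSE_TOKENS, BASELINE_TPS)
print(f"baseline: {baseline:.1f} s")

# Apply the low and high ends of the 3-10x range quoted above.
for speedup in (3, 10):
    faster = generation_seconds(RESPONSE_TOKENS, BASELINE_TPS * speedup)
    print(f"{speedup}x faster: {faster:.1f} s")
```

Under these assumptions the same response drops from about ten seconds to somewhere between one and a few seconds, which is the difference that matters for interactive use.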

Main models

Llama (via Groq)
Mixtral (via Groq)
DeepSeek (via Groq)

Strengths

Inference speed
LPU architecture
Real-time applications

Pricing

Per-token; premium pricing for speed ($0.50–$3 per million tokens typical)
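As a rough sketch of what per-token pricing means in practice, the arithmetic below applies the typical range quoted above to a monthly token volume. The volume of 200 million tokens is a hypothetical workload, and real prices vary by model and may differ between input and output tokens.

```python
def cost_usd(tokens: int, price_per_million_usd: float) -> float:
    """Cost of processing `tokens` at a flat per-million-token price."""
    return tokens / 1_000_000 * price_per_million_usd


# Typical range from the listing, in USD per million tokens.
LOW_PRICE, HIGH_PRICE = 0.50, 3.00

# Hypothetical workload for illustration only.
monthly_tokens = 200_000_000

low = cost_usd(monthly_tokens, LOW_PRICE)
high = cost_usd(monthly_tokens, HIGH_PRICE)
print(f"Estimated monthly cost: ${low:,.2f} to ${high:,.2f}")
```

At this assumed volume the estimate spans $100 to $600 per month, so the choice of model within the range matters more than the per-token unit suggests.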

Official site →