Groq
Infrastructure · US · Founded 2016
LPU-based inference. Fastest token generation for open models.
Groq builds Language Processing Units (LPUs), custom silicon designed specifically for transformer inference. The result is token generation 3–10× faster than GPU-based inference on the same models. Operators who need real-time, interactive responses choose Groq; operators whose main concern is cost generally do not.
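To make the 3–10× figure concrete, here is a minimal sketch of what it means for the wait on a single response. The baseline rate of 50 tokens per second and the 500-token response length are hypothetical numbers picked for illustration, not measurements of any GPU or of Groq.

```python
def generation_seconds(tokens: int, tokens_per_second: float) -> float:
    """Time to generate `tokens` tokens at a steady generation rate."""
    return tokens / tokens_per_second

# Assumptions for illustration only (not measured values).
baseline_tps = 50.0      # hypothetical GPU-based generation rate
response_tokens = 500    # hypothetical response length

baseline = generation_seconds(response_tokens, baseline_tps)
print(f"baseline: {baseline:.1f} s")

# Apply the low and high ends of the 3-10x range quoted above.
for speedup in (3, 10):
    seconds = generation_seconds(response_tokens, baseline_tps * speedup)
    print(f"{speedup}x faster: {seconds:.1f} s")
```

Under these assumptions a ten-second wait drops to somewhere between about one and three seconds, which is the difference that matters for interactive use.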
Main models
- Llama (via Groq)
- Mixtral (via Groq)
- DeepSeek (via Groq)
Strengths
- Inference speed
- LPU architecture
- Real-time applications
Pricing
Per-token, with a premium for speed (typically $0.50–$3 per million tokens)
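As a rough sketch of how per-token pricing turns into a bill, the snippet below applies the two ends of the typical range to a monthly token volume. The 200-million-token volume is a hypothetical workload, and the helper name `cost_usd` is made up for this example; actual prices vary by model.

```python
def cost_usd(tokens: int, price_per_million_usd: float) -> float:
    """Cost in USD for `tokens` tokens at a per-million-token price."""
    return tokens / 1_000_000 * price_per_million_usd

# Hypothetical monthly volume, chosen only for illustration.
monthly_tokens = 200_000_000

low = cost_usd(monthly_tokens, 0.50)   # lower end of the typical range
high = cost_usd(monthly_tokens, 3.00)  # upper end of the typical range
print(f"${low:,.2f} to ${high:,.2f} per month")
```

For this assumed volume the estimate spans $100 to $600 per month, so the choice of model and its price tier matters more than the per-token unit suggests.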