Research · 8 min

Mixture of Experts: Why Mixtral Changed Efficiency Assumptions

By C.W. Jameson · Published 20 April 2025 · Last reviewed 20 May 2025

MoE models activate only a fraction of their parameters for each token. The efficiency gains are real, and the tradeoffs are specific.

How mixture-of-experts architectures work, why Mixtral surprised the field, and what MoE means for model deployment.
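
To make "only a fraction of parameters per token" concrete, here is a minimal sketch of top-k expert routing in plain Python. It is an illustration under stated assumptions, not Mixtral's actual implementation: the function names (`top_k_route`, `moe_layer`), the toy experts, and the router scores are all hypothetical, and the 8-expert, top-2 shape is chosen only to echo Mixtral's published configuration.

```python
import math

def top_k_route(router_logits, k=2):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    # Pick the k experts with the largest router scores.
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    # Softmax over the selected scores only, so the k weights sum to 1.
    m = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - m) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

def moe_layer(x, experts, router_logits, k=2):
    """Weighted sum of the selected experts' outputs; unselected experts never run."""
    out = [0.0] * len(x)
    for idx, weight in top_k_route(router_logits, k):
        y = experts[idx](x)
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out

# Hypothetical toy experts: expert i scales its input by (i + 1).
experts = [lambda x, s=s: [s * v for v in x] for s in range(1, 9)]

# Hypothetical router scores for one token (in a real model these come
# from a learned linear layer applied to the token's hidden state).
logits = [0.1, 2.0, -1.0, 0.3, 1.0, 0.0, -0.5, 0.2]

routes = top_k_route(logits, k=2)
y = moe_layer([1.0, -2.0], experts, logits, k=2)
```

With these scores only experts 1 and 4 are evaluated for the token; the other six contribute no compute, though in a real deployment their weights still have to be held in memory.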
