Mixtral 8x22B
Mistral AI · MoE · 141B parameters · 65,536-token context
Quality: 73.0
Architecture Details
| Attribute | Value |
|---|---|
| Type | MoE |
| Total Parameters | 141B |
| Active Parameters | 39B |
| Layers | 56 |
| Hidden Dimension | 6,144 |
| Attention Heads | 48 |
| KV Heads | 8 |
| Head Dimension | 128 |
| Vocab Size | 32,768 |
| Total Experts | 8 |
| Active Experts | 2 |
Memory Requirements
| Quantity | Size |
|---|---|
| BF16 Weights | 282.0 GB |
| FP8 Weights | 141.0 GB |
| INT4 Weights | 70.5 GB |
| KV Cache per Token | 229,376 bytes |
| Activation Estimate | 2.50 GB |
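These figures follow directly from the architecture table. A minimal sketch of the arithmetic, assuming decimal GB (1 GB = 10⁹ bytes) and a BF16 KV cache, which is what the numbers imply:

```python
# Back-of-the-envelope memory math for Mixtral 8x22B.
# All sizes use decimal GB (1 GB = 1e9 bytes), matching the table above.

total_params = 141e9

# Weight footprint = parameter count x bytes per parameter.
for name, bytes_per_param in [("BF16", 2), ("FP8", 1), ("INT4", 0.5)]:
    print(f"{name}: {total_params * bytes_per_param / 1e9:.1f} GB")
# BF16: 282.0 GB, FP8: 141.0 GB, INT4: 70.5 GB

# KV cache per token: K and V vectors for every layer at 2 bytes each,
# sized by the 8 KV heads (grouped-query attention), not the 48 query heads.
layers, kv_heads, head_dim, kv_bytes = 56, 8, 128, 2
kv_per_token = 2 * layers * kv_heads * head_dim * kv_bytes
print(kv_per_token)  # 229376 bytes

# At the full 65,536-token context, one sequence's cache costs:
print(f"{kv_per_token * 65536 / 1e9:.1f} GB")  # ~15.0 GB
```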
Fits on (single-node)
| GPU | Precision |
|---|---|
| B200 SXM | FP8 |
| B100 SXM | FP8 |
| GB200 NVL72 (per GPU) | FP8 |
| GB300 NVL72 (per GPU) | FP8 |
| H200 SXM | INT4 |
| H100 NVL | INT4 |
| H20 | INT4 |
| H100 NVL 94GB (per GPU pair) | FP8 |
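A rough way to reproduce entries in this list is to check quantized weights plus a full-context KV cache and the activation estimate against each GPU's memory. The capacities below are the commonly quoted HBM sizes for a few of these parts, not figures from this page, so treat this as an illustrative sketch:

```python
# Quick single-GPU feasibility check: do the quantized weights plus a
# full-context KV cache and activations fit in GPU memory?
# GPU capacities are assumed (commonly quoted HBM sizes), not from this page.
GPU_MEMORY_GB = {
    "B200 SXM": 192,   # assumed 192 GB HBM3e
    "H200 SXM": 141,   # assumed 141 GB HBM3e
    "H100 NVL": 94,    # assumed 94 GB per GPU
    "H20": 96,         # assumed 96 GB
}
WEIGHTS_GB = {"BF16": 282.0, "FP8": 141.0, "INT4": 70.5}

KV_FULL_CONTEXT_GB = 229376 * 65536 / 1e9  # ~15.0 GB at 65,536 tokens
ACTIVATIONS_GB = 2.5                       # estimate from the table above

def fits(gpu: str, precision: str) -> bool:
    need = WEIGHTS_GB[precision] + KV_FULL_CONTEXT_GB + ACTIVATIONS_GB
    return need <= GPU_MEMORY_GB[gpu]

print(fits("H200 SXM", "INT4"))  # True  (~88 GB needed, 141 GB available)
print(fits("B200 SXM", "FP8"))   # True  (~158.5 GB needed, 192 GB available)
print(fits("H200 SXM", "FP8"))   # False (~158.5 GB needed, 141 GB available)
```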
GPU Recommendations
| GPU | Rating | Config | Score | Throughput | Cost/Month | Cost/M Tokens |
|---|---|---|---|---|---|---|
| B100 SXM | optimal | FP8 · 1 GPU · tensorrt-llm | 100/100 | 280.0 tok/s | $4,271 | $5.80 |
| GB200 NVL72 (per GPU) | optimal | FP8 · 1 GPU · tensorrt-llm | 100/100 | 280.0 tok/s | $6,169 | $8.38 |
| GB300 NVL72 (per GPU) | optimal | FP8 · 1 GPU · tensorrt-llm | 100/100 | 280.0 tok/s | $7,118 | $9.67 |
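The Cost/M Tokens column is consistent with dividing Cost/Month by the tokens produced in an average month (365.25/12 ≈ 30.44 days) at the quoted throughput, assuming 24/7 utilization; a sketch of that derivation:

```python
# Reverse-engineering Cost/M Tokens from Cost/Month and throughput,
# assuming full utilization and an average month of 365.25/12 days.
SECONDS_PER_MONTH = 365.25 / 12 * 86400  # ~2.63e6 s

def cost_per_m_tokens(cost_per_month: float, tok_per_s: float) -> float:
    tokens_per_month = tok_per_s * SECONDS_PER_MONTH
    return cost_per_month / (tokens_per_month / 1e6)

print(f"${cost_per_m_tokens(4271, 280.0):.2f}")  # $5.80 (B100 SXM)
print(f"${cost_per_m_tokens(6169, 280.0):.2f}")  # $8.38 (GB200 NVL72)
print(f"${cost_per_m_tokens(7118, 280.0):.2f}")  # $9.67 (GB300 NVL72)
```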
API Pricing Comparison
| Provider | Input $/M | Output $/M | Badges |
|---|---|---|---|
| together | $1.20 | $1.20 | Cheapest |
| mistral | $2.00 | $6.00 | |
Quality Benchmarks
| Benchmark | Score |
|---|---|
| MMLU | 77.8 |
| HumanEval | 46.0 |
| GSM8K | 78.4 |
| MT-Bench | 80.0 |
Capabilities
Features
✓ Tool Use · ✗ Vision · ✓ Code · ✓ Math · ✗ Reasoning · ✓ Multilingual · ✓ Structured Output
Supported Frameworks
vllm · sglang · tgi · tensorrt-llm
Supported Precisions
BF16 (default) · FP8 · INT4
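As a rough starting point, a single-node FP8 deployment with vLLM (one of the frameworks listed above) might look like the sketch below. The Hugging Face model ID, parallelism degree, and sampling settings are illustrative assumptions, not recommendations from this page; at 141 GB of FP8 weights this needs either several 80 GB-class GPUs or one of the large-memory parts listed above:

```python
# Minimal offline-inference sketch with vLLM.
# Model ID, tensor_parallel_size, and sampling settings are assumptions.
from vllm import LLM, SamplingParams

llm = LLM(
    model="mistralai/Mixtral-8x22B-Instruct-v0.1",  # assumed HF model ID
    quantization="fp8",
    tensor_parallel_size=2,   # split the 141 GB of FP8 weights across 2 GPUs
    max_model_len=65536,      # the model's full context window
)
params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(
    ["Explain mixture-of-experts routing in one paragraph."], params
)
print(outputs[0].outputs[0].text)
```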