Llama 4 Maverick
Meta · MoE · 400B parameters · 1,048,576 context
Quality: 89.0
Architecture Details

| Field | Value |
|---|---|
| Type | MoE |
| Total Parameters | 400B |
| Active Parameters | 17B |
| Layers | 96 |
| Hidden Dimension | 5,120 |
| Attention Heads | 40 |
| KV Heads | 8 |
| Head Dimension | 128 |
| Vocab Size | 202,048 |
| Total Experts | 128 |
| Active Experts | 1 |
Memory Requirements

| Precision | Weights |
|---|---|
| BF16 | 800.0 GB |
| FP8 | 400.0 GB |
| INT4 | 200.0 GB |

KV-Cache per Token: 393,216 bytes
Activation Estimate: 3.00 GB
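The figures above follow directly from the architecture fields on this page. A minimal sketch (using GB = 10^9 bytes, which matches the table; KV-cache assumed stored in 2-byte BF16):

```python
# Reproduce the memory numbers from the architecture fields above.
total_params = 400e9   # Total Parameters
layers       = 96      # Layers
kv_heads     = 8       # KV Heads
head_dim     = 128     # Head Dimension

# Weight memory at each precision (bytes per parameter).
bytes_per_param = {"BF16": 2, "FP8": 1, "INT4": 0.5}
weights_gb = {p: total_params * b / 1e9 for p, b in bytes_per_param.items()}
print(weights_gb)  # {'BF16': 800.0, 'FP8': 400.0, 'INT4': 200.0}

# KV-cache per token: one K and one V vector per layer, BF16 (2 bytes).
kv_per_token = 2 * layers * kv_heads * head_dim * 2
print(kv_per_token)  # 393216 bytes
```

Note that KV-cache size depends only on layers, KV heads, and head dimension, not on the 400B total parameter count, which is why it stays small per token despite the model's size.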
Fits on (single-node)

- Instinct MI325X · INT4
- B200 NVL (pair) · INT4
- B300 · INT4
- B200 SXM ×2 · INT4
- B100 SXM ×2 · INT4
- GB200 NVL72 (per GPU) ×2 · INT4
- GB300 NVL72 (per GPU) ×2 · INT4
- H200 SXM ×2 · INT4
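A rough way to sanity-check these fit results: divide the INT4 weight footprint plus the activation estimate by each GPU's HBM capacity. The capacities below are assumptions from public datasheets, not values from this page, and the check ignores KV-cache headroom:

```python
import math

# Assumed HBM capacities (GB); not stated on this page.
hbm_gb = {"Instinct MI325X": 256, "B200 SXM": 192, "H200 SXM": 141}

weights_gb    = 200.0  # INT4 Weights
activation_gb = 3.0    # Activation Estimate

# Minimum GPU count just to hold weights + activations.
needed = {gpu: math.ceil((weights_gb + activation_gb) / cap)
          for gpu, cap in hbm_gb.items()}
print(needed)  # {'Instinct MI325X': 1, 'B200 SXM': 2, 'H200 SXM': 2}
```

Under these assumptions, MI325X fits the INT4 model on a single GPU while B200 SXM and H200 SXM need a pair, consistent with the ×2 entries above; whatever memory remains after weights goes to KV-cache at the per-token rate listed earlier.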
GPU Recommendations

| GPU | Rating | Config | Score | Throughput | Cost/Month | Cost/M Tokens |
|---|---|---|---|---|---|---|
| B200 SXM | optimal | FP8 · 4 GPUs · tensorrt-llm | 100/100 | 280.0 tok/s | $17,044 | $23.16 |
| B100 SXM | optimal | FP8 · 4 GPUs · tensorrt-llm | 100/100 | 280.0 tok/s | $17,082 | $23.21 |
| H200 SXM | optimal | FP8 · 4 GPUs · tensorrt-llm | 100/100 | 280.0 tok/s | $10,211 | $13.88 |
API Pricing Comparison
| Provider | Input $/M | Output $/M | Badges |
|---|---|---|---|
| together | $1.20 | $1.80 | Cheapest |
| fireworks | $1.50 | $2.00 |  |
Quality Benchmarks

| Benchmark | Score |
|---|---|
| MMLU | 89.0 |
| HumanEval | 63.0 |
| GSM8K | 95.0 |
| MT-Bench | 88.0 |
Capabilities
Features
✓ Tool Use · ✓ Vision · ✓ Code · ✓ Math · ✗ Reasoning · ✓ Multilingual · ✓ Structured Output
Supported Frameworks
vllm · sglang · tensorrt-llm
Supported Precisions
BF16 (default) · FP8 · INT4