Llama 4 Behemoth
Meta · MoE · 2000B parameters · 1,048,576-token context
Quality: 93.0
Architecture Details

| Field | Value |
|---|---|
| Type | MoE |
| Total Parameters | 2000B |
| Active Parameters | 400B |
| Layers | 128 |
| Hidden Dimension | 16,384 |
| Attention Heads | 128 |
| KV Heads | 16 |
| Head Dimension | 128 |
| Vocab Size | 202,400 |
| Total Experts | 256 |
| Active Experts | 16 |
Memory Requirements

| Precision | Weights |
|---|---|
| BF16 | 4000.0 GB |
| FP8 | 2000.0 GB |
| INT4 | 1000.0 GB |

KV-Cache per Token: 4,194,304 bytes
Activation Estimate: 25.00 GB
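The weight figures above follow directly from parameter count × bytes per parameter. A minimal sketch (the function name is illustrative, not from any library):

```python
def weight_memory_gb(total_params_billion: float, bits_per_param: int) -> float:
    """Weight memory in GB (decimal): params * bytes-per-parameter."""
    return total_params_billion * 1e9 * (bits_per_param / 8) / 1e9

# 2000B parameters at each precision
bf16 = weight_memory_gb(2000, 16)  # 4000.0 GB
fp8 = weight_memory_gb(2000, 8)    # 2000.0 GB
int4 = weight_memory_gb(2000, 4)   # 1000.0 GB
```

Note these are decimal gigabytes (10^9 bytes), matching the table, and cover weights only; KV-cache and activations are extra.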
Fits on (single-node)

| GPU | Count | Precision |
|---|---|---|
| B200 NVL (pair) | x4 | INT4 |
| Instinct MI325X | x5 | INT4 |
| B300 | x5 | INT4 |
| Groq LPU | x6 | INT4 |
| B200 SXM | x7 | INT4 |
| B100 SXM | x7 | INT4 |
| GB200 NVL72 (per GPU) | x7 | INT4 |
| GB300 NVL72 (per GPU) | x7 | INT4 |
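A back-of-envelope way to estimate counts like these is to divide model memory by per-GPU memory and round up. The page's figures appear to include extra headroom (KV-cache, activations, runtime overhead), so they run higher than this naive sketch; the 192 GB capacity below is a hypothetical example value, not taken from the page:

```python
import math

def gpus_needed(weights_gb: float, gpu_mem_gb: float,
                overhead_gb: float = 0.0) -> int:
    """Naive single-node fit: (weights + fixed overhead) / per-GPU memory."""
    return math.ceil((weights_gb + overhead_gb) / gpu_mem_gb)

# INT4 weights (1000 GB) on a hypothetical 192 GB GPU, no overhead
gpus_needed(1000.0, 192.0)  # 6
```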
GPU Recommendations

| GPU | Rating | Config | Score | Throughput | Cost/Month | Cost/M Tokens |
|---|---|---|---|---|---|---|
| B200 SXM | good | BF16 · 32 GPUs · tensorrt-llm | 63/100 | 140.0 tok/s | $136,352 | $370.60 |
| B100 SXM | good | BF16 · 32 GPUs · tensorrt-llm | 63/100 | 140.0 tok/s | $136,656 | $371.43 |
| GB200 NVL72 (per GPU) | good | BF16 · 32 GPUs · tensorrt-llm | 63/100 | 140.0 tok/s | $197,392 | $536.51 |
API Pricing Comparison
| Provider | Input $/M | Output $/M | Badges |
|---|---|---|---|
| together | $5.00 | $16.00 | Cheapest |
Quality Benchmarks

| Benchmark | Score |
|---|---|
| MMLU | 92.0 |
| HumanEval | 74.0 |
| GSM8K | 97.0 |
| MT-Bench | 92.0 |
Capabilities
Features
✓ Tool Use · ✓ Vision · ✓ Code · ✓ Math · ✓ Reasoning · ✓ Multilingual · ✓ Structured Output
Supported Frameworks
Supported Precisions
BF16 (default)