
Llama 4 Behemoth

Meta · MoE · 2000B parameters · 1,048,576-token context

Quality
93.0

Architecture Details

Type: MoE
Total Parameters: 2000B
Active Parameters: 400B
Layers: 128
Hidden Dimension: 16,384
Attention Heads: 128
KV Heads: 16
Head Dimension: 128
Vocab Size: 202,400
Total Experts: 256
Active Experts: 16

Memory Requirements

BF16 Weights: 4000.0 GB
FP8 Weights: 2000.0 GB
INT4 Weights: 1000.0 GB
KV-Cache per Token: 4,194,304 bytes
Activation Estimate: 25.00 GB
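A minimal sketch of where the weight figures above come from: bytes per parameter times total parameter count. The decimal convention (1 GB = 10^9 bytes) is an assumption here, chosen because it reproduces the page's round numbers.

```python
# Weight memory = parameter count x bytes per parameter.
# Assumes decimal gigabytes (1 GB = 1e9 bytes), matching the figures above.
TOTAL_PARAMS = 2000e9  # 2000B parameters

BYTES_PER_PARAM = {"BF16": 2.0, "FP8": 1.0, "INT4": 0.5}

for precision, nbytes in BYTES_PER_PARAM.items():
    gb = TOTAL_PARAMS * nbytes / 1e9
    print(f"{precision} weights: {gb:.1f} GB")
```

This yields 4000.0 GB (BF16), 2000.0 GB (FP8), and 1000.0 GB (INT4), matching the table; KV-cache and activations come on top of the weights.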

Fits on (single-node)

B200 NVL (pair): x4 INT4
Instinct MI325X: x5 INT4
B300: x5 INT4
Groq LPU: x6 INT4
B200 SXM: x7 INT4
B100 SXM: x7 INT4
GB200 NVL72 (per GPU): x7 INT4
GB300 NVL72 (per GPU): x7 INT4

GPU Recommendations

B200 SXM (good)
BF16 · 32 GPUs · tensorrt-llm
Score: 63/100
Throughput: 140.0 tok/s
Cost/Month: $136,352
Cost/M Tokens: $370.60

B100 SXM (good)
BF16 · 32 GPUs · tensorrt-llm
Score: 63/100
Throughput: 140.0 tok/s
Cost/Month: $136,656
Cost/M Tokens: $371.43

GB200 NVL72 (per GPU) (good)
BF16 · 32 GPUs · tensorrt-llm
Score: 63/100
Throughput: 140.0 tok/s
Cost/Month: $197,392
Cost/M Tokens: $536.51
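The Cost/M Tokens figures can be derived from Cost/Month and Throughput. A 730-hour month (8,760 hours / 12) is an assumption on our part, but it reproduces the page's numbers to the cent:

```python
# Cost per million tokens = monthly cost / (millions of tokens per month).
# Assumption: a 730-hour billing month (8760 h / 12) at full utilization.
SECONDS_PER_MONTH = 730 * 3600  # 2,628,000 s

def cost_per_million(cost_per_month: float, tok_per_s: float) -> float:
    tokens_per_month = tok_per_s * SECONDS_PER_MONTH
    return cost_per_month / (tokens_per_month / 1e6)

print(round(cost_per_million(136_352, 140.0), 2))  # B200 SXM
print(round(cost_per_million(136_656, 140.0), 2))  # B100 SXM
print(round(cost_per_million(197_392, 140.0), 2))  # GB200 NVL72
```

At 140 tok/s that is about 367.9M tokens per month, giving $370.60, $371.43, and $536.51 per million tokens respectively.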

API Pricing Comparison

Provider: together · Input: $5.00/M · Output: $16.00/M · Cheapest
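For context, a quick comparison of the cheapest API output price with the self-hosted Cost/M Tokens from the B200 SXM recommendation above. Both figures come from this page; the comparison itself is illustrative only, since API and self-hosted workloads differ in utilization and token mix.

```python
# Compare the cheapest API output price with the self-hosted
# B200 SXM cost per million tokens quoted on this page.
API_OUTPUT_PER_M = 16.00    # together, output $/M
SELF_HOSTED_PER_M = 370.60  # B200 SXM, BF16, 32 GPUs

ratio = SELF_HOSTED_PER_M / API_OUTPUT_PER_M
print(f"Self-hosting costs ~{ratio:.1f}x the API output price per 1M tokens")
```

At the listed throughput and cluster size, the API route is roughly 23x cheaper per output token than the self-hosted configurations shown here.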

Quality Benchmarks

MMLU: 92.0
HumanEval: 74.0
GSM8K: 97.0
MT-Bench: 92.0

Capabilities

Features

Tool Use Vision Code Math Reasoning Multilingual Structured Output

Supported Frameworks

Supported Precisions

BF16 (default)

Similar Models