Meta

Llama 4 Maverick

Meta · MoE · 400B parameters · 1,048,576-token context

Quality
89.0

Architecture Details

Type: MoE
Total Parameters: 400B
Active Parameters: 17B
Layers: 96
Hidden Dimension: 5,120
Attention Heads: 40
KV Heads: 8
Head Dimension: 128
Vocab Size: 202,048
Total Experts: 128
Active Experts: 1

Memory Requirements

BF16 Weights

800.0 GB

FP8 Weights

400.0 GB

INT4 Weights

200.0 GB

KV-Cache per Token: 393,216 bytes
Activation Estimate: 3.00 GB
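The figures above fall out directly from the architecture details. A minimal sketch, assuming the usual dtype widths (BF16 = 2 bytes, FP8 = 1, INT4 = 0.5) and a standard grouped-query-attention KV cache (K and V, per layer, per KV head):

```python
def weight_memory_gb(total_params_billions, bytes_per_param):
    """Weights only; ignores KV cache, activations, and framework overhead.
    1B params at 1 byte each is 1 GB."""
    return total_params_billions * bytes_per_param

def kv_cache_bytes_per_token(layers, kv_heads, head_dim, bytes_per_value=2):
    """Factor of 2 covers both K and V caches. Uses KV heads (8),
    not attention heads (40), because of grouped-query attention."""
    return 2 * layers * kv_heads * head_dim * bytes_per_value

print(weight_memory_gb(400, 2))    # BF16 -> 800.0 GB
print(weight_memory_gb(400, 1))    # FP8  -> 400.0 GB
print(weight_memory_gb(400, 0.5))  # INT4 -> 200.0 GB
print(kv_cache_bytes_per_token(96, 8, 128))  # -> 393216 bytes
```

Reproducing 393,216 bytes/token from layers × KV heads × head dim confirms the cache is sized for the 8 KV heads, which is why GQA cuts cache memory 5× versus caching all 40 attention heads.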

Fits on (single-node)

Instinct MI325X · INT4
B200 NVL (pair) · INT4
B300 · INT4
B200 SXM ×2 · INT4
B100 SXM ×2 · INT4
GB200 NVL72 (per GPU) ×2 · INT4
GB300 NVL72 (per GPU) ×2 · INT4
H200 SXM ×2 · INT4
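The fit test behind this list is simple capacity arithmetic: INT4 weights (200 GB) plus the activation estimate (3 GB) must fit in the node's combined VRAM, with whatever is left over going to KV cache. A rough sketch; the per-GPU memory sizes here are my assumptions for illustration, not values from this page:

```python
# Assumed per-GPU memory capacities (GB) -- not stated on this page.
ASSUMED_VRAM_GB = {"Instinct MI325X": 256, "H200 SXM": 141, "B200 SXM": 192}

def fits(gpu, count, weights_gb=200.0, activation_gb=3.0):
    """True if weights + activations fit; leftover VRAM goes to KV cache."""
    return count * ASSUMED_VRAM_GB[gpu] >= weights_gb + activation_gb

print(fits("Instinct MI325X", 1))  # 256 >= 203 -> True
print(fits("H200 SXM", 2))         # 282 >= 203 -> True
print(fits("H200 SXM", 1))         # 141 <  203 -> False
```

This is why the single-GPU entries above are INT4 only: the 400 GB FP8 weights alone exceed any single accelerator listed.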

GPU Recommendations

B200 SXM (optimal)

FP8 · 4 GPUs · tensorrt-llm

100/100

score

Throughput

280.0 tok/s

Cost/Month

$17,044

Cost/M Tokens

$23.16

B100 SXM (optimal)

FP8 · 4 GPUs · tensorrt-llm

100/100

score

Throughput

280.0 tok/s

Cost/Month

$17,082

Cost/M Tokens

$23.21

H200 SXM (optimal)

FP8 · 4 GPUs · tensorrt-llm

100/100

score

Throughput

280.0 tok/s

Cost/Month

$10,211

Cost/M Tokens

$13.88

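The Cost/M Tokens figures are consistent with dividing the monthly cost by monthly token output. A sketch that reproduces them, assuming a 730-hour month and the hardware saturated at the quoted throughput (both are my assumptions; the page does not state its method):

```python
HOURS_PER_MONTH = 730  # assumed: 8760 h/year divided by 12

def cost_per_million_tokens(cost_per_month, tok_per_s):
    """Monthly cost spread over every token generated in that month."""
    tokens_per_month = tok_per_s * 3600 * HOURS_PER_MONTH  # 735.84M at 280 tok/s
    return cost_per_month / (tokens_per_month / 1e6)

for name, monthly in [("B200 SXM", 17044), ("B100 SXM", 17082), ("H200 SXM", 10211)]:
    print(f"{name}: ${cost_per_million_tokens(monthly, 280.0):.2f}/M")
```

Under these assumptions the formula lands on $23.16, $23.21, and $13.88, matching the cards above; note the figure assumes 100% utilization, so real-world $/M will be higher at lower load.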

API Pricing Comparison

together: $1.20 input / $1.80 output per 1M tokens (cheapest)
fireworks: $1.50 input / $2.00 output per 1M tokens

Quality Benchmarks

MMLU: 89.0
HumanEval: 63.0
GSM8K: 95.0
MT-Bench: 88.0

Capabilities

Features

Tool Use · Vision · Code · Math · Reasoning · Multilingual · Structured Output

Supported Frameworks

vllm · sglang · tensorrt-llm

Supported Precisions

BF16 (default) · FP8 · INT4
