Llama 3.1 405B

Meta · dense · 405B parameters · 131,072 context

Quality: 88.0

Architecture Details

Type: dense
Total Parameters: 405B
Active Parameters: 405B
Layers: 126
Hidden Dimension: 16,384
Attention Heads: 128
KV Heads: 8
Head Dimension: 128
Vocab Size: 128,256
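
The head counts imply grouped-query attention: 8 KV heads serve the 128 query heads, and heads × head dimension spans the hidden dimension. A quick consistency check:

```python
# Consistency checks on the architecture table. 8 KV heads vs. 128 attention
# heads indicates grouped-query attention (GQA): query heads share KV heads.

HEADS, KV_HEADS, HEAD_DIM, HIDDEN = 128, 8, 128, 16_384

assert HEADS * HEAD_DIM == HIDDEN   # 128 * 128 = 16,384
print(HEADS // KV_HEADS)            # -> 16 query heads per KV head
```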

Memory Requirements

BF16 Weights: 810.0 GB
FP8 Weights: 405.0 GB
INT4 Weights: 202.5 GB
KV-Cache per Token: 516,096 bytes
Activation Estimate: 5.00 GB
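
These figures follow directly from the architecture table: weights scale linearly with bytes per parameter, and the per-token KV cache holds K and V for every layer and KV head. A minimal sketch of the arithmetic, assuming no quantization overhead and a BF16 (2-byte) KV cache:

```python
# Sketch: derive the memory figures above from the architecture table.
# Assumes weights stored at exactly the nominal precision and a BF16 KV cache.

PARAMS = 405e9
LAYERS = 126
KV_HEADS = 8
HEAD_DIM = 128

def weight_gb(bytes_per_param: float) -> float:
    """Weight footprint in GB (decimal) at a given precision."""
    return PARAMS * bytes_per_param / 1e9

def kv_bytes_per_token(cache_bytes_per_elem: int = 2) -> int:
    """Per-token KV cache: K and V, per layer, per KV head."""
    return 2 * LAYERS * KV_HEADS * HEAD_DIM * cache_bytes_per_elem

print(weight_gb(2))          # BF16 -> 810.0 GB
print(weight_gb(1))          # FP8  -> 405.0 GB
print(weight_gb(0.5))        # INT4 -> 202.5 GB
print(kv_bytes_per_token())  # 516096 bytes, matching the table
```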

Fits on (single-node)

Instinct MI325X · INT4
B200 NVL (pair) · INT4
B300 · INT4
B200 SXM x2 · INT4
B100 SXM x2 · INT4
GB200 NVL72 (per GPU) x2 · INT4
GB300 NVL72 (per GPU) x2 · INT4
H200 SXM x2 · INT4
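
A single-node fit check reduces to comparing the INT4 weight footprint plus activation and KV-cache headroom against aggregate HBM. A rough sketch; the per-GPU HBM capacities and the flat 8,192-token KV allowance are illustrative assumptions, since the exact headroom rule behind the list above is not stated:

```python
# Rough single-node fit check: INT4 weights + activations + KV cache vs. total HBM.
# The 8,192-token KV allowance is an illustrative assumption, not the site's rule.

INT4_WEIGHTS_GB = 202.5
ACTIVATIONS_GB = 5.0
KV_BYTES_PER_TOKEN = 516_096

def fits(hbm_per_gpu_gb: float, num_gpus: int, kv_tokens: int = 8_192) -> bool:
    kv_gb = KV_BYTES_PER_TOKEN * kv_tokens / 1e9
    return INT4_WEIGHTS_GB + ACTIVATIONS_GB + kv_gb <= hbm_per_gpu_gb * num_gpus

print(fits(256, 1))   # Instinct MI325X (256 GB HBM3e) -> True
print(fits(141, 2))   # H200 SXM x2 (2 x 141 GB)       -> True
print(fits(141, 1))   # a single H200 does not fit     -> False
```

Note that one full 131,072-token sequence needs ~67.6 GB of KV cache on its own, so effective capacity depends heavily on batch size and context length.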

GPU Recommendations

GPU               Tier      Config                        Score    Throughput    Cost/Month   Cost/M Tokens
B200 NVL (pair)   optimal   FP8 · 2 GPUs · tensorrt-llm   88/100   280.0 tok/s   $19,929      $27.08
H20               optimal   FP8 · 8 GPUs · tensorrt-llm   85/100   280.0 tok/s   $7,516       $10.21
B200 SXM          optimal   FP8 · 4 GPUs · tensorrt-llm   83/100   280.0 tok/s   $17,044      $23.16
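
The Cost/M Tokens column is consistent with monthly cost divided by tokens generated per month at the quoted throughput. A sketch that reproduces the listed values, assuming a 730-hour month and continuous full utilization (the site's exact assumptions are not stated):

```python
# Sketch: Cost/M Tokens = monthly cost / (tokens per month / 1e6).
# Assumes a 730-hour month (365 days / 12) and 100% utilization.

SECONDS_PER_MONTH = 365 * 86_400 / 12

def cost_per_m_tokens(cost_per_month: float, tok_per_s: float) -> float:
    tokens_per_month = tok_per_s * SECONDS_PER_MONTH
    return cost_per_month / (tokens_per_month / 1e6)

print(round(cost_per_m_tokens(19_929, 280.0), 2))  # 27.08, matching B200 NVL (pair)
print(round(cost_per_m_tokens(7_516, 280.0), 2))   # 10.21, matching H20
print(round(cost_per_m_tokens(17_044, 280.0), 2))  # 23.16, matching B200 SXM
```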

API Pricing Comparison

Provider     Input $/M    Output $/M    Badges
fireworks    $3.00        $3.00         Cheapest
together     $3.50        $3.50
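
Both providers price input and output identically here, so the table compares directly against the self-hosted Cost/M figures above. A sketch of the break-even monthly volume, reusing the 730-hour month from the previous example:

```python
# Sketch: monthly volume where self-hosting the cheapest config above matches a
# flat API rate. Uses the H20 row ($7,516/month, 280 tok/s) and fireworks' $3.00/M.

SECONDS_PER_MONTH = 365 * 86_400 / 12

def breakeven_m_tokens(hosting_cost_per_month: float, api_price_per_m: float) -> float:
    """Millions of tokens per month where self-hosting equals the API bill."""
    return hosting_cost_per_month / api_price_per_m

print(breakeven_m_tokens(7_516, 3.00))      # ~2505M tokens/month to break even
print(280.0 * SECONDS_PER_MONTH / 1e6)      # capacity: ~735.8M tokens/month
```

At these numbers a single deployment tops out well below break-even, which matches the self-hosted $10.21/M exceeding the $3.00/M API rate.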

Quality Benchmarks

MMLU: 88.6
HumanEval: 61.0
GSM8K: 96.8
MT-Bench: 88.0

Capabilities

Features

Tool Use · Vision · Code · Math · Reasoning · Multilingual · Structured Output

Supported Frameworks

vllm · sglang · tgi · tensorrt-llm

Supported Precisions

BF16 (default) · FP8 · INT4
