Llama 3.1 405B
Meta · dense · 405B parameters · 131,072-token context
Quality: 88.0
Architecture Details
| Field | Value |
|---|---|
| Type | Dense |
| Total Parameters | 405B |
| Active Parameters | 405B |
| Layers | 126 |
| Hidden Dimension | 16,384 |
| Attention Heads | 128 |
| KV Heads | 8 |
| Head Dimension | 128 |
| Vocab Size | 128,256 |
Memory Requirements
| Precision | Weights |
|---|---|
| BF16 | 810.0 GB |
| FP8 | 405.0 GB |
| INT4 | 202.5 GB |

KV-Cache per Token: 516,096 bytes
Activation Estimate: 5.00 GB
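The weight and KV-cache figures above follow directly from the architecture numbers. A minimal sketch, assuming weights stored at the listed precision and a standard grouped-query-attention KV cache held in BF16 (2 bytes per element):

```python
# Derive the memory figures from the architecture table.
PARAMS = 405e9   # total parameters
LAYERS = 126
KV_HEADS = 8
HEAD_DIM = 128

BYTES_PER_PARAM = {"BF16": 2, "FP8": 1, "INT4": 0.5}

# Weight memory: parameter count x bytes per parameter.
for precision, nbytes in BYTES_PER_PARAM.items():
    print(f"{precision}: {PARAMS * nbytes / 1e9:.1f} GB")
# BF16: 810.0 GB, FP8: 405.0 GB, INT4: 202.5 GB

# KV cache per token: a K and a V vector per layer, BF16 (2 bytes).
kv_bytes = 2 * LAYERS * KV_HEADS * HEAD_DIM * 2
print(f"KV cache per token: {kv_bytes} bytes")  # 516096 bytes
```

Both calculations reproduce the listed values exactly, which confirms the KV cache is sized for 8 KV heads (GQA), not the full 128 attention heads.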
Fits on (single-node)
- Instinct MI325X · INT4
- B200 NVL (pair) · INT4
- B300 · INT4
- B200 SXM ×2 · INT4
- B100 SXM ×2 · INT4
- GB200 NVL72 (per GPU) ×2 · INT4
- GB300 NVL72 (per GPU) ×2 · INT4
- H200 SXM ×2 · INT4
GPU Recommendations
| GPU | Badge | Config | Score | Throughput | Cost/Month | Cost/M Tokens |
|---|---|---|---|---|---|---|
| B200 NVL (pair) | optimal | FP8 · 2 GPUs · tensorrt-llm | 88/100 | 280.0 tok/s | $19,929 | $27.08 |
| H20 | optimal | FP8 · 8 GPUs · tensorrt-llm | 85/100 | 280.0 tok/s | $7,516 | $10.21 |
| B200 SXM | optimal | FP8 · 4 GPUs · tensorrt-llm | 83/100 | 280.0 tok/s | $17,044 | $23.16 |
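The Cost/M Tokens column can be reproduced from Cost/Month and throughput. A minimal sketch, assuming a 730-hour billing month and full utilization at the listed tokens-per-second (both assumptions, inferred because they make the listed figures line up):

```python
# Reproduce Cost/M Tokens from Cost/Month and sustained throughput.
HOURS_PER_MONTH = 730  # assumed billing convention (365 * 24 / 12)

def cost_per_m_tokens(cost_per_month: float, tok_per_s: float) -> float:
    tokens_per_month = tok_per_s * HOURS_PER_MONTH * 3600
    return cost_per_month / (tokens_per_month / 1e6)

for gpu, monthly in [("B200 NVL (pair)", 19929),
                     ("H20", 7516),
                     ("B200 SXM", 17044)]:
    print(f"{gpu}: ${cost_per_m_tokens(monthly, 280.0):.2f}/M tokens")
# B200 NVL (pair): $27.08, H20: $10.21, B200 SXM: $23.16
```

Since all three configurations report the same 280.0 tok/s, the per-token cost ranking here is purely a function of monthly hardware cost.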
API Pricing Comparison
| Provider | Input $/M | Output $/M | Badges |
|---|---|---|---|
| fireworks | $3.00 | $3.00 | Cheapest |
| together | $3.50 | $3.50 | |
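To compare providers for a concrete workload, multiply token volumes by the per-million rates. A minimal sketch with illustrative volumes (100M input and 20M output tokens per month are assumptions, not from the table):

```python
# Estimate monthly API spend from the pricing table above.
PRICES = {  # provider: (input $/M, output $/M)
    "fireworks": (3.00, 3.00),
    "together": (3.50, 3.50),
}

IN_M, OUT_M = 100, 20  # assumed monthly volume, in millions of tokens
for provider, (p_in, p_out) in PRICES.items():
    total = IN_M * p_in + OUT_M * p_out
    print(f"{provider}: ${total:,.2f}/month")
# fireworks: $360.00/month, together: $420.00/month
```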
Quality Benchmarks
| Benchmark | Score |
|---|---|
| MMLU | 88.6 |
| HumanEval | 61.0 |
| GSM8K | 96.8 |
| MT-Bench | 88.0 |
Capabilities
Features
✓ Tool Use · ✗ Vision · ✓ Code · ✓ Math · ✗ Reasoning · ✓ Multilingual · ✓ Structured Output
Supported Frameworks
vllm · sglang · tgi · tensorrt-llm
Supported Precisions
BF16 (default) · FP8 · INT4
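As a concrete example of pairing the listed frameworks and precisions, a model this size could be launched with vLLM roughly as follows. This is a sketch, not a verified recipe: the model ID, GPU count, and quantization choice are assumptions for illustration.

```shell
# Illustrative vLLM launch for a 405B model in FP8 across 8 GPUs.
# Model ID and flag values are assumptions; adjust to your hardware.
vllm serve meta-llama/Llama-3.1-405B-Instruct \
  --tensor-parallel-size 8 \
  --quantization fp8 \
  --max-model-len 131072
```

At FP8 the weights alone need ~405 GB, so 8 GPUs with at least ~64 GB each are required before accounting for KV cache and activations.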