Llama 3.1 405B
Meta · dense · 405B parameters · 131,072-token context
Quality: 88.0
Architecture Details
| Field | Value |
|---|---|
| Type | Dense |
| Total Parameters | 405B |
| Active Parameters | 405B |
| Layers | 126 |
| Hidden Dimension | 16,384 |
| Attention Heads | 128 |
| KV Heads | 8 |
| Head Dimension | 128 |
| Vocab Size | 128,256 |
Memory Requirements
| Precision | Weights |
|---|---|
| BF16 | 810.0 GB |
| FP8 | 405.0 GB |
| INT4 | 202.5 GB |

KV-Cache per Token: 516,096 bytes
Activation Estimate: 5.00 GB
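The weight and KV-cache figures above follow directly from the architecture numbers. A minimal sketch, assuming weights stored at the listed precision and a standard grouped-query-attention KV cache held in BF16 (2 bytes per element):

```python
# Derive the memory figures from the architecture table.
PARAMS = 405e9   # total parameters
LAYERS = 126
KV_HEADS = 8
HEAD_DIM = 128

BYTES_PER_PARAM = {"BF16": 2, "FP8": 1, "INT4": 0.5}

# Weight memory: parameter count x bytes per parameter.
for precision, nbytes in BYTES_PER_PARAM.items():
    print(f"{precision}: {PARAMS * nbytes / 1e9:.1f} GB")
# BF16: 810.0 GB, FP8: 405.0 GB, INT4: 202.5 GB

# KV cache per token: a K and a V vector per layer, BF16 (2 bytes).
kv_bytes = 2 * LAYERS * KV_HEADS * HEAD_DIM * 2
print(f"KV cache per token: {kv_bytes} bytes")  # 516096 bytes
```

Both calculations reproduce the listed values exactly, which confirms the KV cache is sized for 8 KV heads (GQA), not the full 128 attention heads.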
Fits on (single-node)
- Instinct MI325X · INT4
- B200 NVL (pair) · INT4
- B300 · INT4
- B200 SXM ×2 · INT4
- B100 SXM ×2 · INT4
- GB200 NVL72 (per GPU) ×2 · INT4
- GB300 NVL72 (per GPU) ×2 · INT4
- H200 SXM ×2 · INT4
GPU Recommendations
| GPU | Badge | Config | Score | Throughput | Cost/Month | Cost/M Tokens |
|---|---|---|---|---|---|---|
| B200 NVL (pair) | optimal | FP8 · 2 GPUs · tensorrt-llm | 88/100 | 280.0 tok/s | $19,929 | $27.08 |
| H20 | optimal | FP8 · 8 GPUs · tensorrt-llm | 85/100 | 280.0 tok/s | $7,516 | $10.21 |
| B200 SXM | optimal | FP8 · 4 GPUs · tensorrt-llm | 83/100 | 280.0 tok/s | $17,044 | $23.16 |
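The Cost/M Tokens column can be reproduced from Cost/Month and throughput. A minimal sketch, assuming a 730-hour billing month and full utilization at the listed tokens-per-second (both assumptions, inferred because they make the listed figures line up):

```python
# Reproduce Cost/M Tokens from Cost/Month and sustained throughput.
HOURS_PER_MONTH = 730  # assumed billing convention (365 * 24 / 12)

def cost_per_m_tokens(cost_per_month: float, tok_per_s: float) -> float:
    tokens_per_month = tok_per_s * HOURS_PER_MONTH * 3600
    return cost_per_month / (tokens_per_month / 1e6)

for gpu, monthly in [("B200 NVL (pair)", 19929),
                     ("H20", 7516),
                     ("B200 SXM", 17044)]:
    print(f"{gpu}: ${cost_per_m_tokens(monthly, 280.0):.2f}/M tokens")
# B200 NVL (pair): $27.08, H20: $10.21, B200 SXM: $23.16
```

Since all three configurations report the same 280.0 tok/s, the per-token cost ranking here is purely a function of monthly hardware cost.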
API Pricing Comparison
| Provider | Input $/M | Output $/M | Badges |
|---|---|---|---|
| fireworks | $3.00 | $3.00 | Cheapest |
| together | $3.50 | $3.50 | |
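To compare providers for a concrete workload, multiply token volumes by the per-million rates. A minimal sketch with illustrative volumes (100M input and 20M output tokens per month are assumptions, not from the table):

```python
# Estimate monthly API spend from the pricing table above.
PRICES = {  # provider: (input $/M, output $/M)
    "fireworks": (3.00, 3.00),
    "together": (3.50, 3.50),
}

IN_M, OUT_M = 100, 20  # assumed monthly volume, in millions of tokens
for provider, (p_in, p_out) in PRICES.items():
    total = IN_M * p_in + OUT_M * p_out
    print(f"{provider}: ${total:,.2f}/month")
# fireworks: $360.00/month, together: $420.00/month
```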
Quality Benchmarks
| Benchmark | Score |
|---|---|
| MMLU | 88.6 |
| HumanEval | 61.0 |
| GSM8K | 96.8 |
| MT-Bench | 88.0 |
Capabilities
Features
✓ Tool Use · ✗ Vision · ✓ Code · ✓ Math · ✗ Reasoning · ✓ Multilingual · ✓ Structured Output
Supported Frameworks
vllm · sglang · tgi · tensorrt-llm
Supported Precisions
BF16 (default) · FP8 · INT4
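As a concrete example of pairing the listed frameworks and precisions, a model this size could be launched with vLLM roughly as follows. This is a sketch, not a verified recipe: the model ID, GPU count, and quantization choice are assumptions for illustration.

```shell
# Illustrative vLLM launch for a 405B model in FP8 across 8 GPUs.
# Model ID and flag values are assumptions; adjust to your hardware.
vllm serve meta-llama/Llama-3.1-405B-Instruct \
  --tensor-parallel-size 8 \
  --quantization fp8 \
  --max-model-len 131072
```

At FP8 the weights alone need ~405 GB, so 8 GPUs with at least ~64 GB each are required before accounting for KV cache and activations.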