
Llama 3.1 405B vs Llama 3.1 70B

Llama 3.1 405B

Meta · 405B params · Quality: 88

Llama 3.1 70B

Meta · 70.6B params · Quality: 82

Architecture Comparison

| Spec | Llama 3.1 405B | Llama 3.1 70B |
|---|---|---|
| Type | Dense | Dense |
| Total Parameters | 405B | 70.6B |
| Active Parameters | 405B | 70.6B |
| Layers | 126 | 80 |
| Hidden Dimension | 16,384 | 8,192 |
| Attention Heads | 128 | 64 |
| KV Heads | 8 | 8 |
| Context Length | 131,072 | 131,072 |
| Precision (default) | BF16 | BF16 |
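The layer and head counts above determine the per-head dimension and the grouped-query-attention (GQA) ratio. A quick sketch using the table's numbers:

```python
# Derive per-head dimension and GQA group size from the architecture specs.
specs = {
    "Llama 3.1 405B": {"hidden": 16384, "heads": 128, "kv_heads": 8},
    "Llama 3.1 70B":  {"hidden": 8192,  "heads": 64,  "kv_heads": 8},
}

for name, s in specs.items():
    head_dim = s["hidden"] // s["heads"]     # dimension of each attention head
    gqa_group = s["heads"] // s["kv_heads"]  # query heads sharing one KV head
    print(f"{name}: head_dim={head_dim}, GQA group size={gqa_group}")
```

Both models use 128-dimensional heads; 405B shares each KV head across 16 query heads, 70B across 8, which is why their KV-cache sizes scale only with layer count.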

Memory Requirements

| Precision | Llama 3.1 405B | Llama 3.1 70B |
|---|---|---|
| BF16 Weights | 810.0 GB | 141.2 GB |
| FP8 Weights | 405.0 GB | 70.6 GB |
| INT4 Weights | 202.5 GB | 35.3 GB |
| KV Cache / Token | 516,096 B (504 KB) | 327,680 B (320 KB) |
| Activation Estimate | 5.00 GB | 2.50 GB |
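These figures follow directly from the architecture table: weight memory is parameters × bytes per parameter, and per-token KV cache is 2 (K and V) × layers × KV heads × head dim × 2 bytes in BF16. A sketch that reproduces the table's numbers:

```python
def weight_gb(params_billions, bytes_per_param):
    """Weight memory in GB (1 GB = 1e9 bytes) for a dense model."""
    return params_billions * bytes_per_param

def kv_bytes_per_token(layers, kv_heads, head_dim, bytes_per_elem=2):
    """KV-cache bytes per token: K and V tensors, every layer, BF16 default."""
    return 2 * layers * kv_heads * head_dim * bytes_per_elem

# Llama 3.1 405B: 126 layers, 8 KV heads, head_dim 128
print(weight_gb(405, 2))                # 810 GB in BF16
print(kv_bytes_per_token(126, 8, 128))  # 516096 B
# Llama 3.1 70B: 80 layers, 8 KV heads, head_dim 128
print(weight_gb(70.6, 2))               # 141.2 GB in BF16
print(kv_bytes_per_token(80, 8, 128))   # 327680 B
```

Note how the 70B model's KV cache is smaller only in proportion to its layer count (80 vs 126), since both models have 8 KV heads of dimension 128.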

Minimum GPUs Needed (BF16)

| GPU | Llama 3.1 405B | Llama 3.1 70B |
|---|---|---|
| H100 SXM | N/A | 3 GPUs |
| L40S | N/A | 4 GPUs |
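A naive estimate would just divide weight memory by per-GPU memory, but serving also needs room for KV cache, activations, and framework overhead, which pushes real GPU counts higher. A rough sketch, assuming a hypothetical KV/activation budget and a usable-memory fraction (both are assumptions, not figures from this page):

```python
import math

def min_gpus(weight_gb, gpu_mem_gb, kv_activation_gb=0.0, usable_frac=0.9):
    """Rough minimum-GPU estimate: weights plus a KV/activation budget,
    assuming only a fraction of each GPU's memory is usable for the model."""
    return math.ceil((weight_gb + kv_activation_gb) / (gpu_mem_gb * usable_frac))

# Llama 3.1 70B in BF16 on 80 GB H100s, assuming a 60 GB KV/activation budget
print(min_gpus(141.2, 80, kv_activation_gb=60))  # 3
```

The budget you reserve for KV cache depends on batch size and context length, so treat this as a starting point rather than a sizing rule.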

Quality Benchmarks

| Benchmark | Llama 3.1 405B | Llama 3.1 70B |
|---|---|---|
| Overall | 88 | 82 |
| MMLU | 88.6 | 83.6 |
| HumanEval | 61.0 | 58.5 |
| GSM8K | 96.8 | 93.0 |
| MT-Bench | 88.0 | 85.0 |


Capabilities

| Feature | Llama 3.1 405B | Llama 3.1 70B |
|---|---|---|
| Tool Use | ✓ Yes | ✓ Yes |
| Vision | ✗ No | ✗ No |
| Code | ✓ Yes | ✓ Yes |
| Math | ✓ Yes | ✓ Yes |
| Reasoning | ✗ No | ✗ No |
| Multilingual | ✓ Yes | ✓ Yes |
| Structured Output | ✓ Yes | ✓ Yes |

API Pricing Comparison

Cheapest Output (Llama 3.1 405B): $3.00/M (input: $3.00/M)

Cheapest Output (Llama 3.1 70B): $0.79/M (input: $0.59/M)

| Provider | 405B In $/M | 405B Out $/M | 70B In $/M | 70B Out $/M |
|---|---|---|---|---|
| groq | — | — | $0.59 | $0.79 |
| together | $3.50 | $3.50 | $0.88 | $0.88 |
| fireworks | $3.00 | $3.00 | $0.90 | $0.90 |
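Per-million-token prices translate to workload cost as input tokens × input price + output tokens × output price. A sketch using the table's rates (provider prices change often, so treat these as a snapshot; the monthly token volumes below are made up for illustration):

```python
def cost_usd(in_tokens, out_tokens, in_per_m, out_per_m):
    """Workload cost in USD given per-million-token prices."""
    return in_tokens / 1e6 * in_per_m + out_tokens / 1e6 * out_per_m

# Hypothetical workload: 10M input + 2M output tokens per month
print(f"${cost_usd(10e6, 2e6, 3.00, 3.00):.2f}")  # 405B on fireworks: $36.00
print(f"${cost_usd(10e6, 2e6, 0.59, 0.79):.2f}")  # 70B on groq: $7.48
```

At these rates the 70B model is roughly 4-5x cheaper for the same traffic, which is worth weighing against the quality gap in the benchmark table.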

Recommendation Summary

  • Llama 3.1 405B scores higher on overall quality (88 vs 82).
  • Llama 3.1 70B is cheaper per output token ($0.79/M vs $3.00/M).
  • Llama 3.1 70B has a smaller memory footprint (141.2 GB vs 810.0 GB BF16), making it easier to deploy on fewer GPUs.
  • Llama 3.1 405B is stronger at code generation (HumanEval: 61.0 vs 58.5).
  • Llama 3.1 405B is better at math reasoning (GSM8K: 96.8 vs 93.0).
