
Llama 3.1 70B vs Llama 3.1 405B

Llama 3.1 70B (Meta · 70.6B params · Quality: 82)

Llama 3.1 405B (Meta · 405B params · Quality: 88)

Architecture Comparison

| Spec                | Llama 3.1 70B | Llama 3.1 405B |
| ------------------- | ------------- | -------------- |
| Type                | Dense         | Dense          |
| Total Parameters    | 70.6B         | 405B           |
| Active Parameters   | 70.6B         | 405B           |
| Layers              | 80            | 126            |
| Hidden Dimension    | 8,192         | 16,384         |
| Attention Heads     | 64            | 128            |
| KV Heads            | 8             | 8              |
| Context Length      | 131,072       | 131,072        |
| Precision (default) | BF16          | BF16           |
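The head counts above imply grouped-query attention (GQA): both models use 8 KV heads shared among all query heads. A small sketch deriving the per-head dimension and the sharing ratio from the table (the helper name is illustrative, not from any library):

```python
# Derive attention geometry from the Architecture Comparison figures above.

def attention_geometry(hidden_dim, n_heads, n_kv_heads):
    """Return (head_dim, gqa_group_size): how many query heads share one KV head."""
    head_dim = hidden_dim // n_heads
    return head_dim, n_heads // n_kv_heads

print(attention_geometry(8192, 64, 8))    # Llama 3.1 70B  -> (128, 8)
print(attention_geometry(16384, 128, 8))  # Llama 3.1 405B -> (128, 16)
```

Both models keep the same 128-dim heads; the 405B model simply groups more query heads per KV head, which is why their KV heads (and KV-cache width) match.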

Memory Requirements

| Precision           | Llama 3.1 70B      | Llama 3.1 405B     |
| ------------------- | ------------------ | ------------------ |
| BF16 Weights        | 141.2 GB           | 810.0 GB           |
| FP8 Weights         | 70.6 GB            | 405.0 GB           |
| INT4 Weights        | 35.3 GB            | 202.5 GB           |
| KV Cache / Token    | 327,680 B (320 KB) | 516,096 B (504 KB) |
| Activation Estimate | 2.50 GB            | 5.00 GB            |
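The weight and KV-cache figures above follow directly from the architecture table. A minimal sketch, assuming 2 bytes per parameter for BF16 (1 for FP8, 0.5 for INT4) and a BF16 KV cache:

```python
# Reproduce the Memory Requirements figures from the architecture specs.

def weight_memory_gb(params_billion, bytes_per_param):
    """Weight footprint: parameter count (in billions) times bytes per parameter."""
    return params_billion * bytes_per_param

def kv_cache_bytes_per_token(layers, kv_heads, head_dim, bytes_per_elem=2):
    """Per-token KV cache: 2x for separate K and V tensors, BF16 elements by default."""
    return 2 * layers * kv_heads * head_dim * bytes_per_elem

print(weight_memory_gb(70.6, 2))             # 141.2 GB (70B, BF16)
print(kv_cache_bytes_per_token(80, 8, 128))  # 327680 B/token (70B)
print(kv_cache_bytes_per_token(126, 8, 128)) # 516096 B/token (405B)
```

At the full 131,072-token context, the KV cache alone is roughly 42 GB for the 70B model and 66 GB for the 405B model, which is why long-context serving dominates memory planning beyond the weights.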

Minimum GPUs Needed (BF16)

| GPU              | Llama 3.1 70B | Llama 3.1 405B |
| ---------------- | ------------- | -------------- |
| H100 SXM (80 GB) | 3 GPUs        | N/A            |
| L40S (48 GB)     | 4 GPUs        | N/A            |
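A rough sizing sketch for figures like these: divide the BF16 weight footprint by usable per-GPU memory. The 10% headroom factor is an assumption; the table's counts appear to reserve additional room for KV cache and activations on top of this lower bound:

```python
import math

# Lower bound on GPU count from weights alone; headroom fraction is assumed.
def min_gpus(weights_gb, gpu_mem_gb, headroom=0.10):
    usable = gpu_mem_gb * (1 - headroom)
    return math.ceil(weights_gb / usable)

print(min_gpus(141.2, 80))  # 2 from weights alone; table lists 3 H100s,
                            # leaving space for KV cache and activations
print(min_gpus(141.2, 48))  # 4 L40S GPUs, matching the table
```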

Quality Benchmarks

| Benchmark | Llama 3.1 70B | Llama 3.1 405B |
| --------- | ------------- | -------------- |
| Overall   | 82            | 88             |
| MMLU      | 83.6          | 88.6           |
| HumanEval | 58.5          | 61.0           |
| GSM8K     | 93.0          | 96.8           |
| MT-Bench  | 85.0          | 88.0           |


Capabilities

| Feature           | Llama 3.1 70B | Llama 3.1 405B |
| ----------------- | ------------- | -------------- |
| Tool Use          | ✓ Yes         | ✓ Yes          |
| Vision            | ✗ No          | ✗ No           |
| Code              | ✓ Yes         | ✓ Yes          |
| Math              | ✓ Yes         | ✓ Yes          |
| Reasoning         | ✗ No          | ✗ No           |
| Multilingual      | ✓ Yes         | ✓ Yes          |
| Structured Output | ✓ Yes         | ✓ Yes          |

API Pricing Comparison

Cheapest output, Llama 3.1 70B: $0.79/M (input: $0.59/M)

Cheapest output, Llama 3.1 405B: $3.00/M (input: $3.00/M)

| Provider  | Llama 3.1 70B In $/M | Out $/M | Llama 3.1 405B In $/M | Out $/M |
| --------- | -------------------- | ------- | --------------------- | ------- |
| groq      | $0.59                | $0.79   | N/A                   | N/A     |
| together  | $0.88                | $0.88   | $3.50                 | $3.50   |
| fireworks | $0.90                | $0.90   | $3.00                 | $3.00   |
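To compare providers for a real workload, weight the input and output prices by your actual token mix. A minimal sketch using the cheapest prices above; the monthly volumes are made-up example numbers:

```python
# Estimate monthly API cost from per-million-token prices.
def monthly_cost(in_tokens_m, out_tokens_m, in_price, out_price):
    """Token volumes in millions of tokens; prices in $ per million tokens."""
    return in_tokens_m * in_price + out_tokens_m * out_price

# Hypothetical workload: 100M input + 20M output tokens per month.
print(round(monthly_cost(100, 20, 0.59, 0.79), 2))  # 70B via groq:       74.80
print(round(monthly_cost(100, 20, 3.00, 3.00), 2))  # 405B via fireworks: 360.00
```

At this input-heavy mix, the 405B model costs roughly 4.8x more per month, so the quality gap (88 vs 82 overall) has to justify the premium for your use case.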

Recommendation Summary

  • Llama 3.1 405B scores higher on overall quality (88 vs 82).
  • Llama 3.1 70B is cheaper per output token ($0.79/M vs $3.00/M).
  • Llama 3.1 70B has a smaller memory footprint (141.2 GB vs 810.0 GB BF16), making it easier to deploy on fewer GPUs.
  • Llama 3.1 405B is stronger at code generation (HumanEval: 61.0 vs 58.5).
  • Llama 3.1 405B is better at math reasoning (GSM8K: 96.8 vs 93.0).
