
Llama 3.1 405B vs Llama 3.1 8B

Llama 3.1 405B: Meta · 405B params · Quality: 88
Llama 3.1 8B: Meta · 8.03B params · Quality: 65

Architecture Comparison

Spec                 Llama 3.1 405B   Llama 3.1 8B
Type                 Dense            Dense
Total Parameters     405B             8.03B
Active Parameters    405B             8.03B
Layers               126              32
Hidden Dimension     16,384           4,096
Attention Heads      128              32
KV Heads             8                8
Context Length       131,072          131,072
Precision (default)  BF16             BF16

Memory Requirements

Precision            Llama 3.1 405B   Llama 3.1 8B
BF16 Weights         810.0 GB         16.1 GB
FP8 Weights          405.0 GB         8.0 GB
INT4 Weights         202.5 GB         4.0 GB
KV Cache / Token     516,096 B        131,072 B
Activation Estimate  5.00 GB          1.00 GB
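The weight and KV-cache figures above follow directly from the architecture table. A minimal sketch, assuming decimal gigabytes (1 GB = 10⁹ bytes) and a standard grouped-query-attention cache layout (K and V stored per layer per KV head at BF16); these are storage estimates, not vendor-published requirements:

```python
# Reproduce the memory table's figures from the architecture specs.

def weight_memory_gb(params: float, bytes_per_param: float) -> float:
    """Weight storage in decimal GB (BF16 = 2 bytes, FP8 = 1, INT4 = 0.5)."""
    return params * bytes_per_param / 1e9

def kv_cache_bytes_per_token(layers: int, kv_heads: int, head_dim: int,
                             bytes_per_value: int = 2) -> int:
    """Bytes of KV cache per token: K and V, every layer, BF16 by default."""
    return 2 * layers * kv_heads * head_dim * bytes_per_value

# Llama 3.1 405B: 126 layers, 8 KV heads, head_dim = 16384 / 128 = 128
print(weight_memory_gb(405e9, 2))              # BF16 -> 810.0
print(kv_cache_bytes_per_token(126, 8, 128))   # -> 516096

# Llama 3.1 8B: 32 layers, 8 KV heads, head_dim = 4096 / 32 = 128
print(weight_memory_gb(8.03e9, 2))             # BF16 -> 16.06
print(kv_cache_bytes_per_token(32, 8, 128))    # -> 131072
```

The head dimension is derived as hidden dimension divided by attention heads; both models come out to 128, which is why the 8B's per-token cache is smaller only in proportion to its layer count.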

Minimum GPUs Needed (BF16)

GPU       Llama 3.1 405B   Llama 3.1 8B
H100 SXM  N/A              1 GPU
L40S      N/A              1 GPU
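A rough lower bound on GPU count can be taken from weight storage alone. This sketch assumes common SKU sizes (80 GB for H100 SXM, 48 GB for L40S) and ignores KV cache, activations, and parallelism overhead, so real deployments need headroom beyond it:

```python
import math

def min_gpus(weights_gb: float, gpu_mem_gb: float, overhead_gb: float = 0.0) -> int:
    """Lower-bound GPU count from weight storage alone; overhead_gb can
    reserve room for KV cache and activations."""
    return math.ceil((weights_gb + overhead_gb) / gpu_mem_gb)

print(min_gpus(16.1, 80))    # 8B on H100 SXM (80 GB) -> 1
print(min_gpus(16.1, 48))    # 8B on L40S (48 GB)     -> 1
print(min_gpus(810.0, 80))   # 405B on H100 SXM       -> 11, beyond a single 8-GPU node
```

By this bound the 405B model at BF16 exceeds even an 8×H100 node, which is presumably why the table lists it as N/A at that precision.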

Quality Benchmarks

Benchmark   Llama 3.1 405B   Llama 3.1 8B
Overall     88               65
MMLU        88.6             69.4
HumanEval   61.0             40.2
GSM8K       96.8             79.6
MT-Bench    88.0             78.0


Capabilities

Feature            Llama 3.1 405B   Llama 3.1 8B
Tool Use           ✓ Yes            ✓ Yes
Vision             ✗ No             ✗ No
Code               ✓ Yes            ✓ Yes
Math               ✓ Yes            ✓ Yes
Reasoning          ✗ No             ✗ No
Multilingual       ✓ Yes            ✓ Yes
Structured Output  ✓ Yes            ✓ Yes

API Pricing Comparison

Cheapest Output (Llama 3.1 405B): $3.00/M (Input: $3.00/M)
Cheapest Output (Llama 3.1 8B): $0.08/M (Input: $0.05/M)

Provider   405B In $/M   405B Out $/M   8B In $/M   8B Out $/M
groq       N/A           N/A            $0.05       $0.08
together   $3.50         $3.50          $0.18       $0.18
fireworks  $3.00         $3.00          $0.20       $0.20
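Per-million-token rates translate into workload costs in the obvious way. A small sketch using the cheapest rates from the table; the workload size (10M input, 2M output tokens) is a hypothetical example:

```python
# Estimate API cost from per-million-token rates.

def api_cost(in_tokens: float, out_tokens: float,
             in_price_per_m: float, out_price_per_m: float) -> float:
    """Total cost in dollars for a given token volume."""
    return (in_tokens / 1e6) * in_price_per_m + (out_tokens / 1e6) * out_price_per_m

# Llama 3.1 405B on fireworks ($3.00 in / $3.00 out)
print(api_cost(10e6, 2e6, 3.00, 3.00))             # -> 36.0
# Llama 3.1 8B on groq ($0.05 in / $0.08 out)
print(round(api_cost(10e6, 2e6, 0.05, 0.08), 2))   # -> 0.66
```

At these rates the 405B model costs roughly 50× more than the 8B model for the same traffic, which frames the quality-vs-cost trade-off below.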

Recommendation Summary

  • Llama 3.1 405B scores higher on overall quality (88 vs 65).
  • Llama 3.1 8B is cheaper per output token ($0.08/M vs $3.00/M).
  • Llama 3.1 8B has a smaller memory footprint (16.1 GB vs 810.0 GB BF16), making it easier to deploy on fewer GPUs.
  • Llama 3.1 405B is stronger at code generation (HumanEval: 61.0 vs 40.2).
  • Llama 3.1 405B is better at math reasoning (GSM8K: 96.8 vs 79.6).
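The trade-offs above can be encoded as a toy selection rule: take the smallest model that clears your quality bar within your per-token budget. The threshold logic is illustrative only, not a recommendation from this page:

```python
# Toy chooser over the two models, using the page's overall quality scores
# and cheapest output prices.

def pick_model(needs_quality: int, budget_per_m_out: float) -> str:
    """Return the smallest model meeting the quality bar within budget."""
    models = [
        ("Llama 3.1 8B", 65, 0.08),    # (name, overall quality, cheapest $/M out)
        ("Llama 3.1 405B", 88, 3.00),
    ]
    for name, quality, price in models:
        if quality >= needs_quality and price <= budget_per_m_out:
            return name
    return "no model fits"

print(pick_model(60, 1.00))   # -> Llama 3.1 8B
print(pick_model(80, 5.00))   # -> Llama 3.1 405B
```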
