
Llama 3.1 8B vs Llama 3.1 405B

Llama 3.1 8B: Meta · 8.03B params · Quality score 65
Llama 3.1 405B: Meta · 405B params · Quality score 88

Architecture Comparison

| Spec                | Llama 3.1 8B | Llama 3.1 405B |
|---------------------|--------------|----------------|
| Type                | Dense        | Dense          |
| Total Parameters    | 8.03B        | 405B           |
| Active Parameters   | 8.03B        | 405B           |
| Layers              | 32           | 126            |
| Hidden Dimension    | 4,096        | 16,384         |
| Attention Heads     | 32           | 128            |
| KV Heads            | 8            | 8              |
| Context Length      | 131,072      | 131,072        |
| Precision (default) | BF16         | BF16           |
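Two quantities follow directly from the specs above: the per-head dimension (hidden dimension divided by attention heads) and the grouped-query attention (GQA) ratio (query heads per KV head). A minimal sketch, using the table's numbers:

```python
def attn_geometry(hidden_dim, heads, kv_heads):
    """Derive per-head dimension and GQA group size from the spec table."""
    head_dim = hidden_dim // heads      # dimension of each attention head
    gqa_groups = heads // kv_heads      # query heads sharing one KV head
    return head_dim, gqa_groups

# Llama 3.1 8B: 4,096 hidden, 32 heads, 8 KV heads
print(attn_geometry(4096, 32, 8))    # (128, 4)
# Llama 3.1 405B: 16,384 hidden, 128 heads, 8 KV heads
print(attn_geometry(16384, 128, 8))  # (128, 16)
```

Both models use a 128-dimensional head; the 405B model simply shares each of its 8 KV heads across four times as many query heads.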

Memory Requirements

| Precision           | Llama 3.1 8B | Llama 3.1 405B |
|---------------------|--------------|----------------|
| BF16 Weights        | 16.1 GB      | 810.0 GB       |
| FP8 Weights         | 8.0 GB       | 405.0 GB       |
| INT4 Weights        | 4.0 GB       | 202.5 GB       |
| KV-Cache / Token    | 131,072 B    | 516,096 B      |
| Activation Estimate | 1.00 GB      | 5.00 GB        |
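These figures can be reproduced from the architecture specs with the standard formulas: weight memory is parameter count times bytes per parameter, and KV-cache per token is 2 (K and V) × layers × KV heads × head dimension × dtype bytes. A minimal sketch (decimal GB, BF16 = 2 bytes per element):

```python
def weight_bytes_gb(params, bits):
    """Raw weight storage at a given precision, in decimal GB."""
    return params * bits / 8 / 1e9

def kv_cache_bytes_per_token(layers, kv_heads, head_dim, dtype_bytes=2):
    # Factor of 2 covers the separate K and V tensors per layer.
    return 2 * layers * kv_heads * head_dim * dtype_bytes

print(round(weight_bytes_gb(8.03e9, 16), 1))   # 16.1  (8B, BF16)
print(round(weight_bytes_gb(405e9, 16), 1))    # 810.0 (405B, BF16)
print(kv_cache_bytes_per_token(32, 8, 128))    # 131072 B/token (8B)
print(kv_cache_bytes_per_token(126, 8, 128))   # 516096 B/token (405B)
```

Note that KV-cache grows linearly with sequence length: at the full 131,072-token context, the 405B model's cache alone is roughly 67 GB per sequence.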

Minimum GPUs Needed (BF16)

| GPU      | Llama 3.1 8B | Llama 3.1 405B |
|----------|--------------|----------------|
| H100 SXM | 1 GPU        | N/A            |
| L40S     | 1 GPU        | N/A            |
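The GPU counts above follow from a simple ceiling estimate: total memory (weights plus activations) divided by per-GPU VRAM, capped at a single node. This is a rough sketch that ignores KV-cache growth and framework overhead; the 8-GPU node cap is an assumption for why the 405B column reads N/A (8 × 80 GB = 640 GB < 810 GB in BF16):

```python
import math

def gpus_needed(weight_gb, activation_gb, vram_gb, max_gpus_per_node=8):
    """Naive GPU-count estimate; returns None if one node cannot hold the model."""
    need = math.ceil((weight_gb + activation_gb) / vram_gb)
    return need if need <= max_gpus_per_node else None

print(gpus_needed(16.1, 1.0, 80))   # 1    -> 8B fits a single H100 SXM (80 GB)
print(gpus_needed(16.1, 1.0, 48))   # 1    -> 8B also fits a single L40S (48 GB)
print(gpus_needed(810.0, 5.0, 80))  # None -> 405B exceeds one H100 node in BF16
```

At FP8 (405 GB of weights), the 405B model does fit an 8×H100 node, which is how it is commonly served.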

Quality Benchmarks

| Benchmark | Llama 3.1 8B | Llama 3.1 405B |
|-----------|--------------|----------------|
| Overall   | 65           | 88             |
| MMLU      | 69.4         | 88.6           |
| HumanEval | 40.2         | 61.0           |
| GSM8K     | 79.6         | 96.8           |
| MT-Bench  | 78.0         | 88.0           |


Capabilities

| Feature           | Llama 3.1 8B | Llama 3.1 405B |
|-------------------|--------------|----------------|
| Tool Use          | ✓ Yes        | ✓ Yes          |
| Vision            | ✗ No         | ✗ No           |
| Code              | ✓ Yes        | ✓ Yes          |
| Math              | ✓ Yes        | ✓ Yes          |
| Reasoning         | ✗ No         | ✗ No           |
| Multilingual      | ✓ Yes        | ✓ Yes          |
| Structured Output | ✓ Yes        | ✓ Yes          |

API Pricing Comparison

Cheapest output, Llama 3.1 8B: $0.08/M (input: $0.05/M)

Cheapest output, Llama 3.1 405B: $3.00/M (input: $3.00/M)

| Provider  | 8B In $/M | 8B Out $/M | 405B In $/M | 405B Out $/M |
|-----------|-----------|------------|-------------|--------------|
| groq      | $0.05     | $0.08      | N/A         | N/A          |
| together  | $0.18     | $0.18      | $3.50       | $3.50        |
| fireworks | $0.20     | $0.20      | $3.00       | $3.00        |
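Per-token prices are easiest to compare on a concrete request. A minimal sketch that prices a hypothetical request of 1,000 input tokens and 500 output tokens against the cheapest listed provider for each model (prices in $ per million tokens, from the table above):

```python
def request_cost(in_tokens, out_tokens, in_price_per_m, out_price_per_m):
    """Cost in dollars for one request at $/million-token pricing."""
    return in_tokens / 1e6 * in_price_per_m + out_tokens / 1e6 * out_price_per_m

# Llama 3.1 8B on groq ($0.05 in / $0.08 out)
cost_8b = request_cost(1000, 500, 0.05, 0.08)
# Llama 3.1 405B on fireworks ($3.00 in / $3.00 out)
cost_405b = request_cost(1000, 500, 3.00, 3.00)

print(f"8B:   ${cost_8b:.6f}")    # $0.000090
print(f"405B: ${cost_405b:.6f}")  # $0.004500
```

At these list prices the 405B request costs about 50× more than the 8B one, which frames the quality-vs-cost trade-off in the summary below.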

Recommendation Summary

  • Llama 3.1 405B scores higher on overall quality (88 vs 65).
  • Llama 3.1 8B is cheaper per output token ($0.08/M vs $3.00/M).
  • Llama 3.1 8B has a smaller memory footprint (16.1 GB vs 810.0 GB BF16), making it easier to deploy on fewer GPUs.
  • Llama 3.1 405B is stronger at code generation (HumanEval: 61.0 vs 40.2).
  • Llama 3.1 405B is better at math reasoning (GSM8K: 96.8 vs 79.6).
