Qwen 2.5 72B vs Llama 3.1 8B

Qwen 2.5 72B (Alibaba · 72.7B params · Quality: 84)
Llama 3.1 8B (Meta · 8.03B params · Quality: 65)

Architecture Comparison

Spec                  Qwen 2.5 72B    Llama 3.1 8B
Type                  Dense           Dense
Total Parameters      72.7B           8.03B
Active Parameters     72.7B           8.03B
Layers                80              32
Hidden Dimension      8,192           4,096
Attention Heads       64              32
KV Heads              8               8
Context Length        131,072         131,072
Precision (default)   BF16            BF16

Memory Requirements

Precision             Qwen 2.5 72B    Llama 3.1 8B
BF16 Weights          145.4 GB        16.1 GB
FP8 Weights           72.7 GB         8.0 GB
INT4 Weights          36.4 GB         4.0 GB
KV Cache / Token      327,680 B       131,072 B
Activation Estimate   2.50 GB         1.00 GB
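The weight and KV-cache figures above follow directly from the architecture table. A minimal Python sketch, assuming a head dimension of 128 (hidden dimension / attention heads, which holds for both models) and 2 bytes per BF16 element:

```python
def weight_gb(params_billions, bytes_per_param):
    """Weight memory in GB (decimal) for a dense model."""
    return params_billions * 1e9 * bytes_per_param / 1e9

def kv_cache_bytes_per_token(layers, kv_heads, head_dim, bytes_per_elem=2):
    """Per-token KV cache: 2 tensors (K and V) per layer, per KV head."""
    return 2 * layers * kv_heads * head_dim * bytes_per_elem

# Qwen 2.5 72B: 80 layers, 8 KV heads, head_dim = 8192 / 64 = 128
print(weight_gb(72.7, 2))                    # BF16 -> 145.4 GB
print(kv_cache_bytes_per_token(80, 8, 128))  # 327680 B per token

# Llama 3.1 8B: 32 layers, 8 KV heads, head_dim = 4096 / 32 = 128
print(weight_gb(8.03, 2))                    # BF16 -> ~16.1 GB
print(kv_cache_bytes_per_token(32, 8, 128))  # 131072 B per token
```

FP8 and INT4 rows follow by swapping `bytes_per_param` for 1 and 0.5 respectively; the activation estimate depends on batch size and serving stack, so it is not derived here.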

Minimum GPUs Needed (BF16)

GPU        Qwen 2.5 72B    Llama 3.1 8B
H100 SXM   3 GPUs          1 GPU
L40S       4 GPUs          1 GPU

Quality Benchmarks

Benchmark    Qwen 2.5 72B    Llama 3.1 8B
Overall      84              65
MMLU         85.3            69.4
HumanEval    56.0            40.2
GSM8K        91.6            79.6
MT-Bench     86.0            78.0

Capabilities

Feature             Qwen 2.5 72B    Llama 3.1 8B
Tool Use            ✓ Yes           ✓ Yes
Vision              ✗ No            ✗ No
Code                ✓ Yes           ✓ Yes
Math                ✓ Yes           ✓ Yes
Reasoning           ✗ No            ✗ No
Multilingual        ✓ Yes           ✓ Yes
Structured Output   ✓ Yes           ✓ Yes

API Pricing Comparison

Cheapest output, Qwen 2.5 72B: $0.90/M (input: $0.90/M)
Cheapest output, Llama 3.1 8B: $0.08/M (input: $0.05/M)

Provider    Qwen 2.5 72B In $/M   Out $/M   Llama 3.1 8B In $/M   Out $/M
groq        n/a                   n/a       $0.05                 $0.08
together    $0.90                 $0.90     $0.18                 $0.18
fireworks   $0.90                 $0.90     $0.20                 $0.20
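The per-million-token rates above can be turned into a per-request cost estimate. A small Python sketch using the listed prices (the model/provider keys are illustrative names, not real API identifiers):

```python
# Prices in $/M tokens (input, output), taken from the provider table above.
PRICES = {
    ("qwen-2.5-72b", "together"):  (0.90, 0.90),
    ("qwen-2.5-72b", "fireworks"): (0.90, 0.90),
    ("llama-3.1-8b", "groq"):      (0.05, 0.08),
    ("llama-3.1-8b", "together"):  (0.18, 0.18),
    ("llama-3.1-8b", "fireworks"): (0.20, 0.20),
}

def request_cost(model, provider, input_tokens, output_tokens):
    """Dollar cost of one request at the listed $/M-token rates."""
    in_rate, out_rate = PRICES[(model, provider)]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1e6

# Example: a request with 2,000 input tokens and 500 output tokens
print(request_cost("qwen-2.5-72b", "together", 2000, 500))  # $0.00225
print(request_cost("llama-3.1-8b", "groq", 2000, 500))      # $0.00014
```

At these rates the cheapest Llama 3.1 8B endpoint is roughly 16x cheaper per request than the cheapest Qwen 2.5 72B endpoint for this token mix.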

Recommendation Summary

  • Qwen 2.5 72B scores higher on overall quality (84 vs 65).
  • Llama 3.1 8B is cheaper per output token ($0.08/M vs $0.90/M).
  • Llama 3.1 8B has a smaller memory footprint (16.1 GB vs 145.4 GB BF16), making it easier to deploy on fewer GPUs.
  • Qwen 2.5 72B is stronger at code generation (HumanEval: 56.0 vs 40.2).
  • Qwen 2.5 72B is better at math reasoning (GSM8K: 91.6 vs 79.6).
