
Qwen 2.5 72B vs Llama 3.1 70B

Qwen 2.5 72B
Alibaba · 72.7B params · Quality: 84

Llama 3.1 70B
Meta · 70.6B params · Quality: 82

Architecture Comparison

| Spec | Qwen 2.5 72B | Llama 3.1 70B |
| --- | --- | --- |
| Type | Dense | Dense |
| Total Parameters | 72.7B | 70.6B |
| Active Parameters | 72.7B | 70.6B |
| Layers | 80 | 80 |
| Hidden Dimension | 8,192 | 8,192 |
| Attention Heads | 64 | 64 |
| KV Heads | 8 | 8 |
| Context Length | 131,072 | 131,072 |
| Precision (default) | BF16 | BF16 |

Memory Requirements

| Precision | Qwen 2.5 72B | Llama 3.1 70B |
| --- | --- | --- |
| BF16 Weights | 145.4 GB | 141.2 GB |
| FP8 Weights | 72.7 GB | 70.6 GB |
| INT4 Weights | 36.4 GB | 35.3 GB |
| KV-Cache / Token | 327,680 B | 327,680 B |
| Activation Estimate | 2.50 GB | 2.50 GB |
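The figures above follow directly from the architecture table. A minimal sketch of the arithmetic (weight memory is parameters × bytes per parameter; the per-token KV-cache covers keys and values across all layers, using the grouped-query KV-head count rather than the full attention-head count):

```python
def weight_memory_gb(params_billions: float, bytes_per_param: float) -> float:
    """Weight memory in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params_billions * bytes_per_param

def kv_cache_bytes_per_token(layers: int, kv_heads: int, head_dim: int,
                             bytes_per_value: int = 2) -> int:
    """Per-token KV-cache: 2 tensors (K and V) per layer, BF16 values."""
    return 2 * layers * kv_heads * head_dim * bytes_per_value

# Both models share the same attention geometry:
# 80 layers, 64 heads, 8 KV heads, hidden dim 8,192 -> head_dim = 128
head_dim = 8192 // 64

print(weight_memory_gb(72.7, 2))                  # BF16: 145.4 GB
print(weight_memory_gb(70.6, 2))                  # BF16: 141.2 GB
print(kv_cache_bytes_per_token(80, 8, head_dim))  # 327680 B/token
```

FP8 and INT4 rows come from the same formula with 1 and 0.5 bytes per parameter.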

Minimum GPUs Needed (BF16)

| GPU | Qwen 2.5 72B | Llama 3.1 70B |
| --- | --- | --- |
| H100 SXM (80 GB) | 3 GPUs | 3 GPUs |
| L40S (48 GB) | 4 GPUs | 4 GPUs |
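These counts are consistent with a simple capacity rule: fit the BF16 weights while reserving headroom for KV-cache, activations, and framework overhead. A sketch, assuming roughly 80% of each GPU's memory is usable for weights (the usable fraction is an assumption here, not a published figure):

```python
import math

def min_gpus(model_gb: float, gpu_gb: float, usable_fraction: float = 0.8) -> int:
    """Smallest GPU count whose combined usable memory holds the weights.

    usable_fraction is an assumed headroom factor for KV-cache,
    activations, and runtime overhead.
    """
    return math.ceil(model_gb / (gpu_gb * usable_fraction))

print(min_gpus(145.4, 80))  # H100 SXM: 3
print(min_gpus(145.4, 48))  # L40S: 4
```

The same rule reproduces the Llama 3.1 70B counts (141.2 GB → 3 H100s, 4 L40S), which is why the smaller footprint does not reduce the GPU count at BF16.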

Quality Benchmarks

| Benchmark | Qwen 2.5 72B | Llama 3.1 70B |
| --- | --- | --- |
| Overall | 84 | 82 |
| MMLU | 85.3 | 83.6 |
| HumanEval | 56.0 | 58.5 |
| GSM8K | 91.6 | 93.0 |
| MT-Bench | 86.0 | 85.0 |


Capabilities

| Feature | Qwen 2.5 72B | Llama 3.1 70B |
| --- | --- | --- |
| Tool Use | ✓ Yes | ✓ Yes |
| Vision | ✗ No | ✗ No |
| Code | ✓ Yes | ✓ Yes |
| Math | ✓ Yes | ✓ Yes |
| Reasoning | ✗ No | ✗ No |
| Multilingual | ✓ Yes | ✓ Yes |
| Structured Output | ✓ Yes | ✓ Yes |

API Pricing Comparison

Cheapest output, Qwen 2.5 72B: $0.90/M (input: $0.90/M)

Cheapest output, Llama 3.1 70B: $0.79/M (input: $0.59/M)

| Provider | Qwen 2.5 72B In $/M | Out $/M | Llama 3.1 70B In $/M | Out $/M |
| --- | --- | --- | --- | --- |
| groq | — | — | $0.59 | $0.79 |
| together | $0.90 | $0.90 | $0.88 | $0.88 |
| fireworks | $0.90 | $0.90 | $0.90 | $0.90 |
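Per-million prices translate to per-request cost by weighting input and output token counts separately. A sketch, using a hypothetical workload of 2,000 input and 500 output tokens per request:

```python
def request_cost_usd(in_tokens: int, out_tokens: int,
                     in_price_per_m: float, out_price_per_m: float) -> float:
    """Cost of one request given per-million-token prices."""
    return in_tokens / 1e6 * in_price_per_m + out_tokens / 1e6 * out_price_per_m

# Hypothetical workload: 2,000 input + 500 output tokens per request
qwen_on_together = request_cost_usd(2000, 500, 0.90, 0.90)  # ~$0.00225
llama_on_groq = request_cost_usd(2000, 500, 0.59, 0.79)     # ~$0.00158
```

Because real workloads are usually input-heavy, Llama 3.1 70B's lower input price on groq can matter more than the headline output price.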

Recommendation Summary

  • Qwen 2.5 72B scores higher on overall quality (84 vs 82).
  • Llama 3.1 70B is cheaper per output token ($0.79/M vs $0.90/M).
  • Llama 3.1 70B has a slightly smaller memory footprint (141.2 GB vs 145.4 GB BF16), though at BF16 both models need the same minimum GPU count (3× H100 SXM or 4× L40S).
  • Llama 3.1 70B is stronger at code generation (HumanEval: 58.5 vs 56.0).
  • Llama 3.1 70B is better at math reasoning (GSM8K: 93.0 vs 91.6).
