
Llama 3.1 70B vs Qwen 2.5 72B

Llama 3.1 70B: Meta · 70.6B params · Quality: 82

Qwen 2.5 72B: Alibaba · 72.7B params · Quality: 84

Architecture Comparison

Spec                 | Llama 3.1 70B | Qwen 2.5 72B
Type                 | Dense         | Dense
Total Parameters     | 70.6B         | 72.7B
Active Parameters    | 70.6B         | 72.7B
Layers               | 80            | 80
Hidden Dimension     | 8,192         | 8,192
Attention Heads      | 64            | 64
KV Heads             | 8             | 8
Context Length       | 131,072       | 131,072
Precision (default)  | BF16          | BF16
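
Both models share the same attention layout, so the per-head numbers follow directly from the spec rows above. A minimal sketch of that arithmetic (the head dimension and grouped-query ratio are inferred here, not vendor-published):

```python
# Derived attention geometry, assuming standard grouped-query attention (GQA).
hidden_dim = 8192
attention_heads = 64
kv_heads = 8

head_dim = hidden_dim // attention_heads        # 8192 / 64 = 128
gqa_group_size = attention_heads // kv_heads    # 64 / 8 = 8 query heads share each KV head

print(f"head_dim={head_dim}, query heads per KV head={gqa_group_size}")
```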

Memory Requirements

Precision            | Llama 3.1 70B | Qwen 2.5 72B
BF16 Weights         | 141.2 GB      | 145.4 GB
FP8 Weights          | 70.6 GB       | 72.7 GB
INT4 Weights         | 35.3 GB       | 36.4 GB
KV-Cache / Token     | 327,680 B     | 327,680 B
Activation Estimate  | 2.50 GB       | 2.50 GB
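
The weight and KV-cache figures follow from the parameter count and attention geometry. A rough sketch, assuming 2 bytes per parameter for BF16, 1 for FP8, 0.5 for INT4, and a BF16 KV cache with head dimension 128:

```python
def weight_memory_gb(params_billion: float, bytes_per_param: float) -> float:
    """Weight memory in decimal GB for a dense model."""
    return params_billion * 1e9 * bytes_per_param / 1e9

def kv_cache_bytes_per_token(layers: int, kv_heads: int, head_dim: int,
                             bytes_per_elem: int = 2) -> int:
    """Bytes of K plus V cache stored per token (BF16 by default)."""
    return 2 * layers * kv_heads * head_dim * bytes_per_elem

print(weight_memory_gb(70.6, 2.0))            # ~141.2 GB  (Llama 3.1 70B, BF16)
print(weight_memory_gb(72.7, 0.5))            # ~36.4 GB   (Qwen 2.5 72B, INT4)
print(kv_cache_bytes_per_token(80, 8, 128))   # 327,680 bytes per token (both models)
```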

Minimum GPUs Needed (BF16)

GPU        | Llama 3.1 70B | Qwen 2.5 72B
H100 SXM   | 3 GPUs        | 3 GPUs
L40S       | 4 GPUs        | 4 GPUs
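
These counts are consistent with dividing the total serving footprint (weights, KV cache at the full 131,072-token context, plus the activation estimate) by per-GPU memory. The 80 GB (H100 SXM) and 48 GB (L40S) capacities and the zero-headroom assumption below are mine:

```python
import math

def min_gpus(weights_gb: float, kv_bytes_per_token: int, context_len: int,
             activations_gb: float, gpu_memory_gb: float) -> int:
    """Smallest GPU count whose combined memory covers weights + KV cache + activations."""
    kv_gb = kv_bytes_per_token * context_len / 1e9
    return math.ceil((weights_gb + kv_gb + activations_gb) / gpu_memory_gb)

print(min_gpus(141.2, 327_680, 131_072, 2.5, 80))   # 3 -> H100 SXM, Llama 3.1 70B
print(min_gpus(145.4, 327_680, 131_072, 2.5, 48))   # 4 -> L40S, Qwen 2.5 72B
```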

Quality Benchmarks

Benchmark  | Llama 3.1 70B | Qwen 2.5 72B
Overall    | 82            | 84
MMLU       | 83.6          | 85.3
HumanEval  | 58.5          | 56.0
GSM8K      | 93.0          | 91.6
MT-Bench   | 85.0          | 86.0

Capabilities

Feature            | Llama 3.1 70B | Qwen 2.5 72B
Tool Use           | ✓ Yes         | ✓ Yes
Vision             | ✗ No          | ✗ No
Code               | ✓ Yes         | ✓ Yes
Math               | ✓ Yes         | ✓ Yes
Reasoning          | ✗ No          | ✗ No
Multilingual       | ✓ Yes         | ✓ Yes
Structured Output  | ✓ Yes         | ✓ Yes

API Pricing Comparison

Cheapest output (Llama 3.1 70B): $0.79/M output · $0.59/M input

Cheapest output (Qwen 2.5 72B): $0.90/M output · $0.90/M input

Provider   | Llama 3.1 70B In $/M | Llama Out $/M | Qwen 2.5 72B In $/M | Qwen Out $/M
groq       | $0.59                | $0.79         | n/a                 | n/a
together   | $0.88                | $0.88         | $0.90               | $0.90
fireworks  | $0.90                | $0.90         | $0.90               | $0.90
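
For a given workload, the per-million-token prices above translate into a bill as follows; the traffic split below is a hypothetical example, and provider prices change frequently:

```python
def workload_cost(input_tokens_m: float, output_tokens_m: float,
                  in_price_per_m: float, out_price_per_m: float) -> float:
    """Cost in dollars for a workload measured in millions of tokens."""
    return input_tokens_m * in_price_per_m + output_tokens_m * out_price_per_m

# Hypothetical month: 100M input tokens, 20M output tokens.
print(workload_cost(100, 20, 0.59, 0.79))   # $74.80  -> Llama 3.1 70B on groq
print(workload_cost(100, 20, 0.90, 0.90))   # $108.00 -> Qwen 2.5 72B on together/fireworks
```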

Recommendation Summary

  • Qwen 2.5 72B scores higher on overall quality (84 vs 82).
  • Llama 3.1 70B is cheaper per output token ($0.79/M vs $0.90/M).
  • Llama 3.1 70B has a slightly smaller memory footprint (141.2 GB vs 145.4 GB in BF16), leaving a bit more headroom at the same GPU count.
  • Llama 3.1 70B is stronger at code generation (HumanEval: 58.5 vs 56.0).
  • Llama 3.1 70B is better at math reasoning (GSM8K: 93.0 vs 91.6).
