Skip to content

Qwen 2.5 7B vs Qwen 2.5 72B

Alibaba
Qwen 2.5 7B

Alibaba · 7.6B params · Quality: 70

Alibaba
Qwen 2.5 72B

Alibaba · 72.7B params · Quality: 84

Architecture Comparison

SpecQwen 2.5 7BQwen 2.5 72B
TypeDENSEDENSE
Total Parameters7.6B72.7B
Active Parameters7.6B72.7B
Layers2880
Hidden Dimension3,5848,192
Attention Heads2864
KV Heads48
Context Length131,072131,072
Precision (default)BF16BF16

Memory Requirements

PrecisionQwen 2.5 7BQwen 2.5 72B
BF16 Weights15.2 GB145.4 GB
FP8 Weights7.6 GB72.7 GB
INT4 Weights3.8 GB36.4 GB
KV-Cache / Token57344 B327680 B
Activation Estimate1.00 GB2.50 GB

Minimum GPUs Needed (BF16)

H100 SXM1 GPU3 GPUs
L40S1 GPU4 GPUs

Quality Benchmarks

BenchmarkQwen 2.5 7BQwen 2.5 72B
Overall7084
MMLU74.285.3
HumanEval42.856.0
GSM8K82.091.6
MT-Bench79.086.0

Qwen 2.5 7B

MMLU
74.2
HumanEval
42.8
GSM8K
82.0
MT-Bench
79.0

Qwen 2.5 72B

MMLU
85.3
HumanEval
56.0
GSM8K
91.6
MT-Bench
86.0

Capabilities

FeatureQwen 2.5 7BQwen 2.5 72B
Tool Use✓ Yes✓ Yes
Vision✗ No✗ No
Code✓ Yes✓ Yes
Math✓ Yes✓ Yes
Reasoning✗ No✗ No
Multilingual✓ Yes✓ Yes
Structured Output✓ Yes✓ Yes

API Pricing Comparison

Cheapest Output (Qwen 2.5 7B)

$0.20/M

Input: $0.20/M

Cheapest Output (Qwen 2.5 72B)

$0.90/M

Input: $0.90/M

ProviderQwen 2.5 7B In $/MOut $/MQwen 2.5 72B In $/MOut $/M
together$0.20$0.20$0.90$0.90
fireworks$0.20$0.20$0.90$0.90

Recommendation Summary

  • Qwen 2.5 72B scores higher on overall quality (84 vs 70).
  • Qwen 2.5 7B is cheaper per output token ($0.20/M vs $0.90/M).
  • Qwen 2.5 7B has a smaller memory footprint (15.2 GB vs 145.4 GB BF16), making it easier to deploy on fewer GPUs.
  • Qwen 2.5 72B is stronger at code generation (HumanEval: 56.0 vs 42.8).
  • Qwen 2.5 72B is better at math reasoning (GSM8K: 91.6 vs 82.0).

Compare Other Models