Llama 3.2 1B vs Qwen 2.5 1.5B

Llama 3.2 1B (Meta) · 1.24B params · Quality: 38
Qwen 2.5 1.5B (Alibaba) · 1.5B params · Quality: 50

Architecture Comparison

| Spec                | Llama 3.2 1B | Qwen 2.5 1.5B |
| ------------------- | ------------ | ------------- |
| Type                | Dense        | Dense         |
| Total Parameters    | 1.24B        | 1.5B          |
| Active Parameters   | 1.24B        | 1.5B          |
| Layers              | 16           | 28            |
| Hidden Dimension    | 2,048        | 1,536         |
| Attention Heads     | 32           | 12            |
| KV Heads            | 8            | 2             |
| Context Length      | 131,072      | 32,768        |
| Precision (default) | BF16         | BF16          |
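
The spec rows above can be cross-checked against each model's published config on the Hugging Face Hub. A minimal sketch, assuming `transformers` is installed and the Hub is reachable (the Meta repo is gated, so it also assumes an accepted license and auth token):

```python
# Read each model's published config and print the spec fields compared above.
from transformers import AutoConfig

for repo in ("meta-llama/Llama-3.2-1B", "Qwen/Qwen2.5-1.5B"):
    cfg = AutoConfig.from_pretrained(repo)
    head_dim = cfg.hidden_size // cfg.num_attention_heads
    print(f"{repo}: layers={cfg.num_hidden_layers} "
          f"hidden={cfg.hidden_size} heads={cfg.num_attention_heads} "
          f"kv_heads={cfg.num_key_value_heads} head_dim={head_dim}")
```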

Memory Requirements

| Component           | Llama 3.2 1B | Qwen 2.5 1.5B |
| ------------------- | ------------ | ------------- |
| BF16 Weights        | 2.5 GB       | 3.0 GB        |
| FP8 Weights         | 1.2 GB       | 1.5 GB        |
| INT4 Weights        | 0.6 GB       | 0.8 GB        |
| KV-Cache / Token    | 32,768 B     | 28,672 B      |
| Activation Estimate | 0.30 GB      | 0.30 GB       |
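
These figures follow directly from the architecture table: weight memory is parameter count times bytes per parameter, and the per-token KV-cache is 2 tensors (K and V) × layers × KV heads × head dim × bytes per element. A back-of-the-envelope sketch, assuming decimal gigabytes:

```python
# Memory math using the shapes from the architecture table above.

def weight_gb(params_billion: float, bytes_per_param: float) -> float:
    # Billions of params * bytes per param = decimal gigabytes.
    return params_billion * bytes_per_param

def kv_bytes_per_token(layers: int, kv_heads: int, head_dim: int,
                       bytes_per_elem: int = 2) -> int:
    # 2 tensors (K and V) per layer, each kv_heads * head_dim wide.
    return 2 * layers * kv_heads * head_dim * bytes_per_elem

# Llama 3.2 1B: 16 layers, 8 KV heads, head_dim = 2048 / 32 = 64
print(weight_gb(1.24, 2))             # ~2.5 GB BF16 weights
print(kv_bytes_per_token(16, 8, 64))  # 32768 B per token

# Qwen 2.5 1.5B: 28 layers, 2 KV heads, head_dim = 1536 / 12 = 128
print(weight_gb(1.5, 2))              # 3.0 GB BF16 weights
print(kv_bytes_per_token(28, 2, 128)) # 28672 B per token
```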

Minimum GPUs Needed (BF16)

| GPU      | Llama 3.2 1B | Qwen 2.5 1.5B |
| -------- | ------------ | ------------- |
| H100 SXM | 1 GPU        | 1 GPU         |
| L40S     | 1 GPU        | 1 GPU         |
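
Both models fit comfortably on a single card, so the binding constraint is how much KV-cache the remaining VRAM can hold. A rough fit check, assuming nominal capacities (80 GB H100 SXM, 48 GB L40S) and ignoring framework overhead:

```python
# Remaining VRAM after weights and activations, divided by the
# per-token KV-cache size from the memory table above.

def max_kv_tokens(vram_gb: float, weights_gb: float, act_gb: float,
                  kv_per_token: int) -> int:
    free_bytes = (vram_gb - weights_gb - act_gb) * 1e9
    return int(free_bytes // kv_per_token)

# Llama 3.2 1B in BF16 on a 48 GB L40S
print(max_kv_tokens(48, 2.5, 0.30, 32768))  # ~1.38M tokens of KV-cache
```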

Quality Benchmarks

| Benchmark | Llama 3.2 1B | Qwen 2.5 1.5B |
| --------- | ------------ | ------------- |
| Overall   | 38           | 50            |
| MMLU      | 49.3         | N/A           |
| HumanEval | 22.0         | N/A           |
| GSM8K     | 44.4         | N/A           |
| MT-Bench  | 62.0         | N/A           |


Capabilities

| Feature           | Llama 3.2 1B | Qwen 2.5 1.5B |
| ----------------- | ------------ | ------------- |
| Tool Use          | ✓ Yes        | ✗ No          |
| Vision            | ✗ No         | ✗ No          |
| Code              | ✓ Yes        | ✓ Yes         |
| Math              | ✓ Yes        | ✓ Yes         |
| Reasoning         | ✗ No         | ✗ No          |
| Multilingual      | ✓ Yes        | ✓ Yes         |
| Structured Output | ✓ Yes        | ✓ Yes         |
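
Tool use is the one capability that separates the two, so here is a sketch of a tool call against an OpenAI-compatible endpoint using Together's base URL. The model ID and the `get_weather` tool are illustrative assumptions, not confirmed provider values:

```python
# Hedged tool-use sketch against an OpenAI-compatible chat endpoint.
from openai import OpenAI

client = OpenAI(base_url="https://api.together.xyz/v1", api_key="YOUR_KEY")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool, for illustration only
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="meta-llama/Llama-3.2-1B-Instruct",  # assumed provider model ID
    messages=[{"role": "user", "content": "What's the weather in Oslo?"}],
    tools=tools,
)
print(resp.choices[0].message.tool_calls)
```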

API Pricing Comparison

Cheapest output, Llama 3.2 1B: $0.03/M (input: $0.03/M)
Cheapest output, Qwen 2.5 1.5B: N/A

| Provider  | Llama 3.2 1B In $/M | Llama 3.2 1B Out $/M | Qwen 2.5 1.5B In $/M | Qwen 2.5 1.5B Out $/M |
| --------- | ------------------- | -------------------- | -------------------- | --------------------- |
| together  | $0.03               | $0.03                | N/A                  | N/A                   |
| fireworks | $0.10               | $0.10                | N/A                  | N/A                   |
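
Prices are quoted per million tokens, so a workload's cost is just token counts scaled by the table's rates. A quick estimator using the together rates above (the token volumes are made up for illustration):

```python
# Cost estimator: rates in the table are USD per million tokens.
def request_cost(in_tokens: float, out_tokens: float,
                 in_per_m: float, out_per_m: float) -> float:
    return in_tokens / 1e6 * in_per_m + out_tokens / 1e6 * out_per_m

# 10M input + 2M output tokens at together's $0.03/M each way
print(f"${request_cost(10e6, 2e6, 0.03, 0.03):.2f}")  # $0.36
```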

Recommendation Summary

  • Qwen 2.5 1.5B scores higher on overall quality (50 vs 38).
  • Llama 3.2 1B has a smaller memory footprint (2.5 GB vs 3.0 GB in BF16), leaving more VRAM headroom for KV-cache and batching on a single GPU.
  • Llama 3.2 1B supports a longer context window (131,072 vs 32,768 tokens).
