Question 1

Is Qwen 2.5 72B better than Llama 3.1 8B?

Accepted Answer

Qwen 2.5 72B has a higher overall quality score. Qwen 2.5 72B scores 84/100 while Llama 3.1 8B scores 65/100. The best choice depends on your use case, budget, and deployment constraints.

Question 2

Which is cheaper, Qwen 2.5 72B or Llama 3.1 8B?

Accepted Answer

Llama 3.1 8B is cheaper for output tokens. Qwen 2.5 72B starts at $0.90/M output tokens, while Llama 3.1 8B starts at $0.08/M output tokens.

Question 3

How much VRAM do Qwen 2.5 72B and Llama 3.1 8B need?

Accepted Answer

Qwen 2.5 72B requires 145.4 GB (BF16) or 36.4 GB (INT4). Llama 3.1 8B requires 16.1 GB (BF16) or 4.0 GB (INT4). Additional memory is needed for KV-cache and activations.

Question 4

What is the context length of Qwen 2.5 72B vs Llama 3.1 8B?

Accepted Answer

Qwen 2.5 72B supports 131,072 tokens context, while Llama 3.1 8B supports 131,072 tokens.

Provider	Qwen 2.5 72B In $/M	Out $/M	Llama 3.1 8B In $/M	Out $/M
groq	—	—	$0.05	$0.08
together	$0.90	$0.90	$0.18	$0.18
fireworks	$0.90	$0.90	$0.20	$0.20

Qwen 2.5 72B vs Llama 3.1 8B

Architecture Comparison

Memory Requirements

Minimum GPUs Needed (BF16)

Quality Benchmarks

Qwen 2.5 72B

Llama 3.1 8B

Capabilities

API Pricing Comparison

Recommendation Summary

Compare Other Models