DeepSeek V3 vs DeepSeek R1

DeepSeek V3: DeepSeek · 671B params · Quality: 86

DeepSeek R1: DeepSeek · 671B params · Quality: 92

Architecture Comparison

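| Spec | DeepSeek V3 | DeepSeek R1 |
|---|---|---|
| Type | MoE | MoE |
| Total Parameters | 671B | 671B |
| Active Parameters | 37B | 37B |
| Layers | 61 | 61 |
| Hidden Dimension | 7,168 | 7,168 |
| Attention Heads | 128 | 128 |
| KV Heads | 1 | 1 |
| Context Length (tokens) | 131,072 | 131,072 |
| Precision (default) | BF16 | BF16 |
| Total Experts | 256 | 256 |
| Active Experts | 8 | 8 |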

Memory Requirements

| Precision | DeepSeek V3 | DeepSeek R1 |
|---|---|---|
| BF16 Weights | 1342.0 GB | 1342.0 GB |
| FP8 Weights | 671.0 GB | 671.0 GB |
| INT4 Weights | 335.5 GB | 335.5 GB |
| KV-Cache / Token | 31,232 B | 31,232 B |
| Activation Estimate | 3.00 GB | 3.00 GB |
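
The weight figures above follow from the parameter count multiplied by the bytes each precision uses per parameter. The sketch below reproduces them and also scales the per-token KV-cache figure to a full context window. It assumes decimal gigabytes (1 GB = 1e9 bytes) and 2, 1, and 0.5 bytes per parameter for BF16, FP8, and INT4; the function names are illustrative and not part of any library.

```python
# Bytes per parameter for each precision (assumed: BF16 = 2, FP8 = 1, INT4 = 0.5).
BYTES_PER_PARAM = {"BF16": 2.0, "FP8": 1.0, "INT4": 0.5}

TOTAL_PARAMS = 671e9         # total parameters, from the architecture table
KV_BYTES_PER_TOKEN = 31_232  # KV-cache bytes per token, from the memory table
CONTEXT_LENGTH = 131_072     # maximum context length in tokens


def weight_memory_gb(total_params: float, precision: str) -> float:
    """Weight footprint in decimal gigabytes (1 GB = 1e9 bytes)."""
    return total_params * BYTES_PER_PARAM[precision] / 1e9


def kv_cache_gb(tokens: int, bytes_per_token: int = KV_BYTES_PER_TOKEN) -> float:
    """KV-cache footprint in decimal gigabytes for one sequence of `tokens`."""
    return tokens * bytes_per_token / 1e9


if __name__ == "__main__":
    for precision in BYTES_PER_PARAM:
        print(f"{precision}: {weight_memory_gb(TOTAL_PARAMS, precision):.1f} GB")
    print(f"KV cache at full context: {kv_cache_gb(CONTEXT_LENGTH):.2f} GB")
```

Under these assumptions a single sequence at the full 131,072-token context adds roughly 4.09 GB of KV cache on top of the weights.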

Minimum GPUs Needed (BF16)

| GPU | DeepSeek V3 | DeepSeek R1 |
|---|---|---|
| H100 SXM | N/A | N/A |
| L40S | N/A | N/A |
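
The page reports N/A for both GPUs and does not say why. As a rough point of reference, the sketch below computes a naive lower bound on GPU count from the BF16 weight and activation figures in the memory table. It assumes 80 GB per H100 SXM and 48 GB per L40S, and it ignores KV cache, framework overhead, and interconnect limits, so real deployments would need more. The `min_gpus` helper is hypothetical.

```python
import math

# Per-GPU memory in GB (assumed from the vendors' published specifications).
GPU_MEMORY_GB = {"H100 SXM": 80, "L40S": 48}

BF16_WEIGHTS_GB = 1342.0  # from the memory table
ACTIVATION_GB = 3.0       # activation estimate, from the memory table


def min_gpus(weights_gb: float, activation_gb: float, gpu_memory_gb: float) -> int:
    """Naive lower bound: total memory divided by per-GPU memory, rounded up."""
    return math.ceil((weights_gb + activation_gb) / gpu_memory_gb)


if __name__ == "__main__":
    for name, memory in GPU_MEMORY_GB.items():
        count = min_gpus(BF16_WEIGHTS_GB, ACTIVATION_GB, memory)
        print(f"{name}: at least {count} GPUs")
```

With these inputs the bound is 17 H100 SXM or 29 L40S GPUs, both more than a typical 8-GPU node provides.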

Quality Benchmarks

| Benchmark | DeepSeek V3 | DeepSeek R1 |
|---|---|---|
| Overall | 86 | 92 |
| MMLU | 87.1 | 90.8 |
| HumanEval | 65.0 | 71.7 |
| GSM8K | 89.3 | 97.3 |
| MT-Bench | 87.0 | 89.0 |

Capabilities

| Feature | DeepSeek V3 | DeepSeek R1 |
|---|---|---|
| Tool Use | ✓ Yes | ✓ Yes |
| Vision | ✗ No | ✗ No |
| Code | ✓ Yes | ✓ Yes |
| Math | ✓ Yes | ✓ Yes |
| Reasoning | ✗ No | ✓ Yes |
| Multilingual | ✓ Yes | ✓ Yes |
| Structured Output | ✓ Yes | ✓ Yes |

API Pricing Comparison

Cheapest Output (DeepSeek V3): $0.42/M (Input: $0.28/M)

Cheapest Output (DeepSeek R1): $2.19/M (Input: $0.55/M)

| Provider | DeepSeek V3 In $/M | DeepSeek V3 Out $/M | DeepSeek R1 In $/M | DeepSeek R1 Out $/M |
|---|---|---|---|---|
| deepseek | $0.28 | $0.42 | $0.55 | $2.19 |
| together | $0.50 | $2.80 | $3.00 | $7.00 |
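
To turn the per-million-token prices above into a cost for a given workload, multiply each token count by its price and divide by one million. The sketch below does this with the prices from the table; the `request_cost` helper and the `PRICES` layout are illustrative, and the prices are a snapshot that may change.

```python
# (provider, model) -> (input $/M tokens, output $/M tokens), from the pricing table.
PRICES = {
    ("deepseek", "DeepSeek V3"): (0.28, 0.42),
    ("deepseek", "DeepSeek R1"): (0.55, 2.19),
    ("together", "DeepSeek V3"): (0.50, 2.80),
    ("together", "DeepSeek R1"): (3.00, 7.00),
}


def request_cost(provider: str, model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for a workload with the given input and output token counts."""
    in_price, out_price = PRICES[(provider, model)]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


if __name__ == "__main__":
    # Example workload: 1M input tokens and 1M output tokens.
    for (provider, model) in PRICES:
        cost = request_cost(provider, model, 1_000_000, 1_000_000)
        print(f"{provider} / {model}: ${cost:.2f}")
```

For a workload of one million input and one million output tokens, this gives about $0.70 for DeepSeek V3 and $2.74 for DeepSeek R1 on the deepseek provider. Reasoning models typically emit more output tokens per request, so the real gap for R1 can be wider than the per-token prices suggest.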

Recommendation Summary

  • DeepSeek R1 scores higher on overall quality (92 vs 86).
  • DeepSeek V3 is cheaper per output token ($0.42/M vs $2.19/M).
  • DeepSeek R1 is stronger at code generation (HumanEval: 71.7 vs 65.0).
  • DeepSeek R1 is better at math reasoning (GSM8K: 97.3 vs 89.3).
