
DeepSeek R1 vs DeepSeek V3

DeepSeek R1
DeepSeek · 671B params · Quality: 92

DeepSeek V3
DeepSeek · 671B params · Quality: 86

Architecture Comparison

Spec                 DeepSeek R1   DeepSeek V3
Type                 MoE           MoE
Total Parameters     671B          671B
Active Parameters    37B           37B
Layers               61            61
Hidden Dimension     7,168         7,168
Attention Heads      128           128
KV Heads             1             1
Context Length       131,072       131,072
Precision (default)  BF16          BF16
Total Experts        256           256
Active Experts       8             8

Memory Requirements

Precision            DeepSeek R1   DeepSeek V3
BF16 Weights         1342.0 GB     1342.0 GB
FP8 Weights          671.0 GB      671.0 GB
INT4 Weights         335.5 GB      335.5 GB
KV-Cache / Token     31,232 B      31,232 B
Activation Estimate  3.00 GB       3.00 GB
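
The weight figures above follow from parameter count times bytes per parameter. The sketch below reproduces them and extends the per-token KV-cache figure to a full context window. It assumes decimal gigabytes (1 GB = 1e9 bytes), a single sequence, and that the listed per-token figure applies uniformly across the context; the function names are illustrative and not part of any library.

```python
# Rough memory arithmetic using only numbers listed on this page.
PARAMS = 671e9                      # total parameters (same for R1 and V3)
KV_BYTES_PER_TOKEN = 31_232         # KV-cache bytes per token, as listed
CONTEXT_LENGTH = 131_072            # maximum context length in tokens

# Bytes per parameter for each precision (INT4 packs two weights per byte).
BYTES_PER_PARAM = {"BF16": 2.0, "FP8": 1.0, "INT4": 0.5}


def weight_memory_gb(params: float, precision: str) -> float:
    """Weights-only footprint in decimal gigabytes."""
    return params * BYTES_PER_PARAM[precision] / 1e9


def kv_cache_gb(tokens: int, bytes_per_token: int = KV_BYTES_PER_TOKEN) -> float:
    """KV-cache size for one sequence of `tokens` tokens, in decimal gigabytes."""
    return tokens * bytes_per_token / 1e9


if __name__ == "__main__":
    for precision in BYTES_PER_PARAM:
        print(f"{precision}: {weight_memory_gb(PARAMS, precision):.1f} GB")
    print(f"KV cache at full context: {kv_cache_gb(CONTEXT_LENGTH):.2f} GB")
```

Under these assumptions, one sequence at the full 131,072-token context needs roughly 4.09 GB of KV cache on top of the weights and the activation estimate.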

Minimum GPUs Needed (BF16)

GPU                  DeepSeek R1   DeepSeek V3
H100 SXM             N/A           N/A
L40S                 N/A           N/A
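
The page lists N/A for both GPUs and does not say why. As a rough lower bound only, the sketch below divides the BF16 weight footprint by per-GPU memory. The 80 GB (H100 SXM) and 48 GB (L40S) capacities are assumptions not stated on this page, and the estimate ignores KV cache, activations, framework overhead, and interconnect limits, so a real deployment would need more.

```python
import math

BF16_WEIGHTS_GB = 1342.0  # BF16 weights, from the memory table

# Assumed per-GPU memory in GB; these values are not given on this page.
GPU_MEMORY_GB = {"H100 SXM": 80, "L40S": 48}


def min_gpus_for_weights(weights_gb: float, gpu_memory_gb: float) -> int:
    """Smallest GPU count whose combined memory holds the weights alone."""
    return math.ceil(weights_gb / gpu_memory_gb)


if __name__ == "__main__":
    for gpu, memory_gb in GPU_MEMORY_GB.items():
        count = min_gpus_for_weights(BF16_WEIGHTS_GB, memory_gb)
        print(f"{gpu}: at least {count} GPUs for weights alone")
```

With these assumed capacities, the weights-only bound is 17 H100 SXM GPUs or 28 L40S GPUs.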

Quality Benchmarks

Benchmark            DeepSeek R1   DeepSeek V3
Overall              92            86
MMLU                 90.8          87.1
HumanEval            71.7          65.0
GSM8K                97.3          89.3
MT-Bench             89.0          87.0

Capabilities

Feature              DeepSeek R1   DeepSeek V3
Tool Use             ✓ Yes         ✓ Yes
Vision               ✗ No          ✗ No
Code                 ✓ Yes         ✓ Yes
Math                 ✓ Yes         ✓ Yes
Reasoning            ✓ Yes         ✗ No
Multilingual         ✓ Yes         ✓ Yes
Structured Output    ✓ Yes         ✓ Yes

API Pricing Comparison

Cheapest Output (DeepSeek R1): $2.19/M tokens (Input: $0.55/M)

Cheapest Output (DeepSeek V3): $0.42/M tokens (Input: $0.28/M)

Provider   DeepSeek R1 In $/M   DeepSeek R1 Out $/M   DeepSeek V3 In $/M   DeepSeek V3 Out $/M
deepseek   $0.55                $2.19                 $0.28                $0.42
together   $3.00                $7.00                 $0.50                $2.80
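
To turn the per-million-token prices above into a cost for a given workload, multiply each token count by its rate and divide by one million. The sketch below does that with the listed prices. The function name and the example workload (1M input and 1M output tokens) are hypothetical, chosen only to illustrate the arithmetic.

```python
# (provider, model) -> (input $/M tokens, output $/M tokens), as listed above.
PRICES = {
    ("deepseek", "DeepSeek R1"): (0.55, 2.19),
    ("deepseek", "DeepSeek V3"): (0.28, 0.42),
    ("together", "DeepSeek R1"): (3.00, 7.00),
    ("together", "DeepSeek V3"): (0.50, 2.80),
}


def workload_cost_usd(provider: str, model: str,
                      input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for a workload, given per-million-token prices."""
    price_in, price_out = PRICES[(provider, model)]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 1M input tokens and 1M output tokens.
    for (provider, model) in PRICES:
        cost = workload_cost_usd(provider, model, 1_000_000, 1_000_000)
        print(f"{provider:9s} {model}: ${cost:.2f}")
```

For that example workload at the deepseek provider, R1 comes to about $2.74 and V3 to about $0.70.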

Recommendation Summary

  • DeepSeek R1 scores higher on overall quality (92 vs 86).
  • DeepSeek V3 is cheaper per million output tokens at the cheapest listed provider ($0.42/M vs $2.19/M).
  • DeepSeek R1 is stronger at code generation (HumanEval: 71.7 vs 65.0).
  • DeepSeek R1 is better at math reasoning (GSM8K: 97.3 vs 89.3).
