DeepSeek R1 vs DeepSeek V3
Architecture Comparison
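| Spec | DeepSeek R1 | DeepSeek V3 |
|---|---|---|
| Type | MoE | MoE |
| Total Parameters | 671B | 671B |
| Active Parameters | 37B | 37B |
| Layers | 61 | 61 |
| Hidden Dimension | 7,168 | 7,168 |
| Attention Heads | 128 | 128 |
| KV Heads | 1 | 1 |
| Context Length (tokens) | 131,072 | 131,072 |
| Precision (default) | BF16 | BF16 |
| Total Experts | 256 | 256 |
| Active Experts | 8 | 8 |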
Memory Requirements
| Precision | DeepSeek R1 | DeepSeek V3 |
|---|---|---|
| BF16 Weights | 1,342.0 GB | 1,342.0 GB |
| FP8 Weights | 671.0 GB | 671.0 GB |
| INT4 Weights | 335.5 GB | 335.5 GB |
| KV-Cache / Token | 31,232 B | 31,232 B |
| Activation Estimate | 3.00 GB | 3.00 GB |
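The weight figures in the memory table follow from parameter count times bytes per parameter. The sketch below reproduces them, assuming decimal gigabytes (1 GB = 10^9 bytes) and taking the per-token KV-cache figure from the table as given. The function names are illustrative and do not come from any library.

```python
# Illustrative helpers; the names are hypothetical, not part of any library.
TOTAL_PARAMS = 671e9          # total parameters (architecture table)
KV_BYTES_PER_TOKEN = 31_232   # KV-cache bytes per token (memory table)
CONTEXT_LENGTH = 131_072      # maximum context length in tokens

# Bytes used to store one parameter at each precision.
BYTES_PER_PARAM = {"BF16": 2.0, "FP8": 1.0, "INT4": 0.5}


def weight_gb(precision: str, params: float = TOTAL_PARAMS) -> float:
    """Weight memory in decimal GB (assumes 1 GB = 1e9 bytes)."""
    return params * BYTES_PER_PARAM[precision] / 1e9


def kv_cache_gb(tokens: int, bytes_per_token: int = KV_BYTES_PER_TOKEN) -> float:
    """KV-cache memory in decimal GB for a single sequence of `tokens` tokens."""
    return tokens * bytes_per_token / 1e9


for precision in BYTES_PER_PARAM:
    print(f"{precision}: {weight_gb(precision):.1f} GB")
print(f"KV cache at full context: {kv_cache_gb(CONTEXT_LENGTH):.2f} GB")
```

The per-token KV-cache figure equals 61 × 512, that is, 512 bytes per layer per token. Under these assumptions, a single sequence at the full 131,072-token context needs roughly 4.09 GB of KV cache on top of the weights.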
Minimum GPUs Needed (BF16)
| GPU | DeepSeek R1 | DeepSeek V3 |
|---|---|---|
| H100 SXM | N/A | N/A |
| L40S | N/A | N/A |
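The table reports N/A for both GPUs. For orientation only, a weights-only lower bound can be sketched as the BF16 weight size divided by per-GPU memory, rounded up. The capacities used here (80 GB for H100 SXM, 48 GB for L40S) are assumptions not stated in the source, and the helper name is hypothetical. A real deployment needs additional headroom for KV cache and activations.

```python
import math

# Assumed per-GPU memory in decimal GB; these capacities are not from the table.
GPU_MEMORY_GB = {"H100 SXM": 80, "L40S": 48}
BF16_WEIGHTS_GB = 1342.0  # BF16 weight size from the memory table


def min_gpus_for_weights(weights_gb: float, gpu_gb: float) -> int:
    """Lower bound on GPU count: weights only, no KV cache or activations."""
    return math.ceil(weights_gb / gpu_gb)


for gpu, capacity in GPU_MEMORY_GB.items():
    print(f"{gpu}: at least {min_gpus_for_weights(BF16_WEIGHTS_GB, capacity)} GPUs")
```

Under these assumptions the bound is 17 H100 SXM or 28 L40S cards, which is more than a typical 8-GPU node holds.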
Quality Benchmarks
| Benchmark | DeepSeek R1 | DeepSeek V3 |
|---|---|---|
| Overall | 92 | 86 |
| MMLU | 90.8 | 87.1 |
| HumanEval | 71.7 | 65.0 |
| GSM8K | 97.3 | 89.3 |
| MT-Bench | 89.0 | 87.0 |
Capabilities
| Feature | DeepSeek R1 | DeepSeek V3 |
|---|---|---|
| Tool Use | ✓ Yes | ✓ Yes |
| Vision | ✗ No | ✗ No |
| Code | ✓ Yes | ✓ Yes |
| Math | ✓ Yes | ✓ Yes |
| Reasoning | ✓ Yes | ✗ No |
| Multilingual | ✓ Yes | ✓ Yes |
| Structured Output | ✓ Yes | ✓ Yes |
API Pricing Comparison
- DeepSeek R1, cheapest output: $2.19/M tokens (input: $0.55/M tokens)
- DeepSeek V3, cheapest output: $0.42/M tokens (input: $0.28/M tokens)
| Provider | DeepSeek R1 Input ($/M) | DeepSeek R1 Output ($/M) | DeepSeek V3 Input ($/M) | DeepSeek V3 Output ($/M) |
|---|---|---|---|---|
| deepseek | $0.55 | $2.19 | $0.28 | $0.42 |
| together | $3.00 | $7.00 | $0.50 | $2.80 |
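Per-million-token prices translate into a per-request cost once input and output token counts are known. The following is a minimal sketch using the prices from the table; the `request_cost` helper and the 2,000-input / 1,000-output workload are hypothetical, chosen only to illustrate the arithmetic.

```python
# Prices in USD per million tokens, copied from the pricing table:
# (provider, model) -> (input price, output price)
PRICES = {
    ("deepseek", "DeepSeek R1"): (0.55, 2.19),
    ("deepseek", "DeepSeek V3"): (0.28, 0.42),
    ("together", "DeepSeek R1"): (3.00, 7.00),
    ("together", "DeepSeek V3"): (0.50, 2.80),
}


def request_cost(provider: str, model: str,
                 input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one request at the listed per-million-token prices."""
    price_in, price_out = PRICES[(provider, model)]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


# Hypothetical workload: 2,000 input tokens and 1,000 output tokens.
for (provider, model) in PRICES:
    cost = request_cost(provider, model, 2_000, 1_000)
    print(f"{provider:9s} {model}: ${cost:.5f}")
```

For this workload the cheapest listed provider comes to about $0.0033 per request for DeepSeek R1 and about $0.0010 for DeepSeek V3. The gap widens as the share of output tokens grows, since the output price differs more than the input price.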
Recommendation Summary
- DeepSeek R1 scores higher on overall quality (92 vs 86).
- DeepSeek V3 is cheaper per output token ($0.42/M vs $2.19/M).
- DeepSeek R1 is stronger at code generation (HumanEval: 71.7 vs 65.0).
- DeepSeek R1 is better at math reasoning (GSM8K: 97.3 vs 89.3).