DeepSeek V3 vs DeepSeek R1
Architecture Comparison
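| Spec | DeepSeek V3 | DeepSeek R1 |
|---|---|---|
| Type | MoE | MoE |
| Total Parameters | 671B | 671B |
| Active Parameters | 37B | 37B |
| Layers | 61 | 61 |
| Hidden Dimension | 7,168 | 7,168 |
| Attention Heads | 128 | 128 |
| KV Heads | 1 | 1 |
| Context Length (tokens) | 131,072 | 131,072 |
| Precision (default) | BF16 | BF16 |
| Total Experts | 256 | 256 |
| Active Experts | 8 | 8 |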
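To make the mixture-of-experts figures concrete, the sketch below computes what share of the model is active per token, using only the numbers in the table above. The variable and function names are illustrative and not part of any DeepSeek tooling.

```python
# Figures copied from the architecture table (identical for V3 and R1).
TOTAL_PARAMS_B = 671     # total parameters, in billions
ACTIVE_PARAMS_B = 37     # parameters active per token, in billions
TOTAL_EXPERTS = 256      # total experts
ACTIVE_EXPERTS = 8       # active experts


def active_fraction(active: float, total: float) -> float:
    """Return the active share of a total as a fraction between 0 and 1."""
    return active / total


param_share = active_fraction(ACTIVE_PARAMS_B, TOTAL_PARAMS_B)
expert_share = active_fraction(ACTIVE_EXPERTS, TOTAL_EXPERTS)

print(f"Active parameters: {param_share:.1%} of total")
print(f"Active experts:    {expert_share:.1%} of total")
```

The two shares differ because not every parameter sits in an expert; the table itself does not break the total down further.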
Memory Requirements
| Precision | DeepSeek V3 | DeepSeek R1 |
|---|---|---|
| BF16 Weights | 1,342.0 GB | 1,342.0 GB |
| FP8 Weights | 671.0 GB | 671.0 GB |
| INT4 Weights | 335.5 GB | 335.5 GB |
| KV-Cache / Token | 31,232 B | 31,232 B |
| Activation Estimate | 3.00 GB | 3.00 GB |
Minimum GPUs Needed (BF16)
| GPU | DeepSeek V3 | DeepSeek R1 |
|---|---|---|
| H100 SXM | N/A | N/A |
| L40S | N/A | N/A |
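The weight figures in the memory table follow from the parameter count multiplied by the bytes stored per parameter. The sketch below reproduces them and extends the per-token KV-cache figure to a full context window. It assumes 1 GB = 10^9 bytes and 2, 1, and 0.5 bytes per parameter for BF16, FP8, and INT4; the function names are illustrative only.

```python
# Inputs taken from the tables above (identical for V3 and R1).
TOTAL_PARAMS = 671e9          # total parameters
KV_BYTES_PER_TOKEN = 31_232   # KV-cache bytes per token
CONTEXT_LENGTH = 131_072      # maximum context length in tokens

# Assumed storage cost per parameter for each precision.
BYTES_PER_PARAM = {"BF16": 2.0, "FP8": 1.0, "INT4": 0.5}


def weight_memory_gb(total_params: float, bytes_per_param: float) -> float:
    """Memory needed to hold the weights, in GB (1 GB = 1e9 bytes)."""
    return total_params * bytes_per_param / 1e9


def kv_cache_gb(tokens: int, bytes_per_token: int = KV_BYTES_PER_TOKEN) -> float:
    """KV-cache size for one sequence of the given length, in GB."""
    return tokens * bytes_per_token / 1e9


for precision, nbytes in BYTES_PER_PARAM.items():
    print(f"{precision} weights: {weight_memory_gb(TOTAL_PARAMS, nbytes):.1f} GB")

print(f"KV cache at full context: {kv_cache_gb(CONTEXT_LENGTH):.2f} GB per sequence")
```

These are lower bounds for the weights and cache alone. Activation memory and runtime overhead come on top, and the KV-cache term scales with the number of concurrent sequences.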
Quality Benchmarks
| Benchmark | DeepSeek V3 | DeepSeek R1 |
|---|---|---|
| Overall | 86 | 92 |
| MMLU | 87.1 | 90.8 |
| HumanEval | 65.0 | 71.7 |
| GSM8K | 89.3 | 97.3 |
| MT-Bench | 87.0 | 89.0 |
Capabilities
| Feature | DeepSeek V3 | DeepSeek R1 |
|---|---|---|
| Tool Use | ✓ Yes | ✓ Yes |
| Vision | ✗ No | ✗ No |
| Code | ✓ Yes | ✓ Yes |
| Math | ✓ Yes | ✓ Yes |
| Reasoning | ✗ No | ✓ Yes |
| Multilingual | ✓ Yes | ✓ Yes |
| Structured Output | ✓ Yes | ✓ Yes |
API Pricing Comparison
Cheapest Output (DeepSeek V3): $0.42/M (Input: $0.28/M)
Cheapest Output (DeepSeek R1): $2.19/M (Input: $0.55/M)
| Provider | DeepSeek V3 Input ($/M) | DeepSeek V3 Output ($/M) | DeepSeek R1 Input ($/M) | DeepSeek R1 Output ($/M) |
|---|---|---|---|---|
| deepseek | $0.28 | $0.42 | $0.55 | $2.19 |
| together | $0.50 | $2.80 | $3.00 | $7.00 |
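To turn the per-million rates into a cost for a specific workload, a minimal sketch is shown below. It assumes "$/M" means US dollars per one million tokens, and the token counts in the usage loop are hypothetical. The `PRICES` mapping and `request_cost_usd` function are illustrative names, not part of any provider SDK.

```python
# (input $/M, output $/M) per provider and model, copied from the pricing table.
PRICES = {
    ("deepseek", "DeepSeek V3"): (0.28, 0.42),
    ("deepseek", "DeepSeek R1"): (0.55, 2.19),
    ("together", "DeepSeek V3"): (0.50, 2.80),
    ("together", "DeepSeek R1"): (3.00, 7.00),
}


def request_cost_usd(provider: str, model: str,
                     input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for a workload, assuming prices are per one million tokens."""
    input_price, output_price = PRICES[(provider, model)]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Hypothetical workload: 2M input tokens and 500k output tokens.
for (provider, model) in PRICES:
    cost = request_cost_usd(provider, model, 2_000_000, 500_000)
    print(f"{provider:9s} {model}: ${cost:.2f}")
```

Reasoning models typically emit more output tokens per request, so comparing the two models at identical token counts may understate the cost gap for DeepSeek R1.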
Recommendation Summary
- DeepSeek R1 scores higher on overall quality (92 vs 86).
- DeepSeek V3 is cheaper per output token ($0.42/M vs $2.19/M).
- DeepSeek R1 is stronger at code generation (HumanEval: 71.7 vs 65.0).
- DeepSeek R1 is better at math reasoning (GSM8K: 97.3 vs 89.3).