DeepSeek V3
DeepSeek · MoE · 671B parameters · 131,072 context
Quality: 86.0
Architecture Details
| Attribute | Value |
|---|---|
| Type | MoE |
| Total Parameters | 671B |
| Active Parameters | 37B |
| Layers | 61 |
| Hidden Dimension | 7,168 |
| Attention Heads | 128 |
| KV Heads | 1 |
| Head Dimension | 128 |
| Vocab Size | 129,280 |
| Total Experts | 256 |
| Active Experts | 8 |
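Per the table, only 8 of 256 routed experts fire per token, which is why the active parameter count (37B) is a small fraction of the total (671B): weights memory scales with the total, while per-token compute scales with the active set. A quick back-of-envelope check (illustrative only):

```python
# Per-token compute scales with active parameters; memory with total.
TOTAL_B, ACTIVE_B = 671, 37
ACTIVE_EXPERTS, TOTAL_EXPERTS = 8, 256

print(f"expert activation ratio: {ACTIVE_EXPERTS / TOTAL_EXPERTS:.1%}")  # 3.1%
print(f"active parameter fraction: {ACTIVE_B / TOTAL_B:.1%}")            # 5.5%
# The active fraction exceeds the expert ratio because attention, embeddings,
# and any always-on components run for every token.
```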
Memory Requirements
| Precision | Weights |
|---|---|
| BF16 | 1342.0 GB |
| FP8 | 671.0 GB |
| INT4 | 335.5 GB |
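The weight footprints above follow from multiplying the parameter count by bytes per parameter; a minimal sketch (assuming decimal GB, i.e. 1 GB = 1e9 bytes, which is how the figures above round):

```python
# Weight memory = total parameters x bytes per parameter.
TOTAL_PARAMS = 671e9  # from the architecture table

BYTES_PER_PARAM = {"BF16": 2.0, "FP8": 1.0, "INT4": 0.5}

for precision, nbytes in BYTES_PER_PARAM.items():
    gb = TOTAL_PARAMS * nbytes / 1e9
    print(f"{precision}: {gb:.1f} GB")
# BF16: 1342.0 GB, FP8: 671.0 GB, INT4: 335.5 GB
```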
KV-Cache per Token: 31,232 bytes
Activation Estimate: 3.00 GB
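The KV-cache figure matches the plain per-token formula over the listed dimensions, assuming both K and V are cached at BF16 (an assumption; the attention variant actually deployed may compress the cache differently):

```python
# KV-cache per token = 2 (K and V) x layers x KV heads x head dim x bytes/elem.
LAYERS, KV_HEADS, HEAD_DIM = 61, 1, 128
BYTES_BF16 = 2

kv_bytes_per_token = 2 * LAYERS * KV_HEADS * HEAD_DIM * BYTES_BF16
print(kv_bytes_per_token)  # 31232 bytes, matching the table
```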
Fits on (single-node)
- Instinct MI325X ×2 (INT4)
- B200 NVL (pair) ×2 (INT4)
- B300 ×2 (INT4)
- Groq LPU ×2 (INT4)
- B200 SXM ×3 (INT4)
- B100 SXM ×3 (INT4)
- GB200 NVL72 (per GPU) ×3 (INT4)
- GB300 NVL72 (per GPU) ×3 (INT4)
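A rough way to reproduce these counts is to divide the INT4 weight footprint, padded for KV cache and activations, by per-GPU memory. The 20% headroom factor is an assumption, and the per-GPU capacities below are nominal figures used for illustration; a sketch for two of the entries:

```python
import math

# GPUs needed ~ ceil(weights x headroom / per-GPU memory).
WEIGHTS_GB = 335.5  # INT4, from the table above
OVERHEAD = 1.2      # assumed headroom for KV cache + activations

gpu_hbm_gb = {"Instinct MI325X": 256, "B200 SXM": 192}  # nominal HBM capacities

for gpu, hbm in gpu_hbm_gb.items():
    n = math.ceil(WEIGHTS_GB * OVERHEAD / hbm)
    print(f"{gpu}: x{n}")
# Instinct MI325X: x2, B200 SXM: x3 -- matching the list above
```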
GPU Recommendations
| GPU | Rating | Config | Score | Throughput | Cost/Month | Cost/M Tokens |
|---|---|---|---|---|---|---|
| B200 NVL (pair) | optimal | FP8 · 4 GPUs · tensorrt-llm | 98/100 | 140.0 tok/s | $39,858 | $108.33 |
| B200 SXM | optimal | FP8 · 8 GPUs · tensorrt-llm | 93/100 | 140.0 tok/s | $34,088 | $92.65 |
| H200 SXM | optimal | FP8 · 8 GPUs · tensorrt-llm | 90/100 | 140.0 tok/s | $20,422 | $55.51 |
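The Cost/M Tokens column is consistent with dividing the monthly cost by monthly token volume at the quoted throughput, assuming a 730-hour month and full utilization (an assumption about how these figures are derived):

```python
# Cost per million tokens = monthly cost / millions of tokens per month.
HOURS_PER_MONTH = 730  # assumed: 8760 hours/year / 12

def cost_per_m_tokens(cost_per_month: float, tok_per_s: float) -> float:
    tokens_per_month = tok_per_s * 3600 * HOURS_PER_MONTH
    return cost_per_month / (tokens_per_month / 1e6)

print(f"${cost_per_m_tokens(39858, 140.0):.2f}")  # $108.33 (B200 NVL pair)
print(f"${cost_per_m_tokens(34088, 140.0):.2f}")  # $92.65  (B200 SXM)
print(f"${cost_per_m_tokens(20422, 140.0):.2f}")  # $55.51  (H200 SXM)
```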
API Pricing Comparison
| Provider | Input $/M | Output $/M | Badges |
|---|---|---|---|
| deepseek | $0.28 | $0.42 | Cheapest |
| together | $0.50 | $2.80 | |
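To compare providers on a concrete workload, multiply token volumes by the per-million rates above; a minimal sketch (the 10M-input / 2M-output workload is hypothetical):

```python
# API cost = (input tokens x input $/M + output tokens x output $/M) / 1e6.
PRICES = {"deepseek": (0.28, 0.42), "together": (0.50, 2.80)}  # ($/M in, $/M out)

def api_cost(provider: str, in_tokens: int, out_tokens: int) -> float:
    p_in, p_out = PRICES[provider]
    return (in_tokens * p_in + out_tokens * p_out) / 1e6

# Example: 10M input + 2M output tokens per month.
for name in PRICES:
    print(f"{name}: ${api_cost(name, 10_000_000, 2_000_000):.2f}")
# deepseek: $3.64, together: $10.60
```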
Quality Benchmarks
| Benchmark | Score |
|---|---|
| MMLU | 87.1 |
| HumanEval | 65.0 |
| GSM8K | 89.3 |
| MT-Bench | 87.0 |
Capabilities
Features
✓ Tool Use · ✗ Vision · ✓ Code · ✓ Math · ✗ Reasoning · ✓ Multilingual · ✓ Structured Output
Supported Frameworks
vllm · sglang · tensorrt-llm
Supported Precisions
BF16 (default) · FP8 · INT4