DeepSeek V3
DeepSeek · MoE · 671B parameters · 131,072 context
Quality: 86.0
Architecture Details
| Attribute | Value |
|---|---|
| Type | MoE |
| Total Parameters | 671B |
| Active Parameters | 37B |
| Layers | 61 |
| Hidden Dimension | 7,168 |
| Attention Heads | 128 |
| KV Heads | 1 |
| Head Dimension | 128 |
| Vocab Size | 129,280 |
| Total Experts | 256 |
| Active Experts | 8 |
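Per the table, only 8 of 256 routed experts fire per token, which is why the active parameter count (37B) is a small fraction of the total (671B): weights memory scales with the total, while per-token compute scales with the active set. A quick back-of-envelope check (illustrative only):

```python
# Per-token compute scales with active parameters; memory with total.
TOTAL_B, ACTIVE_B = 671, 37
ACTIVE_EXPERTS, TOTAL_EXPERTS = 8, 256

print(f"expert activation ratio: {ACTIVE_EXPERTS / TOTAL_EXPERTS:.1%}")  # 3.1%
print(f"active parameter fraction: {ACTIVE_B / TOTAL_B:.1%}")            # 5.5%
# The active fraction exceeds the expert ratio because attention, embeddings,
# and any always-on components run for every token.
```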
Memory Requirements
| Precision | Weights |
|---|---|
| BF16 | 1342.0 GB |
| FP8 | 671.0 GB |
| INT4 | 335.5 GB |
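The weight footprints above follow from multiplying the parameter count by bytes per parameter; a minimal sketch (assuming decimal GB, i.e. 1 GB = 1e9 bytes, which is how the figures above round):

```python
# Weight memory = total parameters x bytes per parameter.
TOTAL_PARAMS = 671e9  # from the architecture table

BYTES_PER_PARAM = {"BF16": 2.0, "FP8": 1.0, "INT4": 0.5}

for precision, nbytes in BYTES_PER_PARAM.items():
    gb = TOTAL_PARAMS * nbytes / 1e9
    print(f"{precision}: {gb:.1f} GB")
# BF16: 1342.0 GB, FP8: 671.0 GB, INT4: 335.5 GB
```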
KV-Cache per Token: 31,232 bytes
Activation Estimate: 3.00 GB
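The KV-cache figure matches the plain per-token formula over the listed dimensions, assuming both K and V are cached at BF16 (an assumption; the attention variant actually deployed may compress the cache differently):

```python
# KV-cache per token = 2 (K and V) x layers x KV heads x head dim x bytes/elem.
LAYERS, KV_HEADS, HEAD_DIM = 61, 1, 128
BYTES_BF16 = 2

kv_bytes_per_token = 2 * LAYERS * KV_HEADS * HEAD_DIM * BYTES_BF16
print(kv_bytes_per_token)  # 31232 bytes, matching the table
```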
Fits on (single-node)
- Instinct MI325X ×2 (INT4)
- B200 NVL (pair) ×2 (INT4)
- B300 ×2 (INT4)
- Groq LPU ×2 (INT4)
- B200 SXM ×3 (INT4)
- B100 SXM ×3 (INT4)
- GB200 NVL72 (per GPU) ×3 (INT4)
- GB300 NVL72 (per GPU) ×3 (INT4)
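A rough way to reproduce these counts is to divide the INT4 weight footprint, padded for KV cache and activations, by per-GPU memory. The 20% headroom factor is an assumption, and the per-GPU capacities below are nominal figures used for illustration; a sketch for two of the entries:

```python
import math

# GPUs needed ~ ceil(weights x headroom / per-GPU memory).
WEIGHTS_GB = 335.5  # INT4, from the table above
OVERHEAD = 1.2      # assumed headroom for KV cache + activations

gpu_hbm_gb = {"Instinct MI325X": 256, "B200 SXM": 192}  # nominal HBM capacities

for gpu, hbm in gpu_hbm_gb.items():
    n = math.ceil(WEIGHTS_GB * OVERHEAD / hbm)
    print(f"{gpu}: x{n}")
# Instinct MI325X: x2, B200 SXM: x3 -- matching the list above
```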
GPU Recommendations
| GPU | Rating | Config | Score | Throughput | Cost/Month | Cost/M Tokens |
|---|---|---|---|---|---|---|
| B200 NVL (pair) | optimal | FP8 · 4 GPUs · tensorrt-llm | 98/100 | 140.0 tok/s | $39,858 | $108.33 |
| B200 SXM | optimal | FP8 · 8 GPUs · tensorrt-llm | 93/100 | 140.0 tok/s | $34,088 | $92.65 |
| H200 SXM | optimal | FP8 · 8 GPUs · tensorrt-llm | 90/100 | 140.0 tok/s | $20,422 | $55.51 |
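The Cost/M Tokens column is consistent with dividing the monthly cost by monthly token volume at the quoted throughput, assuming a 730-hour month and full utilization (an assumption about how these figures are derived):

```python
# Cost per million tokens = monthly cost / millions of tokens per month.
HOURS_PER_MONTH = 730  # assumed: 8760 hours/year / 12

def cost_per_m_tokens(cost_per_month: float, tok_per_s: float) -> float:
    tokens_per_month = tok_per_s * 3600 * HOURS_PER_MONTH
    return cost_per_month / (tokens_per_month / 1e6)

print(f"${cost_per_m_tokens(39858, 140.0):.2f}")  # $108.33 (B200 NVL pair)
print(f"${cost_per_m_tokens(34088, 140.0):.2f}")  # $92.65  (B200 SXM)
print(f"${cost_per_m_tokens(20422, 140.0):.2f}")  # $55.51  (H200 SXM)
```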
API Pricing Comparison
| Provider | Input $/M | Output $/M | Badges |
|---|---|---|---|
| deepseek | $0.28 | $0.42 | Cheapest |
| together | $0.50 | $2.80 | |
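To compare providers on a concrete workload, multiply token volumes by the per-million rates above; a minimal sketch (the 10M-input / 2M-output workload is hypothetical):

```python
# API cost = (input tokens x input $/M + output tokens x output $/M) / 1e6.
PRICES = {"deepseek": (0.28, 0.42), "together": (0.50, 2.80)}  # ($/M in, $/M out)

def api_cost(provider: str, in_tokens: int, out_tokens: int) -> float:
    p_in, p_out = PRICES[provider]
    return (in_tokens * p_in + out_tokens * p_out) / 1e6

# Example: 10M input + 2M output tokens per month.
for name in PRICES:
    print(f"{name}: ${api_cost(name, 10_000_000, 2_000_000):.2f}")
# deepseek: $3.64, together: $10.60
```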
Quality Benchmarks
| Benchmark | Score |
|---|---|
| MMLU | 87.1 |
| HumanEval | 65.0 |
| GSM8K | 89.3 |
| MT-Bench | 87.0 |
Capabilities
Features
✓ Tool Use · ✗ Vision · ✓ Code · ✓ Math · ✗ Reasoning · ✓ Multilingual · ✓ Structured Output
Supported Frameworks
vllm · sglang · tensorrt-llm
Supported Precisions
BF16 (default) · FP8 · INT4