Snowflake Arctic 128x3B
Snowflake · moe · 395B parameters · 4,096 context
Quality: 50.0
Architecture Details
Type: MoE
Total Parameters: 395B
Active Parameters: 17B
Layers: 35
Hidden Dimension: 7,168
Attention Heads: 56
KV Heads: 8
Head Dimension: 128
Vocab Size: 32,000
Total Experts: 128
Active Experts: 2
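The attention figures above are mutually consistent: 56 heads of dimension 128 give the 7,168 hidden dimension, and 8 KV heads shared by 56 query heads indicates grouped-query attention. A minimal Python sketch of these checks follows. The variable names are illustrative only and are not taken from any Snowflake code.

```python
# Values copied from the Architecture Details listing.
hidden_dim = 7168
attention_heads = 56
kv_heads = 8
head_dim = 128
total_params_b = 395   # billions
active_params_b = 17   # billions

# The hidden dimension should equal attention heads * head dimension.
assert attention_heads * head_dim == hidden_dim

# Number of query heads that share each KV head (grouped-query attention).
queries_per_kv_head = attention_heads // kv_heads

# Share of the total parameters that is active for a single token.
active_fraction = active_params_b / total_params_b

print(queries_per_kv_head)        # 7
print(f"{active_fraction:.1%}")   # 4.3%
```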
Memory Requirements
BF16 Weights: 790.0 GB
FP8 Weights: 395.0 GB
INT4 Weights: 197.5 GB
KV-Cache per Token: 143,360 bytes
Activation Estimate: 2.00 GB
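The weight sizes above equal the parameter count times the bytes per parameter (with 1 GB taken as 10^9 bytes), and the per-token KV-cache figure is reproduced by 2 (keys and values) x layers x KV heads x head dimension x bytes per element. The sketch below recomputes both. The 2-byte cache element size is an assumption that happens to match the listed 143,360 bytes, and the function names are hypothetical.

```python
# Bytes per parameter for each listed precision.
BYTES_PER_PARAM = {"BF16": 2.0, "FP8": 1.0, "INT4": 0.5}


def weight_gb(total_params: float, precision: str) -> float:
    """Weight footprint in GB (1 GB = 1e9 bytes)."""
    return total_params * BYTES_PER_PARAM[precision] / 1e9


def kv_cache_bytes_per_token(layers: int, kv_heads: int, head_dim: int,
                             bytes_per_elem: int = 2) -> int:
    """KV-cache bytes per token; the factor 2 covers keys and values.

    bytes_per_elem=2 (a 16-bit cache) is an assumption, not stated by the source.
    """
    return 2 * layers * kv_heads * head_dim * bytes_per_elem


for precision in BYTES_PER_PARAM:
    print(precision, weight_gb(395e9, precision), "GB")

per_token = kv_cache_bytes_per_token(layers=35, kv_heads=8, head_dim=128)
print(per_token, "bytes per token")

# KV cache for one sequence at the listed 4,096-token context.
print(f"{per_token * 4096 / 1e9:.3f} GB per full-context sequence")
```

Under these assumptions a single full-context sequence needs roughly 0.59 GB of KV cache, which is small next to the weights at any of the listed precisions.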
Fits on (single-node)
Instinct MI325X · INT4
B200 NVL (pair) · INT4
B300 · INT4
B200 SXM x2 · INT4
B100 SXM x2 · INT4
GB200 NVL72 (per GPU) x2 · INT4
GB300 NVL72 (per GPU) x2 · INT4
H200 SXM x2 · INT4
GPU Recommendations
B200 SXM (optimal)
FP8 · 4 GPUs · tensorrt-llm
Score: 100/100
Throughput: 280.0 tok/s
Cost/Month: $17,044
Cost/M Tokens: $23.16
B100 SXM (optimal)
FP8 · 4 GPUs · tensorrt-llm
Score: 100/100
Throughput: 280.0 tok/s
Cost/Month: $17,082
Cost/M Tokens: $23.21
H200 SXM (optimal)
FP8 · 4 GPUs · tensorrt-llm
Score: 100/100
Throughput: 280.0 tok/s
Cost/Month: $10,211
Cost/M Tokens: $13.88
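The Cost/M Tokens values in the cards above are consistent with dividing the monthly cost by the tokens generated at the listed throughput over a 730-hour month (8,760 hours / 12) at full utilization. That month length is inferred from the numbers and is not stated by the source, so the sketch below should be read as an assumption. `cost_per_million_tokens` is a hypothetical helper.

```python
# Assumption: a month is 730 hours (8,760 / 12); this reproduces the listed
# figures but is not confirmed by the source.
HOURS_PER_MONTH = 730


def cost_per_million_tokens(monthly_cost_usd: float, tokens_per_second: float,
                            hours_per_month: float = HOURS_PER_MONTH) -> float:
    """Monthly cost divided by millions of tokens generated at full utilization."""
    tokens_per_month = tokens_per_second * hours_per_month * 3600
    return monthly_cost_usd / tokens_per_month * 1e6


# (GPU, monthly cost in USD) pairs from the recommendation cards.
cards = [("B200 SXM", 17044), ("B100 SXM", 17082), ("H200 SXM", 10211)]

for gpu, monthly_cost in cards:
    print(f"{gpu}: ${cost_per_million_tokens(monthly_cost, 280.0):.2f} per M tokens")
```

Because all three cards list the same 280.0 tok/s, the per-token cost differences come entirely from the monthly price. Real deployments rarely sustain full utilization, so effective cost per token would be higher.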
API Pricing Comparison
No API pricing data available for this model.
Capabilities
Features
✗ Tool Use · ✗ Vision · ✓ Code · ✗ Math · ✗ Reasoning · ✗ Multilingual · ✓ Structured Output
Supported Frameworks
vllm · sglang
Supported Precisions
BF16 (default) · FP8 · INT4