Phi 3.5 MoE
Microsoft · MoE · 41.9B parameters (6.6B active) · 131,072-token context
Quality: 74.0
Architecture Details
Type: MoE
Total Parameters: 41.9B
Active Parameters: 6.6B
Layers: 32
Hidden Dimension: 4,096
Attention Heads: 32
KV Heads: 8
Head Dimension: 128
Vocab Size: 32,064
Total Experts: 16
Active Experts: 2
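Several architecture values can be derived from one another, which is a useful consistency check. The sketch below recomputes the head dimension and the grouped-query attention (GQA) group size, and compares the fraction of weights used per token with the fraction of experts routed to. The fraction of active parameters (about 15.8%) is higher than the fraction of active experts (12.5%). Under the usual MoE layout, this is because attention, embeddings, and norms are shared and run for every token, and only the feed-forward experts are routed. The 2-FLOPs-per-active-parameter figure is a common rule of thumb, not a number this page reports.

```python
# Derived quantities from the Phi 3.5 MoE architecture table.
HIDDEN_DIM = 4096
ATTENTION_HEADS = 32
KV_HEADS = 8
TOTAL_PARAMS = 41.9e9
ACTIVE_PARAMS = 6.6e9
TOTAL_EXPERTS = 16
ACTIVE_EXPERTS = 2

# Each attention head gets an equal slice of the hidden dimension.
head_dim = HIDDEN_DIM // ATTENTION_HEADS

# Grouped-query attention: number of query heads sharing one K/V head.
gqa_group_size = ATTENTION_HEADS // KV_HEADS

# Share of weights touched per token vs. share of experts routed to.
active_param_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
active_expert_fraction = ACTIVE_EXPERTS / TOTAL_EXPERTS

# Rule of thumb (assumption): ~2 FLOPs per active parameter per token,
# ignoring the attention-score term that grows with context length.
flops_per_token = 2 * ACTIVE_PARAMS

print(head_dim, gqa_group_size)
print(f"{active_param_fraction:.3f} {active_expert_fraction:.3f}")
print(f"{flops_per_token:.3g}")
```

With 4 query heads per KV head, GQA makes the KV cache 4x smaller than it would be with full multi-head attention at the same head count.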
Memory Requirements
BF16 Weights: 83.8 GB
FP8 Weights: 41.9 GB
INT4 Weights: 20.9 GB
KV-Cache per Token: 131,072 bytes (128 KiB)
Activation Estimate: 1.00 GB
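The memory figures above follow from simple arithmetic. Weight memory is total parameters times bytes per parameter. It scales with all 41.9B parameters, not the 6.6B active ones, because without expert offloading every expert must stay resident. KV-cache memory per token is 2 (K and V) × layers × KV heads × head dimension × bytes per element. The sketch below assumes a 2-byte (BF16) KV cache, which reproduces the 131,072-byte figure; an FP8 KV cache would halve it. It uses decimal GB (1e9 bytes) to match the page and ignores the scale/zero-point overhead that INT4 schemes add. `weight_gb` and `kv_bytes_per_token` are hypothetical helper names.

```python
# Hypothetical sizing helpers; decimal GB (1 GB = 1e9 bytes) to match the page.
TOTAL_PARAMS = 41.9e9
LAYERS = 32
KV_HEADS = 8
HEAD_DIM = 128
MAX_CONTEXT = 131_072

BYTES_PER_PARAM = {"BF16": 2.0, "FP8": 1.0, "INT4": 0.5}

def weight_gb(precision: str) -> float:
    """Without expert offloading, all 16 experts stay resident,
    so weight memory scales with TOTAL parameters."""
    return TOTAL_PARAMS * BYTES_PER_PARAM[precision] / 1e9

def kv_bytes_per_token(kv_dtype_bytes: int = 2) -> int:
    """K and V (factor 2) for every layer, KV head, and head dimension.
    Default of 2 bytes assumes a BF16 KV cache."""
    return 2 * LAYERS * KV_HEADS * HEAD_DIM * kv_dtype_bytes

for p in BYTES_PER_PARAM:
    print(f"{p}: {weight_gb(p):.2f} GB")

per_token = kv_bytes_per_token()
print(per_token)
# KV cache for a single sequence filling the full 131,072-token context.
print(f"{per_token * MAX_CONTEXT / 1e9:.1f} GB")
```

The INT4 result is 20.95 GB, which the page shows as 20.9 GB. A single sequence at the full 131,072-token context needs about 17.2 GB of KV cache on top of the weights.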
Fits on (single-node)
B200 SXM: BF16
B100 SXM: BF16
GB200 NVL72 (per GPU): BF16
GB300 NVL72 (per GPU): BF16
H200 SXM: BF16
H100 SXM: FP8
H100 PCIe: FP8
H100 NVL: FP8
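A single-GPU fit check subtracts weights and activations from usable GPU memory and converts what remains into tokens of KV cache. The page does not state its headroom rule. The `usable_fraction` of 0.9 below is an assumption, chosen to mirror vLLM's default `gpu_memory_utilization`. GPU capacities are nominal HBM sizes: H100 SXM 80 GB, H100 NVL 94 GB, H200 141 GB. `kv_token_budget` is a hypothetical helper.

```python
# Hypothetical fit check. Capacities are nominal HBM sizes in decimal GB;
# usable_fraction is an assumed headroom factor, not this page's documented rule.
WEIGHTS_GB = {"BF16": 83.8, "FP8": 41.9, "INT4": 20.9}
ACTIVATION_GB = 1.0
KV_BYTES_PER_TOKEN = 131_072

GPU_MEMORY_GB = {"H100 SXM": 80, "H100 NVL": 94, "H200 SXM": 141}

def kv_token_budget(gpu: str, precision: str, usable_fraction: float = 0.9) -> int:
    """Tokens of KV cache that fit after weights and activations.
    A value <= 0 means the model does not fit at this precision."""
    usable_gb = GPU_MEMORY_GB[gpu] * usable_fraction
    free_gb = usable_gb - WEIGHTS_GB[precision] - ACTIVATION_GB
    return int(free_gb * 1e9 // KV_BYTES_PER_TOKEN)

for gpu in GPU_MEMORY_GB:
    for precision in ("BF16", "FP8"):
        print(gpu, precision, kv_token_budget(gpu, precision))
```

With this assumed 0.9 factor, the sketch agrees with the list above. H100 NVL needs FP8 even though the 83.8 GB of BF16 weights alone would fit in 94 GB, because 90% of 94 GB leaves no room for activations and KV cache. H200 has ample room at BF16.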
GPU Recommendations
B200 SXM (optimal)
FP8 · 1 GPU · TensorRT-LLM
Score: 100/100
Throughput: 1.1K tok/s
Cost/Month: $4,261
Cost/M Tokens: $1.54
B100 SXM (optimal)
FP8 · 1 GPU · TensorRT-LLM
Score: 100/100
Throughput: 1.1K tok/s
Cost/Month: $4,271
Cost/M Tokens: $1.55
GB200 NVL72 (per GPU) (optimal)
FP8 · 1 GPU · TensorRT-LLM
Score: 100/100
Throughput: 1.1K tok/s
Cost/Month: $6,169
Cost/M Tokens: $2.24
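Cost per million tokens is monthly cost divided by the millions of tokens generated in a month. The sketch below assumes 730 billable hours per month and full utilization. Neither is a documented input of this page. It also inverts the formula to recover the throughput implied by each listed price. Under these assumptions, each GPU comes out a little over 1,000 tok/s, consistent with the rounded 1.1K tok/s shown. The GB200 price is higher because its monthly cost is about 45% higher at the same throughput. Both helper names are hypothetical.

```python
# Assumptions: 730 billable hours per month and 100% utilization.
SECONDS_PER_MONTH = 730 * 3600

def cost_per_million_tokens(cost_per_month: float, tokens_per_second: float) -> float:
    """Monthly cost divided by millions of tokens produced per month."""
    tokens_per_month = tokens_per_second * SECONDS_PER_MONTH
    return cost_per_month / (tokens_per_month / 1e6)

def implied_tokens_per_second(cost_per_month: float, cost_per_m_tokens: float) -> float:
    """Invert the formula to recover the unrounded throughput behind a listed price."""
    tokens_per_month = cost_per_month / cost_per_m_tokens * 1e6
    return tokens_per_month / SECONDS_PER_MONTH

# (Cost/Month, Cost/M Tokens) as listed on this page.
listed = {
    "B200 SXM": (4261, 1.54),
    "B100 SXM": (4271, 1.55),
    "GB200 NVL72 (per GPU)": (6169, 2.24),
}
for gpu, (monthly, per_m) in listed.items():
    print(f"{gpu}: ~{implied_tokens_per_second(monthly, per_m):.0f} tok/s implied, "
          f"${cost_per_million_tokens(monthly, 1100):.2f}/M at exactly 1.1K tok/s")
```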
API Pricing Comparison
No API pricing data available for this model.
Quality Benchmarks
MMLU: 78.9
HumanEval: 52.0
GSM8K: 84.0
MT-Bench: 81.0
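The page does not say how its Quality score of 74.0 is computed. It is, however, consistent with an unweighted mean of the four benchmark scores. The check below treats that as an assumption and only verifies the arithmetic. MT-Bench is natively scored on a 1–10 scale, so the 81.0 here is presumably rescaled to 0–100.

```python
# Assumption: Quality = unweighted mean of the four benchmarks.
# This checks consistency only; it is not the page's documented formula.
from statistics import mean

BENCHMARKS = {"MMLU": 78.9, "HumanEval": 52.0, "GSM8K": 84.0, "MT-Bench": 81.0}

quality = round(mean(BENCHMARKS.values()), 1)
print(quality)
```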
Capabilities
Features
✓ Tool Use · ✗ Vision · ✓ Code · ✓ Math · ✗ Reasoning · ✓ Multilingual · ✓ Structured Output
Supported Frameworks
vLLM · SGLang · TGI · TensorRT-LLM
Supported Precisions
BF16 (default) · FP8 · INT4