Qwen 3 235B
Alibaba · MoE · 235B parameters · 131,072 context
Quality: 88.0
Architecture Details
| Property | Value |
|---|---|
| Type | MoE |
| Total Parameters | 235B |
| Active Parameters | 22B |
| Layers | 94 |
| Hidden Dimension | 5,120 |
| Attention Heads | 64 |
| KV Heads | 4 |
| Head Dimension | 128 |
| Vocab Size | 151,936 |
| Total Experts | 128 |
| Active Experts | 8 |
Memory Requirements
| Item | Size |
|---|---|
| BF16 Weights | 470.0 GB |
| FP8 Weights | 235.0 GB |
| INT4 Weights | 117.5 GB |
| KV-Cache per Token | 192,512 bytes |
| Activation Estimate | 3.00 GB |
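The weight and KV-cache figures above follow directly from the architecture table. A minimal sketch, assuming decimal GB and BF16 (2-byte) KV-cache entries; the helper names are illustrative, not from the page:

```python
# Sketch: reproduce the memory figures from the architecture fields.
# Assumes decimal GB (1 GB = 1e9 bytes) and no quantization overhead.

def weight_bytes(total_params: float, bytes_per_param: float) -> float:
    """Raw weight storage at a given precision."""
    return total_params * bytes_per_param

def kv_cache_bytes_per_token(layers: int, kv_heads: int, head_dim: int,
                             bytes_per_elem: int = 2) -> int:
    """One K and one V vector per layer, BF16 (2 bytes) by default."""
    return 2 * layers * kv_heads * head_dim * bytes_per_elem

P = 235e9  # total parameters
print(weight_bytes(P, 2) / 1e9)    # BF16 -> 470.0 GB
print(weight_bytes(P, 1) / 1e9)    # FP8  -> 235.0 GB
print(weight_bytes(P, 0.5) / 1e9)  # INT4 -> 117.5 GB
print(kv_cache_bytes_per_token(94, 4, 128))  # -> 192512 bytes
```

The small KV-cache footprint comes from grouped-query attention: only 4 KV heads are cached, not the 64 attention heads.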
Fits on (single-node)
- B200 SXM · INT4
- B100 SXM · INT4
- GB200 NVL72 (per GPU) · INT4
- GB300 NVL72 (per GPU) · INT4
- H200 SXM · INT4
- H100 NVL 94GB (per GPU pair) · INT4
- Instinct MI300X · INT4
- Instinct MI325X · INT4
GPU Recommendations
| GPU | Rating | Config | Score | Throughput | Cost/Month | Cost/M Tokens |
|---|---|---|---|---|---|---|
| B200 SXM | optimal | FP8 · 2 GPUs · tensorrt-llm | 100/100 | 280.0 tok/s | $8,522 | $11.58 |
| B100 SXM | optimal | FP8 · 2 GPUs · tensorrt-llm | 100/100 | 280.0 tok/s | $8,541 | $11.61 |
| GB200 NVL72 (per GPU) | optimal | FP8 · 2 GPUs · tensorrt-llm | 100/100 | 280.0 tok/s | $12,337 | $16.77 |
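The Cost/M Tokens column is derivable from Cost/Month and Throughput. The figures reproduce exactly if a month is taken as 730 hours, a common cloud-billing convention; that convention is an assumption here, not stated on the page:

```python
# Sketch: derive Cost/M Tokens from Cost/Month and sustained throughput.
# Assumes a 730-hour billing month and 100% utilization.

def cost_per_million_tokens(monthly_usd: float, tok_per_s: float,
                            hours_per_month: float = 730.0) -> float:
    tokens_per_month = tok_per_s * hours_per_month * 3600
    return monthly_usd / (tokens_per_month / 1e6)

print(round(cost_per_million_tokens(8522, 280.0), 2))   # B200 SXM -> 11.58
print(round(cost_per_million_tokens(12337, 280.0), 2))  # GB200    -> 16.77
```

At lower utilization the per-token cost scales up proportionally, since the monthly hardware cost is fixed.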
API Pricing Comparison
| Provider | Input $/M | Output $/M | Badges |
|---|---|---|---|
| together | $1.50 | $3.00 | Cheapest |
| fireworks | $1.80 | $3.50 | |
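Since input and output tokens are priced separately, the effective API cost depends on the workload's token mix. A minimal sketch using the together rates above; the 75/25 input/output split is an illustrative assumption:

```python
# Sketch: blended API cost for a workload with separate input/output rates.

def api_cost_usd(in_tokens: float, out_tokens: float,
                 in_per_m: float, out_per_m: float) -> float:
    """Total cost in USD given per-million-token rates."""
    return (in_tokens * in_per_m + out_tokens * out_per_m) / 1e6

# Example mix (assumed): 75M input + 25M output tokens on together
# at $1.50 / $3.00 per million tokens:
print(api_cost_usd(75e6, 25e6, 1.50, 3.00))  # -> 187.5
```

Output-heavy workloads (e.g. long generations) shift the blended rate toward the higher output price.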
Quality Benchmarks
| Benchmark | Score |
|---|---|
| MMLU | 88.0 |
| HumanEval | 62.0 |
| GSM8K | 94.0 |
| MT-Bench | 88.0 |
Capabilities
Features
✓ Tool Use · ✗ Vision · ✓ Code · ✓ Math · ✓ Reasoning · ✓ Multilingual · ✓ Structured Output
Supported Frameworks
vllm · sglang · tensorrt-llm
Supported Precisions
BF16 (default) · FP8 · INT4