Qwen 3 30B-A3B
Alibaba · moe · 30.5B parameters · 131,072 context
Parameters
30.5B
Context Window
128K tokens
Architecture
MoE
Best GPU
H200 SXM
Cheapest API
$0.45/M
Quality Score
70/100
Intelligence Brief
Qwen 3 30B-A3B is a 30.5B-parameter Mixture-of-Experts model (128 experts, 8 active per token) from Alibaba. It uses Grouped Query Attention (GQA), 48 layers, and a hidden dimension of 2,048. With a 131,072-token context window, it supports tool use, structured output, code, math, multilingual text, and reasoning. On standardized benchmarks it scores MMLU 75, HumanEval 48, and GSM8K 80. The cheapest API deployment is via novita at $0.45/M output tokens. For self-hosted inference, the top-ranked GPU is the H200 SXM at an estimated $2553/month.
Provider pricing
2 providers · canonical: novita

| Provider | Input $/M | Output $/M | Notes |
|---|---|---|---|
| novita (canonical) | $0.090 | $0.450 | cheapest input · cheapest output |
| openrouter | $0.090 | $0.450 | cheapest input · cheapest output |
Prices update via the nightly pricing cron + admin approvals at /admin/ingest-queue. The leaderboard's Input/Output cells show the canonical rate above; this table shows the full spread.
Related models
5 suggestions
| Model | Family | Size | Cheapest Output $/M |
|---|---|---|---|
| Qwen 3 0.6B | Qwen 3 | 0.6B | — |
| Qwen 3 1.7B | Qwen 3 | 1.7B | — |
| Qwen 3 235B | Qwen 3 | 22B | free |
| Qwen 3 32B | Qwen 3 | 32.8B | free |
| Qwen 3 4B | Qwen 3 | 4B | $0.100 |
Picks: same family first, then same vendor within ±2× params, then top tag-overlap matches. Price shown is the cheapest Output $/M across providers — the row's page shows the canonical anchor.
Architecture Details
Memory Requirements
BF16 Weights
61.0 GB
FP8 Weights
30.5 GB
INT4 Weights
15.3 GB
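The weight figures above follow from multiplying the parameter count by the bytes stored per parameter. Here is a minimal sketch, assuming 2 bytes per parameter for BF16, 1 for FP8, and 0.5 for INT4, decimal gigabytes, and no quantization metadata. The function name `weight_memory_gb` is illustrative and not part of any tool referenced on this page.

```python
# Bytes stored per parameter at each precision (assumed; ignores
# quantization metadata such as scales and zero points).
BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0, "int4": 0.5}


def weight_memory_gb(params_billion: float, precision: str) -> float:
    """Approximate weight footprint in decimal GB (1 GB = 1e9 bytes)."""
    # (params_billion * 1e9 params) * bytes_per_param / 1e9 bytes per GB
    return params_billion * BYTES_PER_PARAM[precision]


if __name__ == "__main__":
    for precision in ("bf16", "fp8", "int4"):
        print(f"{precision}: {weight_memory_gb(30.5, precision):.2f} GB")
```

For 30.5B parameters this gives 61.00, 30.50, and 15.25 GB; the page rounds the last value to 15.3 GB.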
GPU Compatibility Matrix
Qwen 3 30B-A3B is compatible with 62% of GPU configurations across 41 GPUs at 3 precision levels.
GPU Recommendations
| # | Configuration | Score | Throughput | Latency (ITL) | Est. TTFT | Cost/Month | Cost/M Tokens |
|---|---|---|---|---|---|---|---|
| 1 | FP8 · 1 GPU · tensorrt-llm | 100/100 | 1.1K tok/s | 1.0ms | 0ms | $2553 | $0.93 |
| 2 | FP8 · 1 GPU · tensorrt-llm | 100/100 | 1.1K tok/s | 1.0ms | 0ms | $1794 | $0.65 |
| 3 | FP8 · 1 GPU · tensorrt-llm | 100/100 | 1.1K tok/s | 1.0ms | 0ms | $1794 | $0.65 |
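The Cost/M Tokens figures relate monthly GPU cost to sustained throughput. The page does not state its exact formula, so the following is only a rough sketch under assumptions: a 730-hour month, a constant token rate, and a `utilization` factor that defaults to 1.0. All names are hypothetical.

```python
HOURS_PER_MONTH = 730  # assumed: 365 days * 24 h / 12 months


def cost_per_million_tokens(
    monthly_cost_usd: float,
    tokens_per_second: float,
    utilization: float = 1.0,
) -> float:
    """Self-hosted cost in USD per one million generated tokens."""
    tokens_per_month = tokens_per_second * 3600 * HOURS_PER_MONTH * utilization
    return monthly_cost_usd / (tokens_per_month / 1e6)


if __name__ == "__main__":
    # Inputs taken from the recommendation cards; 1.1K tok/s is a rounded value.
    for monthly_cost in (2553, 1794):
        usd = cost_per_million_tokens(monthly_cost, 1100)
        print(f"${monthly_cost}/month -> ${usd:.2f}/M tokens")
```

With the rounded 1.1K tok/s input, the results come out a few cents below the listed $0.93 and $0.65. That gap is consistent with a slightly lower unrounded throughput or a utilization below 100%, but the page does not confirm either.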
Deployment Options
API Deployment
novita
$0.45/M
output tokens
Single GPU
H200 SXM
$2553/mo
Min VRAM: 31 GB
Multi-GPU
RTX A6000 x2
878.7 tok/s
TP · $930/mo
API Pricing Comparison
| Provider | Input $/M | Output $/M | Badges |
|---|---|---|---|
| novita | $0.09 | $0.45 | Cheapest |
| openrouter | $0.09 | $0.45 | |
Cost Analysis
| Provider | Input $/M | Output $/M | ~Monthly Cost |
|---|---|---|---|
| novita (Best Value) | $0.09 | $0.45 | $3 |
| openrouter | $0.09 | $0.45 | $3 |
Cost per 1,000 Requests
| Request size | Cost per 1,000 requests | Provider |
|---|---|---|
| Short (500 tok) | $0.14 | novita |
| Medium (2K tok) | $0.54 | novita |
| Long (8K tok) | $1.62 | novita |
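Per-request API cost is the input and output token counts weighted by their per-million prices. The page does not say how each request size splits between input and output tokens, so the splits in this sketch are hypothetical. Only the $0.09 and $0.45 prices come from the tables above.

```python
def cost_per_1000_requests(
    input_tokens: int,
    output_tokens: int,
    input_price_per_m: float = 0.09,   # novita input $/M from the pricing table
    output_price_per_m: float = 0.45,  # novita output $/M from the pricing table
) -> float:
    """USD cost of 1,000 identical requests at per-million-token prices."""
    per_request = (
        input_tokens * input_price_per_m + output_tokens * output_price_per_m
    ) / 1e6
    return per_request * 1000


if __name__ == "__main__":
    # Hypothetical input/output splits; the page does not state them.
    for label, tokens_in, tokens_out in [
        ("short", 500, 200),
        ("medium", 2000, 800),
        ("long", 8000, 2000),
    ]:
        print(f"{label}: ${cost_per_1000_requests(tokens_in, tokens_out):.3f}")
```

These assumed splits yield roughly $0.135, $0.54, and $1.62, which land on or near the listed values. That agreement does not confirm the page uses the same splits.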
Performance Estimates
Throughput by GPU
VRAM Breakdown (H200 SXM, FP8)
Precision Impact
| Precision | Weights/GPU | Throughput |
|---|---|---|
| bf16 | 61.0 GB | — |
| fp8 | 30.5 GB | ~1.1K tok/s |
| int4 | 15.3 GB | — |
Similar Models
| Model | Parameters | Architecture | Quality | Price |
|---|---|---|---|---|
| Qwen 3 32B | 32.8B | dense | 74 | from $0.00/M |
| Claude Haiku 4.5 | 30B | moe | 50 | from $5.00/M |
| JAIS 30B | 30B | dense | 50 | — |
| Gemma 4 31B-IT | 31B | dense | 77 | from $0.00/M |
| MPT 30B | 30B | dense | 48 | — |
Frequently Asked Questions
How much VRAM does Qwen 3 30B-A3B need for inference?
Qwen 3 30B-A3B needs approximately 61.0 GB of VRAM for weights at BF16 precision, 30.5 GB at FP8, or 15.3 GB with INT4 quantization. Additional VRAM is needed for the KV cache (24,576 bytes per token) and activations (~0.50 GB).
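The three components in this answer can be summed into a rough total. The sketch below assumes the per-token KV-cache figure applies at every weight precision, that `cached_tokens` counts tokens across all concurrent sequences, and that gigabytes are decimal. The function name is illustrative.

```python
KV_BYTES_PER_TOKEN = 24_576  # per-token KV-cache size quoted in the answer above
ACTIVATIONS_GB = 0.50        # activation estimate quoted in the answer above
WEIGHTS_GB = {"bf16": 61.0, "fp8": 30.5, "int4": 15.3}


def estimate_vram_gb(precision: str, cached_tokens: int) -> float:
    """Weights + KV cache + activations, in decimal GB."""
    kv_cache_gb = KV_BYTES_PER_TOKEN * cached_tokens / 1e9
    return WEIGHTS_GB[precision] + kv_cache_gb + ACTIVATIONS_GB


if __name__ == "__main__":
    # One sequence filling the full 131,072-token context window.
    for precision in WEIGHTS_GB:
        print(f"{precision}: {estimate_vram_gb(precision, 131_072):.1f} GB")
```

At the full context window the KV cache adds about 3.2 GB, so the FP8 estimate is roughly 34.2 GB under these assumptions. Real deployments also reserve memory for framework overhead, which this sketch ignores.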
What is the best GPU for Qwen 3 30B-A3B?
The top recommended GPU for Qwen 3 30B-A3B is the H200 SXM using FP8 precision. It achieves approximately 1.1K tokens/sec at an estimated cost of $2553/month ($0.93/M tokens). Score: 100/100.
How much does Qwen 3 30B-A3B inference cost?
Qwen 3 30B-A3B API inference starts from $0.09/M input tokens and $0.45/M output tokens. Self-hosted inference costs depend on your GPU configuration — use our ROI calculator for a detailed breakdown.
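To get a rough sense of when self-hosting could match API spend, compare a fixed monthly GPU cost with per-token API pricing. This is a simplified sketch, not the ROI calculator mentioned above. It assumes a fixed ratio of input to output tokens and ignores whether a single GPU can actually serve the resulting volume. All names are hypothetical.

```python
def break_even_output_tokens_m(
    gpu_monthly_cost_usd: float,
    output_price_per_m: float = 0.45,  # API output $/M from this page
    input_price_per_m: float = 0.09,   # API input $/M from this page
    input_per_output: float = 0.0,     # assumed input tokens per output token
) -> float:
    """Monthly output tokens (in millions) at which API spend equals GPU cost."""
    api_cost_per_m_output = output_price_per_m + input_per_output * input_price_per_m
    return gpu_monthly_cost_usd / api_cost_per_m_output


if __name__ == "__main__":
    # H200 SXM monthly cost from this page; 2.5 is a hypothetical ratio.
    for ratio in (0.0, 2.5):
        volume = break_even_output_tokens_m(2553, input_per_output=ratio)
        print(f"input/output ratio {ratio}: {volume:,.0f}M output tokens/month")
```

Below the break-even volume the API is cheaper under these assumptions; above it, the fixed GPU cost wins only if the hardware can sustain that throughput.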