How many parameters does Qwen 3 30B-A3B have?

Qwen 3 30B-A3B has 30.5 billion total parameters, of which 3.3 billion are active per token. It uses a Mixture-of-Experts (MoE) architecture with 48 layers and a hidden dimension of 2,048.

What is the context length of Qwen 3 30B-A3B?

Qwen 3 30B-A3B supports a context window of 131,072 tokens.

Sources: GPU Pricing, API Token Pricing, Model Registry

Qwen 3 30B-A3B

Alibaba · MoE · 30.5B parameters · 131,072-token context

Quality

70.0


Parameters: 30.5B

Context Window: 128K tokens

Architecture: MoE

Best GPU: H200 SXM

Quality Score: 70/100

Intelligence Brief

Qwen 3 30B-A3B is a 30.5B-parameter Mixture-of-Experts model from Alibaba (128 experts, 8 active per token). It uses Grouped Query Attention (GQA), 48 layers, and a hidden dimension of 2,048. With a 131,072-token context window, it supports tool use, structured output, code, math, multilingual text, and reasoning. On standardized benchmarks it scores 75 on MMLU, 48 on HumanEval, and 80 on GSM8K. For self-hosted inference, the top-ranked GPU is the H200 SXM, at an estimated $2,553/month.

Architecture Details

Type: MoE
Total Parameters: 30.5B
Active Parameters: 3.3B
Layers: 48
Hidden Dimension: 2,048
Attention Heads: 32
KV Heads: 4
Head Dimension: 128
Vocab Size: 151,936
Total Experts: 128
Active Experts: 8
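The per-token KV-cache cost follows from the attention shape listed above. The sketch below applies the commonly used formula 2 × layers × KV heads × head dimension × bytes per element. The helper name and the byte widths per precision are illustrative assumptions, not something this page defines. Note that the page's own figure of 24,576 bytes per token equals layers × KV heads × head dimension, so it appears to follow a different convention, which the page does not spell out.

```python
# Architecture values copied from the Architecture Details table.
LAYERS = 48
ATTENTION_HEADS = 32
KV_HEADS = 4
HEAD_DIM = 128


def kv_cache_bytes_per_token(bytes_per_element: float) -> float:
    """Hypothetical helper: KV-cache bytes per token, counting both the
    key and the value tensor in every layer."""
    return 2 * LAYERS * KV_HEADS * HEAD_DIM * bytes_per_element


# Number of query heads that share each KV head under GQA.
gqa_group_size = ATTENTION_HEADS // KV_HEADS

# Elements in one cached tensor (key or value) summed over all layers.
elements_per_tensor = LAYERS * KV_HEADS * HEAD_DIM

if __name__ == "__main__":
    print(f"GQA group size: {gqa_group_size}")
    print(f"Elements per cached tensor: {elements_per_tensor:,}")
    # Assumed element widths: 2 bytes for BF16, 1 byte for FP8.
    for name, width in (("BF16", 2), ("FP8", 1)):
        print(f"{name} K+V per token: {kv_cache_bytes_per_token(width):,.0f} bytes")
```

With 4 KV heads instead of 32, the cache is one eighth the size it would be if every attention head kept its own keys and values.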

Memory Requirements

BF16 Weights: 61.0 GB
FP8 Weights: 30.5 GB
INT4 Weights: 15.3 GB
KV-Cache per Token: 24,576 bytes
Activation Estimate: 0.50 GB
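The weight figures above can be reproduced from the total parameter count. This is a minimal sketch, assuming 2, 1, and 0.5 bytes per parameter for BF16, FP8, and INT4, decimal gigabytes, and no extra storage for quantization scales or metadata. The function name is hypothetical. The total count (30.5B) is used rather than the active count (3.3B) because all experts have to be resident in memory.

```python
TOTAL_PARAMS = 30.5e9  # total parameter count from this page

# Assumed storage width per parameter, in bytes, for each precision.
BYTES_PER_PARAM = {"BF16": 2.0, "FP8": 1.0, "INT4": 0.5}


def weight_memory_gb(precision: str) -> float:
    """Weight footprint in decimal gigabytes (1 GB = 1e9 bytes)."""
    return TOTAL_PARAMS * BYTES_PER_PARAM[precision] / 1e9


if __name__ == "__main__":
    for precision in BYTES_PER_PARAM:
        print(f"{precision}: {weight_memory_gb(precision):.2f} GB")
```

The INT4 result is 15.25 GB, which the page rounds to 15.3 GB.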

Fits on (single GPU) — most practical first

A10G · 24 GB · INT4 (best fit)
A30 · 24 GB · INT4
L4 · 24 GB · INT4
RTX 4090 · 24 GB · INT4
RTX 3090 · 24 GB · INT4
RTX A5000 · 24 GB · INT4
RX 7900 XTX · 24 GB · INT4
RTX 5090 · 32 GB · INT4
V100 32GB · 32 GB · INT4
Instinct MI100 · 32 GB · INT4
TPU v4 · 32 GB · INT4
TPU v6e (Trillium) · 32 GB · INT4

GPU Compatibility Matrix

Qwen 3 30B-A3B is compatible with 62% of GPU configurations across 41 GPUs at 3 precision levels.

Precision columns: BF16 (Full), FP8 (Half), INT4 (Quarter)

Legend: No fit, Very tight, Tight, Moderate, Good, Excellent

Blackwell (7 GPUs)
B200 NVL (pair): 360 GB
B300: 288 GB
B100 SXM: 192 GB
GB200 NVL72 (per GPU): 192 GB
68% · 84% · 92%

Hopper (7 GPUs)
H100 NVL 94GB (per GPU pair): 188 GB
H200 SXM: 141 GB
H20: 96 GB
GH200: 96 GB

Ada Lovelace (11 GPUs)
L40S: 48 GB
L40: 48 GB
RTX 6000 Ada: 48 GB
L20: 48 GB

Ampere (16 GPUs)
A100 80GB SXM: 80 GB
A100 80GB PCIe: 80 GB
A16: 64 GB
RTX A6000: 48 GB

GPU Recommendations

H200 SXM (optimal)
FP8 · 1 GPU · tensorrt-llm
Score: 100/100
Throughput: 1.1K tok/s
Latency (ITL): 1.0 ms
Est. TTFT: 0 ms
Cost/Month: $2,553
Cost/M Tokens: $0.93

H100 SXM (optimal)
FP8 · 1 GPU · tensorrt-llm
Score: 100/100
Throughput: 1.1K tok/s
Latency (ITL): 1.0 ms
Est. TTFT: 0 ms
Cost/Month: $1,794
Cost/M Tokens: $0.65

H100 PCIe (optimal)
FP8 · 1 GPU · tensorrt-llm
Score: 100/100
Throughput: 1.1K tok/s
Latency (ITL): 1.0 ms
Est. TTFT: 0 ms
Cost/Month: $1,794
Cost/M Tokens: $0.65
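The cost-per-million-tokens figures relate monthly GPU cost to sustained throughput. The sketch below shows one plausible way to derive them, assuming a 30-day month at full utilization and the rounded 1.1K tok/s shown in the recommendations. The function name and the utilization assumption are illustrative, and the page does not state its exact method.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # assumption: 30-day month, GPU busy 100% of the time


def cost_per_million_tokens(monthly_cost_usd: float, tokens_per_second: float) -> float:
    """Hypothetical helper: USD per one million generated tokens."""
    tokens_per_month = tokens_per_second * SECONDS_PER_MONTH
    return monthly_cost_usd / (tokens_per_month / 1e6)


# (monthly cost in USD, rounded throughput in tok/s) as listed in the recommendations.
configs = {
    "H200 SXM": (2553, 1100),
    "H100 SXM": (1794, 1100),
    "H100 PCIe": (1794, 1100),
}

if __name__ == "__main__":
    for gpu, (cost, throughput) in configs.items():
        print(f"{gpu}: ${cost_per_million_tokens(cost, throughput):.2f} per M tokens")
```

Under these assumptions the results come out a few cents below the listed $0.93 and $0.65, which is consistent with the displayed throughput being rounded up to 1.1K tok/s.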

Deployment Options

API Deployment: no API pricing available

Self-Hosted (single GPU): H200 SXM · $2,553/mo · Min VRAM: 31 GB

Scale (multi-GPU): RTX A6000 x2 · 878.7 tok/s · TP · $930/mo

API Pricing Comparison

No API pricing data available for this model.

Performance Estimates

Throughput by GPU

H200 SXM

1.1K tok/s

H100 SXM

1.1K tok/s

H100 PCIe

1.1K tok/s

VRAM Breakdown (H200 SXM, FP8)

Weights: 30.5 GB
KV-Cache: 0.8 GB
Activations: 4.0 GB
Overhead: 1.5 GB
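Summing the four components shows how much of the H200 SXM's memory this configuration uses. This is a minimal sketch using the values from the breakdown and the 141 GB capacity listed for the H200 SXM in the compatibility matrix. The variable names are illustrative.

```python
# VRAM components for H200 SXM at FP8, in GB, as listed in the breakdown.
breakdown_gb = {
    "weights": 30.5,
    "kv_cache": 0.8,
    "activations": 4.0,
    "overhead": 1.5,
}

GPU_MEMORY_GB = 141.0  # H200 SXM capacity as listed on this page

total_gb = sum(breakdown_gb.values())
utilization = total_gb / GPU_MEMORY_GB
headroom_gb = GPU_MEMORY_GB - total_gb

if __name__ == "__main__":
    print(f"Total: {total_gb:.1f} GB")
    print(f"Utilization: {utilization:.1%}")
    print(f"Headroom: {headroom_gb:.1f} GB")
```

The total is about 36.8 GB, roughly a quarter of the card, so most of the memory remains available for a larger KV-cache (longer contexts or more concurrent sequences).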

Precision Impact

bf16: 61.0 GB weights/GPU
fp8: 30.5 GB weights/GPU · ~1.1K tok/s
int4: 15.3 GB weights/GPU

Quality Benchmarks

Average: 73rd percentile across all models

MMLU: 75.0 · Below Average (46th pctile)

HumanEval: 48.0 · Below Average (43rd pctile)

GSM8K: 80.0 · Below Average (47th pctile)

MT-Bench: 78.0 · Bottom 25% (0th pctile)
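The page does not state how the Quality score of 70 is derived. As a sanity check only, the unweighted mean of the four listed benchmark scores lands close to it. Treating the score as a simple average is an assumption, not something the source confirms.

```python
from statistics import mean

# Benchmark scores as listed under Quality Benchmarks.
benchmarks = {
    "MMLU": 75.0,
    "HumanEval": 48.0,
    "GSM8K": 80.0,
    "MT-Bench": 78.0,
}

# Assumption: equal weight for every benchmark.
unweighted_mean = mean(benchmarks.values())

if __name__ == "__main__":
    print(f"Unweighted mean: {unweighted_mean:.2f}")
```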

Capabilities

Features

✓ Tool Use · ✗ Vision · ✓ Code · ✓ Math · ✓ Reasoning · ✓ Multilingual · ✓ Structured Output

Supported Frameworks

vLLM, SGLang, TGI, Ollama

Supported Precisions

BF16 (default), FP8, INT4



Similar Models

Qwen 3 32B · 32.8B params · dense · Quality: 74 · from $0.80/M · Similar specs

JAIS 30B · 30B params · dense · Quality: 50 · Smaller context, lower quality

MPT 30B · 30B params · dense · Quality: 48 · Smaller context, lower quality

Gemma 4 31B-IT · 31B params · dense · Quality: 77 · from $0.30/M · Smaller context, higher quality

Qwen 2.5 32B · 32.5B params · dense · Quality: 73 · from $0.80/M · Similar specs

Frequently Asked Questions

How much VRAM does Qwen 3 30B-A3B need for inference?

Qwen 3 30B-A3B requires approximately 61.0 GB of VRAM at BF16 precision, 30.5 GB at FP8, or 15.3 GB at INT4 quantization. Additional VRAM is needed for the KV-cache (24,576 bytes per token) and activations (~0.50 GB).
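To turn these figures into a rough total for a given context length, the three terms can be added together. This is a minimal sketch that takes the page's numbers at face value and assumes a single sequence, decimal gigabytes, and no framework overhead. The function name is hypothetical.

```python
KV_BYTES_PER_TOKEN = 24_576   # per-token KV-cache figure quoted on this page
ACTIVATIONS_GB = 0.50         # activation estimate quoted on this page
WEIGHTS_GB = {"BF16": 61.0, "FP8": 30.5, "INT4": 15.3}


def estimated_vram_gb(precision: str, context_tokens: int) -> float:
    """Hypothetical estimate: weights + KV-cache for one sequence + activations."""
    kv_cache_gb = KV_BYTES_PER_TOKEN * context_tokens / 1e9
    return WEIGHTS_GB[precision] + kv_cache_gb + ACTIVATIONS_GB


if __name__ == "__main__":
    full_context = 131_072  # maximum context window listed for this model
    for precision in WEIGHTS_GB:
        total = estimated_vram_gb(precision, full_context)
        print(f"{precision} at {full_context:,} tokens: {total:.1f} GB")
```

At the full 131,072-token context the KV-cache term is about 3.2 GB under these assumptions, so the INT4 estimate stays near 19 GB, in line with the 24 GB cards listed as single-GPU fits.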

What is the best GPU for Qwen 3 30B-A3B?

The top recommended GPU for Qwen 3 30B-A3B is the H200 SXM using FP8 precision. It achieves approximately 1.1K tokens/sec at an estimated cost of $2,553/month ($0.93/M tokens). Score: 100/100.

How much does Qwen 3 30B-A3B inference cost?

Qwen 3 30B-A3B inference costs vary by provider and GPU setup. Use our calculator for detailed cost estimates across all providers.