Sources: GPU Pricing, API Token Pricing, Model Registry

Nemotron 340B

NVIDIA · dense · 340B parameters · 131,072 context

Quality: 85.0


Architecture Details

Type: Dense

Total parameters: 340B

Active parameters: 340B

Layers: 96

Hidden dimension: 18,432

Attention heads: 96

KV heads: 8

Head dimension: 192

Vocabulary size: 256,000
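As a sanity check on these numbers, the sketch below estimates the parameter count of a dense decoder-only transformer with grouped-query attention (GQA) from the listed dimensions. It also confirms that the head dimension is 18,432 / 96 = 192 and that each KV head is shared by 96 / 8 = 12 query heads. Two inputs are assumptions not listed on this page: a non-gated MLP with a 4x expansion (FFN width 73,728) and untied input/output embeddings. Norm weights and biases are ignored.

```python
# Rough parameter count for a dense decoder-only transformer with GQA.
# Dimensions come from the Architecture Details list; FFN_HIDDEN and the
# untied-embedding choice are ASSUMPTIONS, not values stated on the page.

VOCAB = 256_000
LAYERS = 96
HIDDEN = 18_432
HEADS = 96
KV_HEADS = 8
FFN_HIDDEN = 4 * HIDDEN  # assumption: non-gated MLP, 4x expansion (73,728)


def estimate_params() -> int:
    head_dim = HIDDEN // HEADS                   # 18,432 / 96 = 192
    q_proj = HIDDEN * HEADS * head_dim           # query projection
    kv_proj = 2 * HIDDEN * KV_HEADS * head_dim   # key + value projections (8 KV heads)
    o_proj = HEADS * head_dim * HIDDEN           # attention output projection
    mlp = 2 * HIDDEN * FFN_HIDDEN                # up + down projections
    per_layer = q_proj + kv_proj + o_proj + mlp
    embeddings = 2 * VOCAB * HIDDEN              # assumption: untied input/output embeddings
    # Layer norms and biases are left out; they are negligible at this scale.
    return LAYERS * per_layer + embeddings


if __name__ == "__main__":
    print(f"head_dim = {HIDDEN // HEADS}, GQA group size = {HEADS // KV_HEADS}")
    print(f"~{estimate_params() / 1e9:.1f}B parameters")  # ~341.0B
```

Under these assumptions the estimate lands at about 341.0B, consistent with the 340B listed; the MLP accounts for nearly four fifths of each layer.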

Memory Requirements

BF16 weights: 680.0 GB

FP8 weights: 340.0 GB

INT4 weights: 170.0 GB

KV-cache per token: 2,359,296 bytes

Activation estimate: 8.00 GB
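The weight figures follow directly from 340B parameters times bytes per parameter (2 for BF16, 1 for FP8, 0.5 for INT4), in decimal GB. For the KV cache, the usual GQA formula is 2 (key and value) × layers × KV heads × head dimension × bytes per element. With the listed 96 layers, 8 KV heads, head dimension 192, and an assumed 2-byte BF16 cache, that gives 589,824 bytes per token, exactly one quarter of the 2,359,296 bytes listed; the page does not say how its figure was derived, so it may assume a different cache layout or element width. A minimal sketch of both calculations:

```python
# Weight memory per precision and a GQA KV-cache estimate.
# Assumptions: decimal GB (1 GB = 1e9 bytes), and a BF16 (2-byte) KV cache
# storing one key and one value vector per KV head per layer.

PARAMS = 340e9
LAYERS = 96
KV_HEADS = 8
HEAD_DIM = 192
CONTEXT = 131_072

BYTES_PER_PARAM = {"BF16": 2.0, "FP8": 1.0, "INT4": 0.5}


def weight_gb(precision: str) -> float:
    """Memory for the weights alone at the given precision, in decimal GB."""
    return PARAMS * BYTES_PER_PARAM[precision] / 1e9


def kv_bytes_per_token(kv_dtype_bytes: int = 2) -> int:
    """KV-cache bytes per token: the leading 2 counts one K and one V tensor per layer."""
    return 2 * LAYERS * KV_HEADS * HEAD_DIM * kv_dtype_bytes


if __name__ == "__main__":
    for precision in BYTES_PER_PARAM:
        print(f"{precision} weights: {weight_gb(precision):.1f} GB")
    per_token = kv_bytes_per_token()
    print(f"KV cache per token: {per_token:,} bytes")
    print(f"KV cache for one full {CONTEXT:,}-token sequence: {per_token * CONTEXT / 1e9:.1f} GB")
```

With the formula above, one sequence at the full 131,072-token context needs about 77.3 GB of BF16 KV cache; at the page's per-token figure it would be about 309.2 GB.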

Fits on (single node):

Instinct MI325X, INT4
B200 NVL (pair), INT4
B300, INT4
Groq LPU, INT4
B200 SXM x2, INT4
B100 SXM x2, INT4
GB200 NVL72 (per GPU) x2, INT4
GB300 NVL72 (per GPU) x2, INT4
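The page does not state the criterion behind this list. One plausible rule, used here purely as an assumption, is that the weights plus the 8.00 GB activation estimate must fit in the pooled GPU memory, with the remainder available for KV cache. The sketch below applies that rule; the HBM capacities in the usage lines (256 GB for an Instinct MI325X, 141 GB for an H200) are assumptions supplied for illustration, and the default KV size is the BF16 GQA estimate of 2 × 96 × 8 × 192 × 2 = 589,824 bytes per token rather than the 2,359,296 bytes the page lists.

```python
from typing import Optional

# Hypothetical fit rule (not stated on the page): weights + activation estimate
# must fit in the GPUs' pooled memory; what is left holds the KV cache.
# Per-GPU sharding overhead and framework reserves are ignored.

WEIGHTS_GB = {"BF16": 680.0, "FP8": 340.0, "INT4": 170.0}
ACTIVATION_GB = 8.0
BF16_KV_BYTES_PER_TOKEN = 2 * 96 * 8 * 192 * 2  # 589,824; the page lists 2,359,296


def kv_headroom_tokens(
    gpu_mem_gb: float,
    num_gpus: int,
    precision: str,
    kv_bytes_per_token: int = BF16_KV_BYTES_PER_TOKEN,
) -> Optional[int]:
    """Tokens of KV cache that fit after weights and activations, or None if the model does not fit."""
    free_gb = gpu_mem_gb * num_gpus - WEIGHTS_GB[precision] - ACTIVATION_GB
    if free_gb < 0:
        return None
    return int(free_gb * 1e9 // kv_bytes_per_token)


if __name__ == "__main__":
    # Assumed HBM capacities: Instinct MI325X 256 GB, H200 141 GB.
    print("MI325X x1, INT4:", kv_headroom_tokens(256, 1, "INT4"))
    print("H200 x1, INT4:", kv_headroom_tokens(141, 1, "INT4"))
```

Under this rule a single 256 GB accelerator holds the INT4 weights with 78 GB to spare, while a single 141 GB GPU does not fit them at all. Passing the page's 2,359,296-byte figure instead shrinks the MI325X headroom by a factor of four.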

GPU Recommendations

B200 NVL (pair) · optimal

FP8 · 2 GPUs · tensorrt-llm

Score: 88/100

Throughput: 280.0 tok/s

Cost/month: $19,929

Cost per million tokens: $27.08


B200 SXM · optimal

FP8 · 4 GPUs · tensorrt-llm

Score: 83/100

Throughput: 280.0 tok/s

Cost/month: $17,044

Cost per million tokens: $23.16


H200 SXM · optimal

FP8 · 4 GPUs · tensorrt-llm

Score: 80/100

Throughput: 280.0 tok/s

Cost/month: $10,211

Cost per million tokens: $13.88

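The cost-per-million-tokens figures for the three configurations are reproduced exactly by dividing the monthly cost by the tokens generated in a 730-hour month (8,760 hours / 12) at the listed 280 tok/s, which implicitly assumes the deployment runs at that throughput around the clock. A minimal sketch of the arithmetic:

```python
# Cost per million tokens from monthly cost and sustained throughput.
# Assumption: a 730-hour month at constant throughput (full utilization);
# this reproduces the figures listed for all three configurations.

HOURS_PER_MONTH = 730  # 8,760 hours per year / 12


def cost_per_million_tokens(cost_per_month: float, tokens_per_second: float) -> float:
    tokens_per_month = tokens_per_second * HOURS_PER_MONTH * 3600
    return cost_per_month / (tokens_per_month / 1e6)


CONFIGS = {
    "B200 NVL (pair), FP8, 2 GPUs": 19_929,
    "B200 SXM, FP8, 4 GPUs": 17_044,
    "H200 SXM, FP8, 4 GPUs": 10_211,
}

if __name__ == "__main__":
    for name, monthly in CONFIGS.items():
        print(f"{name}: ${cost_per_million_tokens(monthly, 280.0):.2f}/M tokens")
```

Because all three configurations report the same 280 tok/s, their per-token ranking simply follows monthly cost; at lower real-world utilization every figure rises proportionally.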

API Pricing Comparison

Provider	Input ($/M tokens)	Output ($/M tokens)	Badges
nvidia	$4.20	$4.20	Cheapest
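Since the listed API charges the same $4.20 per million tokens for input and output, the blended price is $4.20 regardless of the input/output mix. Comparing that with the self-hosted options gives a break-even throughput: the sustained tok/s at which a configuration's monthly cost, spread over a 730-hour month (the convention that reproduces the GPU recommendation figures), equals the API price per token. A minimal sketch, using only the monthly costs listed on this page:

```python
# Break-even sustained throughput for self-hosting versus a per-token API.
# Assumptions: a 730-hour month, constant throughput, and no costs beyond the
# listed monthly GPU cost (no ops, networking, or idle time).

HOURS_PER_MONTH = 730
API_PRICE_PER_M = 4.20  # same for input and output, so the mix does not matter


def breakeven_tokens_per_second(cost_per_month: float, api_price_per_m: float = API_PRICE_PER_M) -> float:
    """Throughput at which self-hosted cost per token equals the API price."""
    seconds_per_month = HOURS_PER_MONTH * 3600
    millions_of_tokens_needed = cost_per_month / api_price_per_m
    return millions_of_tokens_needed * 1e6 / seconds_per_month


if __name__ == "__main__":
    for name, monthly in {
        "B200 NVL (pair)": 19_929,
        "B200 SXM x4": 17_044,
        "H200 SXM x4": 10_211,
    }.items():
        print(f"{name}: {breakeven_tokens_per_second(monthly):.0f} tok/s to match the API")
```

All three break-even points (roughly 925 to 1,806 tok/s) sit well above the 280 tok/s listed for each configuration, so at that throughput the API is the cheaper option per token.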

Quality Benchmarks

MMLU: 82.0

HumanEval: 57.0

GSM8K: 92.0

MT-Bench: 85.0

Capabilities

Features

✓ Tool Use · ✗ Vision · ✓ Code · ✓ Math · ✗ Reasoning · ✓ Multilingual · ✓ Structured Output

Supported Frameworks

tensorrt-llm · vllm · sglang

Supported Precisions

BF16 (default) · FP8 · INT4

Similar Models

Grok-2

314B params · moe

Quality: 87

from $10.00/M

Snowflake Arctic 128x3B

395B params · moe

Quality: 50

Jamba 1.5 Large

398B params · hybrid

Quality: 50

from $8.00/M

Llama 4 Maverick

400B params · moe

Quality: 89

from $1.80/M

Llama 3.1 405B

405B params · dense

Quality: 88

from $3.00/M