NVIDIA

Nemotron 340B

NVIDIA · dense · 340B parameters · 131,072 context

Quality: 85.0

Architecture Details

Type: Dense
Total Parameters: 340B
Active Parameters: 340B
Layers: 96
Hidden Dimension: 18,432
Attention Heads: 96
KV Heads: 8
Head Dimension: 192
Vocab Size: 256,000

Memory Requirements

BF16 Weights: 680.0 GB
FP8 Weights: 340.0 GB
INT4 Weights: 170.0 GB
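
The three weight figures above follow from the parameter count multiplied by the storage width of each precision. The sketch below reproduces them, assuming decimal gigabytes (1 GB = 1e9 bytes) and ignoring quantization scales and other per-tensor overhead; the function and constant names are illustrative and not taken from the page.

```python
# Bytes per parameter for each precision listed on this page.
# Assumption: no extra storage for quantization scales or zero points.
BYTES_PER_PARAM = {"BF16": 2.0, "FP8": 1.0, "INT4": 0.5}


def weight_memory_gb(num_params: float, precision: str) -> float:
    """Return the weight footprint in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


TOTAL_PARAMS = 340e9  # "Total Parameters: 340B" from the architecture details

for precision in BYTES_PER_PARAM:
    print(f"{precision}: {weight_memory_gb(TOTAL_PARAMS, precision):.1f} GB")
# BF16: 680.0 GB
# FP8: 340.0 GB
# INT4: 170.0 GB
```

Real deployments usually need somewhat more than this, since the KV cache and activations are held alongside the weights.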

KV-Cache per Token: 2,359,296 bytes
Activation Estimate: 8.00 GB
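
For reference, a common way to estimate per-token KV-cache size under grouped-query attention is 2 (K and V) × layers × KV heads × head dimension × bytes per element. The sketch below applies that formula to the architecture values listed above, assuming 2-byte (BF16) cache entries; the helper name is hypothetical. The result is a quarter of the figure this page lists, and the page does not state how its own number is derived, so treat the two as different accountings rather than as a correction.

```python
def kv_cache_bytes_per_token(layers: int, kv_heads: int, head_dim: int,
                             bytes_per_elem: int = 2) -> int:
    """Per-token KV-cache size: one K and one V vector per layer per KV head."""
    return 2 * layers * kv_heads * head_dim * bytes_per_elem


# Values from the architecture details on this page.
LAYERS, HIDDEN, HEADS, KV_HEADS, HEAD_DIM = 96, 18_432, 96, 8, 192
CONTEXT = 131_072
LISTED_BYTES = 2_359_296  # "KV-Cache per Token" as shown on the page

# Sanity check: head dimension equals hidden dimension / attention heads.
assert HIDDEN // HEADS == HEAD_DIM

per_token = kv_cache_bytes_per_token(LAYERS, KV_HEADS, HEAD_DIM)
print(per_token)                            # 589824
print(LISTED_BYTES / per_token)             # 4.0
print(f"{per_token * CONTEXT / 1e9:.1f}")   # 77.3 (GB for one full-context sequence)
```

Passing `bytes_per_elem=1` models an FP8 cache under the same assumptions.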

Fits on (single-node)

Instinct MI325X · INT4
B200 NVL (pair) · INT4
B300 · INT4
Groq LPU · INT4
B200 SXM x2 · INT4
B100 SXM x2 · INT4
GB200 NVL72 (per GPU) x2 · INT4
GB300 NVL72 (per GPU) x2 · INT4

GPU Recommendations

B200 NVL (pair) · optimal

FP8 · 2 GPUs · tensorrt-llm

Score: 88/100
Throughput: 280.0 tok/s
Cost/Month: $19,929
Cost/M Tokens: $27.08

B200 SXM · optimal

FP8 · 4 GPUs · tensorrt-llm

Score: 83/100
Throughput: 280.0 tok/s
Cost/Month: $17,044
Cost/M Tokens: $23.16

H200 SXM · optimal

FP8 · 4 GPUs · tensorrt-llm

Score: 80/100
Throughput: 280.0 tok/s
Cost/Month: $10,211
Cost/M Tokens: $13.88
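
The Cost/M Tokens figures above can be reproduced from the monthly cost and throughput. The sketch below assumes 730 billable hours per month (8,760 hours / 12) and continuous generation at the listed throughput; both are assumptions that happen to match the page's numbers, not a documented formula, and the function name is illustrative.

```python
HOURS_PER_MONTH = 730  # assumption: 8,760 h per year / 12, at 100% utilization


def cost_per_million_tokens(monthly_cost_usd: float, tokens_per_second: float,
                            hours_per_month: float = HOURS_PER_MONTH) -> float:
    """Dollars per one million generated tokens at sustained throughput."""
    tokens_per_month = tokens_per_second * 3600 * hours_per_month
    return monthly_cost_usd / tokens_per_month * 1e6


# Monthly costs and throughput as listed in the recommendations above.
configs = {"B200 NVL (pair)": 19929, "B200 SXM": 17044, "H200 SXM": 10211}

for name, monthly_cost in configs.items():
    print(f"{name}: ${cost_per_million_tokens(monthly_cost, 280.0):.2f} per M tokens")
# B200 NVL (pair): $27.08 per M tokens
# B200 SXM: $23.16 per M tokens
# H200 SXM: $13.88 per M tokens
```

At lower utilization the effective cost per token rises proportionally, which is worth keeping in mind when comparing these figures with the per-token API price.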

API Pricing Comparison

Provider · Input $/M · Output $/M · Badges
nvidia · $4.20 · $4.20 · Cheapest

Quality Benchmarks

MMLU: 82.0
HumanEval: 57.0
GSM8K: 92.0
MT-Bench: 85.0

Capabilities

Features

Tool Use · Vision · Code · Math · Reasoning · Multilingual · Structured Output

Supported Frameworks

tensorrt-llm · vllm · sglang

Supported Precisions

BF16 (default) · FP8 · INT4

Similar Models