Nemotron 340B
NVIDIA · dense · 340B parameters · 131,072 context
Quality: 85.0
Architecture Details
Type: Dense
Total Parameters: 340B
Active Parameters: 340B
Layers: 96
Hidden Dimension: 18,432
Attention Heads: 96
KV Heads: 8
Head Dimension: 192
Vocab Size: 256,000
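The listed head dimension and the grouped-query layout can be cross-checked from the other architecture figures. The sketch below assumes the common convention that head dimension equals hidden dimension divided by the number of attention heads; the variable names are illustrative and not part of the source page.

```python
# Values taken from the architecture details listed on this page.
hidden_dim = 18_432
attention_heads = 96
kv_heads = 8

# Assumption: head_dim = hidden_dim / attention_heads (a common convention).
head_dim = hidden_dim // attention_heads

# With grouped-query attention, several query heads share one KV head.
queries_per_kv_head = attention_heads // kv_heads

print(head_dim, queries_per_kv_head)  # 192 12
```

Under that convention the result agrees with the listed head dimension of 192, and each KV head serves 12 query heads.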
Memory Requirements
BF16 Weights: 680.0 GB
FP8 Weights: 340.0 GB
INT4 Weights: 170.0 GB
KV-Cache per Token: 2,359,296 bytes
Activation Estimate: 8.00 GB
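The weight figures follow from parameter count times bytes per parameter, using decimal gigabytes. The sketch below reproduces them and also evaluates a commonly used grouped-query KV-cache formula (keys plus values, per layer, per KV head). The function names are illustrative, and the KV formula is an assumption: the page does not state how its per-token figure was derived.

```python
PARAMS = 340e9  # total parameters, from this page

# Bits per stored parameter for each listed precision.
BITS_PER_PARAM = {"BF16": 16, "FP8": 8, "INT4": 4}


def weight_gb(params: float, bits: int) -> float:
    """Weight footprint in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bits / 8 / 1e9


def kv_bytes_per_token(layers: int, kv_heads: int, head_dim: int,
                       bytes_per_elem: int) -> int:
    """Assumed formula: one key and one value vector per layer per KV head."""
    return 2 * layers * kv_heads * head_dim * bytes_per_elem


weights = {name: weight_gb(PARAMS, bits) for name, bits in BITS_PER_PARAM.items()}
kv_bf16 = kv_bytes_per_token(layers=96, kv_heads=8, head_dim=192, bytes_per_elem=2)

for name, gb in weights.items():
    print(f"{name}: {gb:.1f} GB")
print(f"KV cache per token (assumed formula, BF16): {kv_bf16} bytes")
```

The weight results match the listed 680.0, 340.0, and 170.0 GB. The assumed KV formula gives 589,824 bytes per token at BF16, and the listed 2,359,296 bytes is exactly four times that, so the page presumably uses a different accounting that it does not describe.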
Fits on (single-node)
Instinct MI325X · INT4
B200 NVL (pair) · INT4
B300 · INT4
Groq LPU · INT4
B200 SXM x2 · INT4
B100 SXM x2 · INT4
GB200 NVL72 (per GPU) x2 · INT4
GB300 NVL72 (per GPU) x2 · INT4
GPU Recommendations
| GPU | Rating | Configuration | Score | Throughput | Cost/Month | Cost/M Tokens |
|---|---|---|---|---|---|---|
| B200 NVL (pair) | optimal | FP8 · 2 GPUs · tensorrt-llm | 88/100 | 280.0 tok/s | $19,929 | $27.08 |
| B200 SXM | optimal | FP8 · 4 GPUs · tensorrt-llm | 83/100 | 280.0 tok/s | $17,044 | $23.16 |
| H200 SXM | optimal | FP8 · 4 GPUs · tensorrt-llm | 80/100 | 280.0 tok/s | $10,211 | $13.88 |
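The cost-per-million-token figures are consistent with dividing monthly cost by the tokens generated in a month at the listed throughput. The sketch below assumes a 730-hour month (8,760 hours / 12) and continuous operation at full throughput; both are assumptions inferred from the numbers, not stated on the page, and the function name is illustrative.

```python
HOURS_PER_MONTH = 730  # assumption: 8,760 hours per year / 12


def cost_per_million_tokens(cost_per_month: float, tokens_per_second: float,
                            hours_per_month: float = HOURS_PER_MONTH) -> float:
    """Dollars per one million tokens, assuming 100% utilization all month."""
    tokens_per_month = tokens_per_second * 3600 * hours_per_month
    return cost_per_month / tokens_per_month * 1e6


# Monthly costs and throughput as listed in the GPU recommendations.
recommendations = [
    ("B200 NVL (pair)", 19929),
    ("B200 SXM", 17044),
    ("H200 SXM", 10211),
]

for name, monthly_cost in recommendations:
    per_million = cost_per_million_tokens(monthly_cost, 280.0)
    print(f"{name}: ${per_million:.2f} per million tokens")
```

Under these assumptions the results round to $27.08, $23.16, and $13.88, matching the listed values. Real utilization below 100% would raise the effective cost per token proportionally.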
API Pricing Comparison
| Provider | Input $/M | Output $/M | Badges |
|---|---|---|---|
| nvidia | $4.20 | $4.20 | Cheapest |
Quality Benchmarks
MMLU: 82.0
HumanEval: 57.0
GSM8K: 92.0
MT-Bench: 85.0
Capabilities
Features
✓ Tool Use · ✗ Vision · ✓ Code · ✓ Math · ✗ Reasoning · ✓ Multilingual · ✓ Structured Output
Supported Frameworks
tensorrt-llm · vllm · sglang
Supported Precisions
BF16 (default) · FP8 · INT4