Megatron-Turing NLG 530B
NVIDIA · dense · 530B parameters · 2,048 context
Quality: 58.0
Architecture Details
Type: Dense
Total Parameters: 530B
Active Parameters: 530B
Layers: 105
Hidden Dimension: 20,480
Attention Heads: 128
KV Heads: 128
Head Dimension: 160
Vocab Size: 50,257
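The architecture figures above can be cross-checked against each other. The sketch below assumes a standard GPT-style decoder block with a 4x-wide MLP, which has roughly 12 × hidden² weights per layer; that rule of thumb is an assumption and is not stated on this page. Biases, layer norms, and position embeddings are ignored, so the result is only an estimate. The function names are illustrative.

```python
# Rough consistency check of the listed architecture figures.
# Assumption: GPT-style block with ~12 * hidden^2 weights per layer
# (about 4 * hidden^2 for attention and 8 * hidden^2 for a 4x-wide MLP).

LAYERS = 105
HIDDEN = 20_480
HEADS = 128
VOCAB = 50_257


def head_dim(hidden: int, heads: int) -> int:
    """Per-head width when the hidden dimension is split evenly across heads."""
    if hidden % heads != 0:
        raise ValueError("hidden dimension must be divisible by the head count")
    return hidden // heads


def approx_params(layers: int, hidden: int, vocab: int) -> int:
    """Approximate parameter count: transformer blocks plus token embeddings."""
    block_params = 12 * hidden * hidden * layers
    embedding_params = vocab * hidden
    return block_params + embedding_params


print(head_dim(HIDDEN, HEADS))                               # 160
print(f"{approx_params(LAYERS, HIDDEN, VOCAB) / 1e9:.1f}B")  # 529.5B
```

Under these assumptions the estimate lands close to the listed 530B total, and the head dimension of 160 follows directly from 20,480 / 128.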
Memory Requirements
BF16 Weights: 1060.0 GB
FP8 Weights: 530.0 GB
INT4 Weights: 265.0 GB
KV-Cache per Token: 3,440,640 bytes
Activation Estimate: 12.00 GB
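The weight figures above follow from parameter count times bytes per parameter, taking 1 GB as 10^9 bytes. The sketch below reproduces them and also evaluates the usual per-token KV-cache formula (2 × layers × KV heads × head dimension × bytes per element). The formula and the byte widths are assumptions, since the page does not say how its numbers are derived. With the listed head dimension of 160, the formula does not give 3,440,640 bytes; that value matches a head dimension of 128 at one byte per element, so which inputs the page used is unclear.

```python
# Sketch: reproduce the weight footprints and evaluate a per-token KV-cache formula.
# Assumptions: 1 GB = 1e9 bytes; K and V are each stored once per layer per KV head.

BYTES_PER_PARAM = {"BF16": 2.0, "FP8": 1.0, "INT4": 0.5}


def weight_gb(params: float, precision: str) -> float:
    """Weight memory in GB for a given parameter count and precision."""
    return params * BYTES_PER_PARAM[precision] / 1e9


def kv_cache_bytes_per_token(layers: int, kv_heads: int, head_dim: int,
                             bytes_per_elem: int) -> int:
    """Bytes of KV cache added per generated token (keys plus values)."""
    return 2 * layers * kv_heads * head_dim * bytes_per_elem


for precision in ("BF16", "FP8", "INT4"):
    print(precision, weight_gb(530e9, precision))  # 1060.0, 530.0, 265.0

print(kv_cache_bytes_per_token(105, 128, 160, 2))  # 8601600 (16-bit cache)
print(kv_cache_bytes_per_token(105, 128, 160, 1))  # 4300800 (8-bit cache)
print(kv_cache_bytes_per_token(105, 128, 128, 1))  # 3440640 (matches the listed value)
```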
Fits on (single-node)
B200 NVL (pair) · INT4
B200 SXM x2 · INT4
B100 SXM x2 · INT4
GB200 NVL72 (per GPU) x2 · INT4
GB300 NVL72 (per GPU) x2 · INT4
H100 NVL 94GB (per GPU pair) x2 · INT4
Instinct MI300X x2 · INT4
Instinct MI325X x2 · INT4
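A fit check of this kind can be sketched as a comparison of required memory against pooled GPU memory. The rule below (weights plus activation estimate must not exceed total device memory) is a hypothetical simplification, not the page's actual logic, and it ignores KV cache and framework overhead. The 192 GB capacity used for the Instinct MI300X is an assumption brought in from outside this page.

```python
# Sketch of a single-node fit check (hypothetical rule, not the page's own).
# Assumption: memory pools perfectly across GPUs and only weights plus the
# activation estimate count toward the requirement.

def fits(weights_gb: float, activation_gb: float,
         gpu_mem_gb: float, gpu_count: int) -> bool:
    """True if weights plus activations fit in the combined GPU memory."""
    required_gb = weights_gb + activation_gb
    available_gb = gpu_mem_gb * gpu_count
    return required_gb <= available_gb


MI300X_GB = 192.0  # assumed per-GPU capacity, not taken from this page

print(fits(265.0, 12.0, MI300X_GB, 2))  # True: INT4 weights on two GPUs
print(fits(530.0, 12.0, MI300X_GB, 2))  # False: FP8 weights on two GPUs
print(fits(265.0, 12.0, MI300X_GB, 1))  # False: INT4 weights on one GPU
```

Under this rule, INT4 is the only listed precision that fits on two such GPUs, which is consistent with every entry in the list being an INT4 configuration.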
GPU Recommendations
B200 NVL (pair) (optimal)
FP8 · 2 GPUs · tensorrt-llm
Score: 88/100
Throughput: 140.0 tok/s
Cost/Month: $19,929
Cost/M Tokens: $54.17
B200 SXM (optimal)
FP8 · 4 GPUs · tensorrt-llm
Score: 83/100
Throughput: 140.0 tok/s
Cost/Month: $17,044
Cost/M Tokens: $46.33
B100 SXM (optimal)
FP8 · 4 GPUs · tensorrt-llm
Score: 83/100
Throughput: 140.0 tok/s
Cost/Month: $17,082
Cost/M Tokens: $46.43
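The cost-per-million-token figures above appear consistent with dividing the monthly cost by the tokens produced at the listed throughput over a 730-hour month (365 × 24 / 12). The sketch below shows that arithmetic; the 730-hour month and full utilization are assumptions inferred from the numbers, not something the page states.

```python
# Sketch: derive cost per million tokens from monthly cost and throughput.
# Assumptions: a 730-hour month and 100% utilization at the listed throughput.

HOURS_PER_MONTH = 730  # 365 * 24 / 12


def cost_per_million_tokens(monthly_cost_usd: float, tokens_per_second: float,
                            hours_per_month: float = HOURS_PER_MONTH) -> float:
    """USD per one million generated tokens."""
    tokens_per_month = tokens_per_second * hours_per_month * 3600
    return monthly_cost_usd / tokens_per_month * 1e6


for name, monthly_cost in [("B200 NVL (pair)", 19929),
                           ("B200 SXM", 17044),
                           ("B100 SXM", 17082)]:
    print(f"{name}: ${cost_per_million_tokens(monthly_cost, 140.0):.2f}")
# B200 NVL (pair): $54.17
# B200 SXM: $46.33
# B100 SXM: $46.43
```

A 30-day month would give about $54.92 for the first configuration, so the 730-hour convention is the one that reproduces the listed values.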
API Pricing Comparison
No API pricing data available for this model.
Quality Benchmarks
MMLU: 63.0
HumanEval: 30.0
GSM8K: 50.0
MT-Bench: 70.0
Capabilities
Features
✗ Tool Use · ✗ Vision · ✓ Code · ✓ Math · ✗ Reasoning · ✓ Multilingual · ✗ Structured Output
Supported Frameworks
tensorrt-llm · vllm
Supported Precisions
BF16 (default) · FP8 · INT4