NVIDIA

Megatron-Turing NLG 530B

NVIDIA · dense · 530B parameters · 2,048 context

Quality: 58.0

Architecture Details

Type: DENSE
Total Parameters: 530B
Active Parameters: 530B
Layers: 105
Hidden Dimension: 20,480
Attention Heads: 128
KV Heads: 128
Head Dimension: 160
Vocab Size: 50,257
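
The listed figures can be sanity-checked with a quick calculation. The sketch below derives the head dimension as hidden dimension divided by attention heads, and estimates the parameter count with the common rule of thumb of about 12 × layers × hidden² for a dense decoder-only transformer, plus one vocabulary embedding matrix. That rule of thumb is an assumption of this sketch, not something the page states, and it ignores biases, layer norms, and positional embeddings.

```python
# Rough sanity check of the architecture numbers listed above.
# Assumption (not from the page): a dense decoder-only transformer has
# roughly 12 * layers * hidden^2 parameters in its attention and MLP
# blocks, plus vocab * hidden for the token embedding matrix.

LAYERS = 105
HIDDEN = 20_480
HEADS = 128
VOCAB = 50_257


def head_dim(hidden: int, heads: int) -> int:
    """Per-head dimension when the hidden size is split evenly across heads."""
    return hidden // heads


def approx_params(layers: int, hidden: int, vocab: int) -> int:
    """Approximate parameter count (blocks plus token embeddings)."""
    block_params = 12 * layers * hidden ** 2
    embedding_params = vocab * hidden
    return block_params + embedding_params


if __name__ == "__main__":
    print(head_dim(HIDDEN, HEADS))                                # 160
    print(f"{approx_params(LAYERS, HIDDEN, VOCAB) / 1e9:.1f}B")   # 529.5B
```

Both results agree with the card: a head dimension of 160 and a total close to the listed 530B parameters.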

Memory Requirements

BF16 Weights: 1060.0 GB
FP8 Weights: 530.0 GB
INT4 Weights: 265.0 GB
KV-Cache per Token: 3,440,640 bytes
Activation Estimate: 12.00 GB
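
The weight figures follow directly from parameter count times bytes per parameter. The sketch below reproduces them and also shows the usual per-token KV-cache formula (keys plus values, for every layer and KV head). The function names and the byte widths per precision are assumptions of this sketch; the page does not say how it computes its KV-cache figure.

```python
# Memory arithmetic for the figures above (decimal GB, 1 GB = 1e9 bytes).
# Assumption: BF16 = 2 bytes, FP8 = 1 byte, INT4 = 0.5 bytes per parameter,
# with no overhead for quantization scales or padding.

BYTES_PER_PARAM = {"BF16": 2.0, "FP8": 1.0, "INT4": 0.5}


def weight_gb(params: float, precision: str) -> float:
    """Weight memory in GB for a given precision."""
    return params * BYTES_PER_PARAM[precision] / 1e9


def kv_bytes_per_token(layers: int, kv_heads: int, head_dim: int,
                       bytes_per_elem: int) -> int:
    """Standard KV-cache size per token: one key and one value vector
    per layer and per KV head."""
    return 2 * layers * kv_heads * head_dim * bytes_per_elem


if __name__ == "__main__":
    for precision in ("BF16", "FP8", "INT4"):
        print(precision, weight_gb(530e9, precision), "GB")
    print(kv_bytes_per_token(105, 128, 160, 2))  # 2-byte cache entries
    print(kv_bytes_per_token(105, 128, 160, 1))  # 1-byte cache entries
```

The weight results match the card (1060.0, 530.0, and 265.0 GB). For the KV cache, the listed layers, KV heads, and head dimension give 8,601,600 bytes per token at 2 bytes per element and 4,300,800 at 1 byte. The listed 3,440,640 bytes equals 2 × 105 × 128 × 128, so the page may be assuming a head dimension of 128 and 1-byte cache entries.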

Fits on (single-node)

B200 NVL (pair) INT4
B200 SXM x2 INT4
B100 SXM x2 INT4
GB200 NVL72 (per GPU) x2 INT4
GB300 NVL72 (per GPU) x2 INT4
H100 NVL 94GB (per GPU pair) x2 INT4
Instinct MI300X x2 INT4
Instinct MI325X x2 INT4

GPU Recommendations

B200 NVL (pair) · optimal
FP8 · 2 GPUs · tensorrt-llm
Score: 88/100
Throughput: 140.0 tok/s
Cost/Month: $19,929
Cost/M Tokens: $54.17

B200 SXM · optimal
FP8 · 4 GPUs · tensorrt-llm
Score: 83/100
Throughput: 140.0 tok/s
Cost/Month: $17,044
Cost/M Tokens: $46.33

B100 SXM · optimal
FP8 · 4 GPUs · tensorrt-llm
Score: 83/100
Throughput: 140.0 tok/s
Cost/Month: $17,082
Cost/M Tokens: $46.43
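
The Cost/M Tokens figures can be related to Cost/Month and Throughput with simple arithmetic. The page does not state its method; the sketch below assumes a 730-hour month (365 × 24 / 12) at full utilization, which happens to reproduce all three listed values. The function name and the month length are assumptions of this sketch.

```python
# Cost per million tokens from monthly cost and sustained throughput.
# Assumption: one month = 365 * 24 / 12 = 730 hours, and the GPUs
# generate tokens at the listed rate for the entire month.

HOURS_PER_MONTH = 365 * 24 / 12  # 730.0


def cost_per_million_tokens(monthly_cost_usd: float,
                            tokens_per_second: float) -> float:
    """Monthly cost divided by millions of tokens produced in a month."""
    tokens_per_month = tokens_per_second * HOURS_PER_MONTH * 3600
    return monthly_cost_usd / (tokens_per_month / 1e6)


if __name__ == "__main__":
    configs = {
        "B200 NVL (pair)": 19929,
        "B200 SXM": 17044,
        "B100 SXM": 17082,
    }
    for name, monthly_cost in configs.items():
        print(name, round(cost_per_million_tokens(monthly_cost, 140.0), 2))
```

Real deployments rarely run at full utilization, so the effective cost per million tokens is likely to be higher than these figures.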

API Pricing Comparison

No API pricing data available for this model.

Quality Benchmarks

MMLU: 63.0
HumanEval: 30.0
GSM8K: 50.0
MT-Bench: 70.0

Capabilities

Features

Tool Use, Vision, Code, Math, Reasoning, Multilingual, Structured Output

Supported Frameworks

tensorrt-llm, vllm

Supported Precisions

BF16 (default), FP8, INT4
