Gemini 1.5 Pro
Google · MoE · 175B parameters · 2,097,152-token context
Quality: 86.0
Architecture Details
Type: MoE
Total Parameters: 175B
Active Parameters: 40B
Layers: 72
Hidden Dimension: 8,192
Attention Heads: 64
KV Heads: 8
Head Dimension: 128
Vocab Size: 256,000
Total Experts: 16
Active Experts: 2
Memory Requirements
BF16 Weights: 350.0 GB
FP8 Weights: 175.0 GB
INT4 Weights: 87.5 GB
KV-Cache per Token: 147,456 bytes
Activation Estimate: 4.00 GB
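The memory figures above can be reproduced from the architecture details with simple arithmetic. The following is a minimal sketch, not the page's actual method: the function names are hypothetical, weights are assumed to be `parameters × bytes per parameter` in decimal gigabytes, and the KV cache is assumed to follow the usual grouped-query formula `2 × layers × KV heads × head dimension × bytes per element`. Under that formula the listed 147,456 bytes per token corresponds to 1 byte per element (an 8-bit cache); a 16-bit cache would be twice that.

```python
# Values copied from the spec card; everything else is an illustrative assumption.
TOTAL_PARAMS = 175e9
LAYERS = 72
KV_HEADS = 8
HEAD_DIM = 128
CONTEXT_TOKENS = 2_097_152

# Assumed storage cost per parameter for each weight precision.
BYTES_PER_PARAM = {"BF16": 2.0, "FP8": 1.0, "INT4": 0.5}


def weight_gb(total_params: float, bytes_per_param: float) -> float:
    """Weight footprint in decimal gigabytes (1 GB = 1e9 bytes)."""
    return total_params * bytes_per_param / 1e9


def kv_bytes_per_token(layers: int, kv_heads: int, head_dim: int,
                       bytes_per_elem: int) -> int:
    """Bytes of keys plus values stored per token, summed over all layers."""
    return 2 * layers * kv_heads * head_dim * bytes_per_elem


weights = {name: weight_gb(TOTAL_PARAMS, b) for name, b in BYTES_PER_PARAM.items()}
kv_8bit = kv_bytes_per_token(LAYERS, KV_HEADS, HEAD_DIM, 1)   # matches the card
kv_16bit = kv_bytes_per_token(LAYERS, KV_HEADS, HEAD_DIM, 2)  # BF16 cache, for comparison
full_context_kv_gb = kv_8bit * CONTEXT_TOKENS / 1e9

if __name__ == "__main__":
    for name, gb in weights.items():
        print(f"{name} weights: {gb:.1f} GB")
    print(f"KV cache per token (8-bit): {kv_8bit} bytes")
    print(f"KV cache per token (16-bit): {kv_16bit} bytes")
    print(f"KV cache at full context (8-bit): {full_context_kv_gb:.1f} GB")
```

At the full 2,097,152-token context, the per-token figure implies a KV cache of roughly 309 GB for a single sequence, which is larger than the INT4 weights themselves. The "fits on" judgments below presumably assume much shorter contexts.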
Fits on (single-node)
B200 SXM (INT4)
B100 SXM (INT4)
GB200 NVL72, per GPU (INT4)
GB300 NVL72, per GPU (INT4)
H200 SXM (INT4)
H100 NVL 94GB, per GPU pair (INT4)
Instinct MI300X (INT4)
Instinct MI325X (FP8)
GPU Recommendations
| GPU | Rating | Precision | GPUs | Framework | Score | Throughput | Cost/Month | Cost/M Tokens |
|---|---|---|---|---|---|---|---|---|
| B200 NVL (pair) | optimal | BF16 | 2 | tensorrt-llm | 98/100 | 280.0 tok/s | $19,929 | $27.08 |
| B200 SXM | optimal | BF16 | 4 | tensorrt-llm | 93/100 | 280.0 tok/s | $17,044 | $23.16 |
| H200 SXM | optimal | BF16 | 4 | tensorrt-llm | 90/100 | 280.0 tok/s | $10,211 | $13.88 |
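The cost-per-million-tokens figures in the recommendations above are consistent with dividing the monthly cost by the tokens generated in a month at the listed throughput. The sketch below is an illustration under stated assumptions, not the page's documented formula: it assumes a 730-hour month (365 × 24 / 12) and 100% utilization at a sustained 280 tok/s, and the function names are hypothetical.

```python
# Assumption: one month is 730 hours (365 * 24 / 12) of continuous serving.
HOURS_PER_MONTH = 730


def tokens_per_month_millions(tok_per_s: float,
                              hours: float = HOURS_PER_MONTH) -> float:
    """Millions of tokens produced in a month at a constant throughput."""
    return tok_per_s * hours * 3600 / 1e6


def cost_per_million_tokens(monthly_cost_usd: float, tok_per_s: float,
                            hours: float = HOURS_PER_MONTH) -> float:
    """Monthly cost divided by monthly output, in USD per million tokens."""
    return monthly_cost_usd / tokens_per_month_millions(tok_per_s, hours)


# Monthly costs and throughput copied from the recommendations above.
monthly_costs = {"B200 NVL (pair)": 19929, "B200 SXM": 17044, "H200 SXM": 10211}
THROUGHPUT_TOK_S = 280.0

costs = {
    name: round(cost_per_million_tokens(cost, THROUGHPUT_TOK_S), 2)
    for name, cost in monthly_costs.items()
}

if __name__ == "__main__":
    for name, usd in costs.items():
        print(f"{name}: ${usd:.2f} per million tokens")
```

Because the calculation assumes the hardware is saturated around the clock, real per-token costs at lower utilization would be proportionally higher.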
API Pricing Comparison
| Provider | Input $/M | Output $/M | Badges |
|---|---|---|---|
|  | $1.25 | $5.00 | Cheapest |
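To make the per-million-token prices concrete, the following sketch estimates the cost of a single API request from its input and output token counts. The rates are the ones listed in the table above; the function name and the example token counts are hypothetical, and the calculation assumes flat per-token pricing with no tiering, caching discounts, or minimum charges.

```python
# Listed API rates in USD per million tokens.
INPUT_USD_PER_M = 1.25
OUTPUT_USD_PER_M = 5.00


def request_cost_usd(input_tokens: int, output_tokens: int,
                     input_rate: float = INPUT_USD_PER_M,
                     output_rate: float = OUTPUT_USD_PER_M) -> float:
    """Cost of one request under flat per-token pricing (an assumption)."""
    return (input_tokens / 1e6) * input_rate + (output_tokens / 1e6) * output_rate


# Hypothetical request: a 100,000-token prompt with a 2,000-token reply.
example_cost = request_cost_usd(100_000, 2_000)

if __name__ == "__main__":
    print(f"Example request cost: ${example_cost:.3f}")
```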
Quality Benchmarks
MMLU: 86.5
HumanEval: 65.0
GSM8K: 92.0
MT-Bench: 87.0
Capabilities
Features
✓ Tool Use · ✓ Vision · ✓ Code · ✓ Math · ✗ Reasoning · ✓ Multilingual · ✓ Structured Output
Supported Frameworks
Supported Precisions
BF16 (default)