Inflection 3
Inflection AI · dense · 100B parameters · 8,192-token context
Quality: 74.0
Architecture Details
Type: DENSE
Total Parameters: 100B
Active Parameters: 100B
Layers: 72
Hidden Dimension: 10,240
Attention Heads: 80
KV Heads: 10
Head Dimension: 128
Vocab Size: 128,000
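
The architecture figures above can be cross-checked against each other. The sketch below assumes the common convention that the hidden dimension equals attention heads times head dimension, and that KV heads are shared evenly across query heads (grouped-query attention). The variable and function names are illustrative and do not come from the page.

```python
# Values copied from the Architecture Details listing.
HIDDEN_DIM = 10_240
ATTENTION_HEADS = 80
KV_HEADS = 10
HEAD_DIM = 128


def query_heads_per_kv_head(attention_heads: int, kv_heads: int) -> int:
    """Number of query heads that share one KV head (assumes even grouping)."""
    if attention_heads % kv_heads != 0:
        raise ValueError("attention heads must divide evenly across KV heads")
    return attention_heads // kv_heads


# Assumed convention: hidden dimension = heads x head dimension.
hidden_matches = ATTENTION_HEADS * HEAD_DIM == HIDDEN_DIM
group_size = query_heads_per_kv_head(ATTENTION_HEADS, KV_HEADS)

print("hidden dimension consistent:", hidden_matches)
print("query heads per KV head:", group_size)
```

With 10 KV heads instead of 80, the KV cache stores one eighth of the key/value vectors that full multi-head attention would need.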
Memory Requirements
BF16 Weights: 200.0 GB
FP8 Weights: 100.0 GB
INT4 Weights: 50.0 GB
KV-Cache per Token: 184,320 bytes
Activation Estimate: 3.50 GB
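
The weight and KV-cache figures above follow from simple arithmetic on the architecture values. The sketch below assumes decimal gigabytes (1 GB = 1e9 bytes) and the standard KV-cache formula of one key and one value vector per KV head per layer. Under that formula, the listed 184,320 bytes per token corresponds to one byte per cached element; a BF16 cache would be twice that. Which cache precision the page intends is an assumption here, and the function names are illustrative.

```python
# Values copied from the page.
TOTAL_PARAMS = 100e9
LAYERS = 72
KV_HEADS = 10
HEAD_DIM = 128
CONTEXT_TOKENS = 8_192

# Bytes per stored element (INT4 packs two values into one byte).
BYTES_PER_ELEM = {"BF16": 2.0, "FP8": 1.0, "INT4": 0.5}


def weight_gb(params: float, precision: str) -> float:
    """Weight footprint in decimal GB for the given precision."""
    return params * BYTES_PER_ELEM[precision] / 1e9


def kv_bytes_per_token(layers: int, kv_heads: int, head_dim: int,
                       elem_bytes: float) -> int:
    """Key plus value vectors for every KV head in every layer."""
    return int(2 * layers * kv_heads * head_dim * elem_bytes)


for precision in BYTES_PER_ELEM:
    print(precision, weight_gb(TOTAL_PARAMS, precision), "GB")

# Assumption: one byte per cached element reproduces the page's figure.
kv_one_byte = kv_bytes_per_token(LAYERS, KV_HEADS, HEAD_DIM, 1.0)
kv_bf16 = kv_bytes_per_token(LAYERS, KV_HEADS, HEAD_DIM, 2.0)
kv_full_context = kv_one_byte * CONTEXT_TOKENS

print("KV bytes/token (1-byte elements):", kv_one_byte)
print("KV bytes/token (BF16 elements):", kv_bf16)
print("KV bytes for one full-context sequence:", kv_full_context)
```

At the listed per-token size, a single sequence filling the 8,192-token context needs roughly 1.5 GB of cache, so batch size rather than a single sequence dominates KV memory planning.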
Fits on (single-node)
B200 SXM FP8
B100 SXM FP8
GB200 NVL72 (per GPU) FP8
GB300 NVL72 (per GPU) FP8
H200 SXM FP8
H100 SXM INT4
H100 PCIe INT4
H100 NVL INT4
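
One plausible way to arrive at a fit list like the one above is to pick, for each GPU, the highest precision whose weights plus overhead fit in memory. The sketch below is not the page's actual method. The GPU capacities are nominal figures supplied as assumptions, and the overhead is taken as the activation estimate plus one full-context sequence of KV cache.

```python
from typing import Optional

# Assumed nominal memory capacities in GB (not stated on the page).
GPU_MEMORY_GB = {"H100 SXM": 80.0, "H200 SXM": 141.0, "B200 SXM": 192.0}

# Figures copied from the Memory Requirements listing.
WEIGHTS_GB = {"BF16": 200.0, "FP8": 100.0, "INT4": 50.0}
ACTIVATION_GB = 3.5
KV_CACHE_GB = 184_320 * 8_192 / 1e9  # one sequence at full context


def highest_precision_that_fits(capacity_gb: float) -> Optional[str]:
    """Return the first precision, highest first, that fits on one GPU."""
    for precision in ("BF16", "FP8", "INT4"):
        needed = WEIGHTS_GB[precision] + ACTIVATION_GB + KV_CACHE_GB
        if needed <= capacity_gb:
            return precision
    return None


fits = {gpu: highest_precision_that_fits(gb) for gpu, gb in GPU_MEMORY_GB.items()}
for gpu, precision in fits.items():
    print(gpu, "->", precision)
```

Under these assumptions the result agrees with the listed entries for the three GPUs checked: FP8 on B200 SXM and H200 SXM, INT4 on H100 SXM.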
GPU Recommendations
| GPU | Rating | Precision | GPUs | Framework | Score | Throughput | Cost/Month | Cost/M Tokens |
|---|---|---|---|---|---|---|---|---|
| B200 SXM | optimal | BF16 | 2 | tensorrt-llm | 93/100 | 280.0 tok/s | $8522 | $11.58 |
| B100 SXM | optimal | BF16 | 2 | tensorrt-llm | 93/100 | 280.0 tok/s | $8541 | $11.61 |
| H200 SXM | optimal | BF16 | 2 | tensorrt-llm | 90/100 | 280.0 tok/s | $5106 | $6.94 |
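
The Cost/M Tokens figures above are consistent with dividing the monthly cost by the tokens generated in a month at the listed throughput. The sketch below assumes 730 hours per month (8,760 hours per year divided by 12) and continuous full utilization. Both are assumptions, chosen because they reproduce the listed values, and the function name is illustrative.

```python
HOURS_PER_MONTH = 730  # assumption: 8,760 hours per year / 12


def cost_per_million_tokens(monthly_cost_usd: float,
                            tokens_per_second: float) -> float:
    """USD per million tokens, assuming the GPUs run at full throughput."""
    tokens_per_month = tokens_per_second * 3600 * HOURS_PER_MONTH
    return monthly_cost_usd / (tokens_per_month / 1e6)


# Monthly cost and throughput copied from the recommendations above.
recommendations = {
    "B200 SXM": (8522, 280.0),
    "B100 SXM": (8541, 280.0),
    "H200 SXM": (5106, 280.0),
}

per_million = {
    gpu: round(cost_per_million_tokens(cost, tps), 2)
    for gpu, (cost, tps) in recommendations.items()
}
for gpu, usd in per_million.items():
    print(f"{gpu}: ${usd:.2f} per million tokens")
```

Real deployments rarely sustain peak throughput around the clock, so the effective cost per million tokens is usually higher than this idealized figure.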
API Pricing Comparison
| Provider | Input $/M | Output $/M | Badges |
|---|---|---|---|
| inflection | $5.00 | $15.00 | Cheapest |
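
API prices are quoted per million tokens, with input and output billed at different rates. A minimal sketch of the per-request cost at the rates listed above follows. The token counts in the example are hypothetical, chosen only to fit within the 8,192-token context.

```python
# Rates copied from the pricing table (USD per million tokens).
INPUT_USD_PER_M = 5.00
OUTPUT_USD_PER_M = 15.00


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Cost of one request with separately priced input and output tokens."""
    return (input_tokens * INPUT_USD_PER_M
            + output_tokens * OUTPUT_USD_PER_M) / 1e6


# Hypothetical request: 6,000 prompt tokens and 2,000 generated tokens.
example_cost = request_cost_usd(6_000, 2_000)
print(f"${example_cost:.4f} per request")
```

Because output tokens cost three times as much as input tokens, the 2,000 generated tokens in this example account for half of the total cost.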
Quality Benchmarks
MMLU: 78.0
HumanEval: 48.0
GSM8K: 80.0
MT-Bench: 80.0
Capabilities
Features
✗ Tool Use · ✗ Vision · ✓ Code · ✓ Math · ✗ Reasoning · ✓ Multilingual · ✓ Structured Output
Supported Frameworks
Supported Precisions
BF16 (default)