Inflection 3

Inflection AI · dense · 100B parameters · 8,192 context

Quality: 74.0

Architecture Details

Type: Dense
Total Parameters: 100B
Active Parameters: 100B
Layers: 72
Hidden Dimension: 10,240
Attention Heads: 80
KV Heads: 10
Head Dimension: 128
Vocab Size: 128,000
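
The listed dimensions can be cross-checked against each other. The sketch below assumes the usual convention that attention heads times head dimension equals the hidden dimension, and reads the smaller KV head count as grouped-query attention (the page does not name the attention variant). The function name `gqa_group_size` is hypothetical.

```python
def gqa_group_size(hidden_dim: int, attention_heads: int,
                   kv_heads: int, head_dim: int) -> int:
    """Return the number of query heads sharing each KV head.

    Assumes attention_heads * head_dim == hidden_dim, which holds for the
    values listed above but is a convention, not a rule for every model.
    """
    if attention_heads * head_dim != hidden_dim:
        raise ValueError("attention_heads * head_dim does not match hidden_dim")
    if attention_heads % kv_heads != 0:
        raise ValueError("attention_heads is not a multiple of kv_heads")
    return attention_heads // kv_heads


# Values taken from the Architecture Details listing.
print(gqa_group_size(hidden_dim=10_240, attention_heads=80,
                     kv_heads=10, head_dim=128))  # 8
```

Under these assumptions, 80 × 128 = 10,240 matches the hidden dimension, and each KV head serves 8 query heads.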

Memory Requirements

BF16 Weights: 200.0 GB
FP8 Weights: 100.0 GB
INT4 Weights: 50.0 GB

KV-Cache per Token: 184,320 bytes
Activation Estimate: 3.50 GB
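
The weight and KV-cache figures above can be reproduced from the architecture values. This is a minimal sketch, not the site's actual calculator: it assumes decimal gigabytes (1 GB = 10^9 bytes) and the standard KV-cache formula of 2 (keys and values) × layers × KV heads × head dimension × bytes per element. The listed 184,320 bytes matches 1 byte per element (an FP8-sized cache), which is an inference from the number and not something the page states. Function names are hypothetical.

```python
PARAMS = 100_000_000_000  # 100B total parameters, from the listing


def weight_gb(params: int, bits_per_param: int) -> float:
    """Weight memory in decimal GB for a given storage precision."""
    return params * bits_per_param / 8 / 1e9


def kv_bytes_per_token(layers: int, kv_heads: int, head_dim: int,
                       bytes_per_element: int) -> int:
    """KV-cache bytes per token: keys plus values across all layers."""
    return 2 * layers * kv_heads * head_dim * bytes_per_element


print(weight_gb(PARAMS, 16))  # 200.0 (BF16)
print(weight_gb(PARAMS, 8))   # 100.0 (FP8)
print(weight_gb(PARAMS, 4))   # 50.0  (INT4)

# 1 byte per element is an assumption that reproduces the listed figure.
per_token = kv_bytes_per_token(layers=72, kv_heads=10, head_dim=128,
                               bytes_per_element=1)
print(per_token)              # 184320

# Cache for one sequence at the full 8,192-token context.
print(per_token * 8192)       # 1509949440 bytes, about 1.51 GB
```

With a 2-byte (BF16) cache the same formula gives 368,640 bytes per token, so the cache precision matters when sizing batches.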

Fits on (single-node)

B200 SXM FP8 · B100 SXM FP8 · GB200 NVL72 (per GPU) FP8 · GB300 NVL72 (per GPU) FP8 · H200 SXM FP8 · H100 SXM INT4 · H100 PCIe INT4 · H100 NVL INT4

GPU Recommendations

B200 SXM (optimal) · BF16 · 2 GPUs · tensorrt-llm
Score: 93/100 · Throughput: 280.0 tok/s · Cost/Month: $8,522 · Cost/M Tokens: $11.58

B100 SXM (optimal) · BF16 · 2 GPUs · tensorrt-llm
Score: 93/100 · Throughput: 280.0 tok/s · Cost/Month: $8,541 · Cost/M Tokens: $11.61

H200 SXM (optimal) · BF16 · 2 GPUs · tensorrt-llm
Score: 90/100 · Throughput: 280.0 tok/s · Cost/Month: $5,106 · Cost/M Tokens: $6.94
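
The Cost/M Tokens figures follow from the monthly cost and throughput. The sketch below assumes a 730-hour month and sustained generation at the listed throughput for the whole month; the page does not state either assumption, but together they reproduce all three listed values. The function name is hypothetical.

```python
HOURS_PER_MONTH = 730  # assumption: 8,760 hours per year / 12


def cost_per_million_tokens(monthly_cost_usd: float, tokens_per_second: float,
                            hours_per_month: float = HOURS_PER_MONTH) -> float:
    """Dollars per million generated tokens at 100% utilization."""
    tokens_per_month = tokens_per_second * 3600 * hours_per_month
    return monthly_cost_usd / (tokens_per_month / 1e6)


# Monthly costs and throughput taken from the recommendations above.
for gpu, monthly in [("B200 SXM", 8522), ("B100 SXM", 8541), ("H200 SXM", 5106)]:
    print(gpu, round(cost_per_million_tokens(monthly, 280.0), 2))
# B200 SXM 11.58
# B100 SXM 11.61
# H200 SXM 6.94
```

Because the calculation assumes full utilization, real cost per token rises in proportion to idle time: at 50% utilization each figure doubles.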

API Pricing Comparison

Provider · Input $/M · Output $/M · Badges
inflection · $5.00 · $15.00 · Cheapest

Quality Benchmarks

MMLU: 78.0
HumanEval: 48.0
GSM8K: 80.0
MT-Bench: 80.0

Capabilities

Features

Tool Use · Vision · Code · Math · Reasoning · Multilingual · Structured Output

Supported Frameworks

Supported Precisions

BF16 (default)