Claude Haiku 4.5
Anthropic · moe · 30B parameters · 200,000 context
Parameters
30B
Context Window
195K tokens
Architecture
MoE
Best GPU
H100 NVL
Cheapest API
$5.00/M
Intelligence Brief
Claude Haiku 4.5 is a 30B parameter Mixture-of-Experts (16 experts, 2 active) model from Anthropic, featuring Grouped Query Attention (GQA) with 40 layers and 8,192 hidden dimensions. With a 200,000 token context window, it supports tools, vision, structured output, code, math, multilingual, reasoning. The most cost-effective API deployment is via anthropic at $5.00/M output tokens. For self-hosted inference, H100 NVL delivers optimal throughput at $2932/month.
Provider pricing
1 provider · canonical: anthropic| Provider | Input $/M | Output $/M ▲ | Notes |
|---|---|---|---|
| anthropiccanonical | $1.00 | $5.00 | cheapest input · cheapest output |
Prices update via the nightly pricing cron + admin approvals at /admin/ingest-queue. The leaderboard's Input/Output cells show the canonical rate above; this table shows the full spread.
Recent changes
Loading…
Related models
5 suggestions
Claude 3 OpusClaude · 175B$75.00/M out
Claude 3 SonnetClaude · 70B$15.00/M out
Claude 3.5 HaikuClaude · 20B$4.00/M out
Claude Opus 4.1Claude · 75B$75.00/M out
Claude Opus 4.5Claude · 80B$25.00/M out
Picks: same family first, then same vendor within ±2× params, then top tag-overlap matches. Price shown is the cheapest Output $/M across providers — the row's page shows the canonical anchor.
Architecture Details
Memory Requirements
BF16 Weights
60.0 GB
FP8 Weights
30.0 GB
INT4 Weights
15.0 GB
GPU Compatibility Matrix
Claude Haiku 4.5 is compatible with 62% of GPU configurations across 41 GPUs at 3 precision levels.
GPU Recommendations
BF16 · 1 GPU · tensorrt-llm
100/100
score
Throughput
566.1 tok/s
Latency (ITL)
1.8ms
Est. TTFT
0ms
Cost/Month
$2932
Cost/M Tokens
$1.97
BF16 · 1 GPU · tensorrt-llm
100/100
score
Throughput
575.0 tok/s
Latency (ITL)
1.7ms
Est. TTFT
0ms
Cost/Month
$940
Cost/M Tokens
$0.62
BF16 · 1 GPU · tensorrt-llm
100/100
score
Throughput
575.0 tok/s
Latency (ITL)
1.7ms
Est. TTFT
0ms
Cost/Month
$2838
Cost/M Tokens
$1.88
Deployment Options
API Deployment
anthropic
$5.00/M
output tokens
Single GPU
H100 NVL
$2932/mo
Min VRAM: 30 GB
Multi-GPU
A10G x4
177.9 tok/s
TP· $1139/mo
API Pricing Comparison
| Provider | Input $/M | Output $/M | Badges |
|---|---|---|---|
| anthropic | $1.00 | $5.00 | Cheapest |
Cost Analysis
| Provider | Input $/M | Output $/M | ~Monthly Cost |
|---|---|---|---|
| anthropicBest Value | $1.00 | $5.00 | $30 |
Cost per 1,000 Requests
Short (500 tok)
$1.50
via anthropic
Medium (2K tok)
$6.00
via anthropic
Long (8K tok)
$18.00
via anthropic
Performance Estimates
Throughput by GPU
VRAM Breakdown (H100 NVL, BF16)
Capabilities
Features
Supported Frameworks
Supported Precisions
Where to Deploy Claude Haiku 4.5
Self-Hosted Infrastructure
Similar Models
Claude 3.5 Haiku
20B params · dense
Quality: 67
from $4.00/M
Claude 3 Sonnet
70B params · dense
Quality: 78
from $15.00/M
Claude Sonnet 4
70B params · dense
Quality: 86
from $15.00/M
JAIS 30B
30B params · dense
Quality: 50
MPT 30B
30B params · dense
Quality: 48
Frequently Asked Questions
How much VRAM does Claude Haiku 4.5 need for inference?
Claude Haiku 4.5 requires approximately 60.0 GB of VRAM at BF16 precision, 30.0 GB at FP8, or 15.0 GB at INT4 quantization. Additional VRAM is needed for KV-cache (204800 bytes per token) and activations (~4.00 GB).
What is the best GPU for Claude Haiku 4.5?
The top recommended GPU for Claude Haiku 4.5 is the H100 NVL using BF16 precision. It achieves approximately 566.1 tokens/sec at an estimated cost of $2932/month ($1.97/M tokens). Score: 100/100.
How much does Claude Haiku 4.5 inference cost?
Claude Haiku 4.5 API inference starts from $1.00/M input tokens and $5.00/M output tokens. Self-hosted inference costs depend on your GPU configuration — use our ROI calculator for a detailed breakdown.