DeepInfra offers 21 model endpoints, with output pricing starting at $0.02 per million tokens, roughly 98% below the market average of $1.03 per million output tokens across inference API providers.
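The headline discount can be checked directly from the two figures above:

```python
# Verify the headline figure: entry-level output price vs. market average.
deepinfra_min_output = 0.02  # $/M output tokens (the cheapest endpoint)
market_avg_output = 1.03     # $/M output tokens across inference providers

percent_below = (1 - deepinfra_min_output / market_avg_output) * 100
print(f"{percent_below:.1f}% below the market average")  # 98.1% below the market average
```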
Provider Overview
| Attribute | Value |
|---|---|
| Type | Inference |
| Billing | Per token |
| Egress | Free |
| SLA Uptime | 99.9% |
| Autoscaling | Yes |
| Cold Start | None |
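DeepInfra serves these models behind an OpenAI-compatible chat-completions API. The sketch below shows the request shape only; the base URL and model identifier are assumptions for illustration, so confirm both against DeepInfra's own documentation before use.

```python
import json

# Sketch of a chat-completions request against DeepInfra's
# OpenAI-compatible endpoint. URL and model id are assumptions;
# check DeepInfra's docs for the current values.
BASE_URL = "https://api.deepinfra.com/v1/openai"  # assumed base URL
API_KEY = "YOUR_DEEPINFRA_API_KEY"

payload = {
    "model": "meta-llama/Meta-Llama-3.1-8B-Instruct",  # assumed model id
    "messages": [{"role": "user", "content": "Say hello in one word."}],
    "max_tokens": 16,
}
headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json",
}

body = json.dumps(payload)
# To send: requests.post(f"{BASE_URL}/chat/completions", headers=headers, data=body)
```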
Model Pricing (21)
| Model | Input $/M | Output $/M | Latency | Throughput | Context |
|---|---|---|---|---|---|
| llama-3.2-1b | $0.02 | $0.02 | 0.08s | 350 t/s | 128k |
| llama-3.2-3b | $0.04 | $0.04 | 0.1s | 280 t/s | 128k |
| phi-3-mini-128k | $0.05 | $0.05 | 0.12s | 230 t/s | 128k |
| llama-3.1-8b | $0.06 | $0.06 | 0.15s | 200 t/s | 128k |
| gemma-2-9b | $0.06 | $0.06 | 0.12s | 200 t/s | 8k |
| qwen-2.5-7b | $0.07 | $0.07 | 0.15s | 180 t/s | 32k |
| llama-3.2-11b-vision | $0.12 | $0.12 | 0.2s | 150 t/s | 128k |
| phi-4-14b | $0.12 | $0.12 | 0.15s | 160 t/s | 16k |
| phi-3-medium-128k | $0.14 | $0.14 | 0.2s | 130 t/s | 128k |
| qwen-2.5-32b | $0.18 | $0.20 | 0.25s | 100 t/s | 32k |
| qwen-2.5-coder-32b | $0.18 | $0.20 | 0.25s | 95 t/s | 32k |
| mixtral-8x7b | $0.24 | $0.24 | 0.2s | 120 t/s | 33k |
| deepseek-v3 | $0.30 | $0.30 | 0.3s | 80 t/s | 64k |
| gemma-2-27b | $0.30 | $0.30 | 0.25s | 90 t/s | 8k |
| llama-3.1-70b | $0.35 | $0.40 | 0.3s | 85 t/s | 128k |
| llama-3.3-70b | $0.35 | $0.40 | 0.28s | 90 t/s | 128k |
| qwen-2.5-72b | $0.35 | $0.40 | 0.35s | 75 t/s | 32k |
| mixtral-8x22b | $0.65 | $0.65 | 0.4s | 65 t/s | 66k |
| llama-3.2-90b-vision | $0.65 | $0.65 | 0.5s | 50 t/s | 128k |
| llama-3.1-405b | $1.80 | $1.80 | 0.7s | 35 t/s | 128k |
| deepseek-r1 | $1.50 | $4.00 | 2s | 30 t/s | 64k |
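With pure per-token billing and free egress, the cost of a call is just the two token counts multiplied by the table rates. A minimal estimator, with rates copied from a few rows of the table above:

```python
# Estimate the dollar cost of one request under per-token billing.
# Rates are $ per million tokens (input, output), from the pricing table.
RATES = {
    "llama-3.2-1b":  (0.02, 0.02),
    "llama-3.1-70b": (0.35, 0.40),
    "deepseek-r1":   (1.50, 4.00),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the cost in dollars for a single request."""
    in_rate, out_rate = RATES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# Example: a 10k-token prompt with a 2k-token completion on llama-3.1-70b.
cost = request_cost("llama-3.1-70b", 10_000, 2_000)
print(f"${cost:.4f}")  # $0.0043
```

Note how the asymmetric rows matter: deepseek-r1 charges $4.00/M for output against $1.50/M for input, so completion-heavy workloads cost disproportionately more there.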
Reputation Details
| Category | Score |
|---|---|
| Pricing | 90 |
| Reliability | 90 |
| Features | 75 |
Highlights
- Very competitive token pricing
- 99.9% uptime SLA
- Autoscaling supported
- No cold start
Compare with Others
| Provider | Overall | Pricing | Reliability | Features | Models |
|---|---|---|---|---|---|
| DeepInfra | 86 | 90 | 90 | 75 | 21 |
| Together AI | 78 | 70 | 90 | 75 | 20 |
| Fireworks AI | 78 | 70 | 90 | 75 | 14 |
| Groq | 86 | 90 | 90 | 75 | 10 |
| DeepSeek | 72 | 70 | 70 | 75 | 3 |
Embed Badge
<a href="https://inferencebench.io/providers/deepinfra/"><img src="data:image/svg+xml,%3Csvg%20xmlns%3D%22http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%22%20width%3D%22241%22%20height%3D%2220%22%20role%3D%22img%22%20aria-label%3D%22InferenceBench%20Verified%3A%20DeepInfra%22%3E%0A%20%20%3Ctitle%3EInferenceBench%20Verified%3A%20DeepInfra%3C%2Ftitle%3E%0A%20%20%3ClinearGradient%20id%3D%22s%22%20x2%3D%220%22%20y2%3D%22100%25%22%3E%0A%20%20%20%20%3Cstop%20offset%3D%220%22%20stop-color%3D%22%23bbb%22%20stop-opacity%3D%22.1%22%2F%3E%0A%20%20%20%20%3Cstop%20offset%3D%221%22%20stop-opacity%3D%22.1%22%2F%3E%0A%20%20%3C%2FlinearGradient%3E%0A%20%20%3CclipPath%20id%3D%22r%22%3E%0A%20%20%20%20%3Crect%20width%3D%22241%22%20height%3D%2220%22%20rx%3D%223%22%20fill%3D%22%23fff%22%2F%3E%0A%20%20%3C%2FclipPath%3E%0A%20%20%3Cg%20clip-path%3D%22url(%23r)%22%3E%0A%20%20%20%20%3Crect%20width%3D%22166%22%20height%3D%2220%22%20fill%3D%22%23333%22%2F%3E%0A%20%20%20%20%3Crect%20x%3D%22166%22%20width%3D%2275%22%20height%3D%2220%22%20fill%3D%22%238b5cf6%22%2F%3E%0A%20%20%20%20%3Crect%20width%3D%22241%22%20height%3D%2220%22%20fill%3D%22url(%23s)%22%2F%3E%0A%20%20%3C%2Fg%3E%0A%20%20%3Cg%20fill%3D%22%23fff%22%20text-anchor%3D%22middle%22%20font-family%3D%22Verdana%2CGeneva%2CDejaVu%20Sans%2Csans-serif%22%20text-rendering%3D%22geometricPrecision%22%20font-size%3D%2211%22%3E%0A%20%20%20%20%3Ctext%20aria-hidden%3D%22true%22%20x%3D%2283%22%20y%3D%2214%22%20fill%3D%22%23010101%22%20fill-opacity%3D%22.3%22%3EInferenceBench%20Verified%3C%2Ftext%3E%0A%20%20%20%20%3Ctext%20x%3D%2283%22%20y%3D%2213%22%3EInferenceBench%20Verified%3C%2Ftext%3E%0A%20%20%20%20%3Ctext%20aria-hidden%3D%22true%22%20x%3D%22203.5%22%20y%3D%2214%22%20fill%3D%22%23010101%22%20fill-opacity%3D%22.3%22%3EDeepInfra%3C%2Ftext%3E%0A%20%20%20%20%3Ctext%20x%3D%22203.5%22%20y%3D%2213%22%3EDeepInfra%3C%2Ftext%3E%0A%20%20%3C%2Fg%3E%0A%3C%2Fsvg%3E" alt="InferenceBench Verified — DeepInfra" /></a>