DeepInfra offers 21 model endpoints, with output pricing starting at $0.02 per million tokens, roughly 98% below the market average of $1.03 per million output tokens across inference API providers.
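The headline discount can be checked directly from the two figures above:

```python
# Verify the headline figure: entry-level output price vs. market average.
deepinfra_min_output = 0.02  # $/M output tokens (the cheapest endpoint)
market_avg_output = 1.03     # $/M output tokens across inference providers

percent_below = (1 - deepinfra_min_output / market_avg_output) * 100
print(f"{percent_below:.1f}% below the market average")  # 98.1% below the market average
```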
Provider Overview
| Attribute | Value |
|---|---|
| Type | Inference |
| Billing | Per token |
| Egress | Free |
| SLA Uptime | 99.9% |
| Autoscaling | Yes |
| Cold Start | None |
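DeepInfra serves these models behind an OpenAI-compatible chat-completions API. The sketch below shows the request shape only; the base URL and model identifier are assumptions for illustration, so confirm both against DeepInfra's own documentation before use.

```python
import json

# Sketch of a chat-completions request against DeepInfra's
# OpenAI-compatible endpoint. URL and model id are assumptions;
# check DeepInfra's docs for the current values.
BASE_URL = "https://api.deepinfra.com/v1/openai"  # assumed base URL
API_KEY = "YOUR_DEEPINFRA_API_KEY"

payload = {
    "model": "meta-llama/Meta-Llama-3.1-8B-Instruct",  # assumed model id
    "messages": [{"role": "user", "content": "Say hello in one word."}],
    "max_tokens": 16,
}
headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json",
}

body = json.dumps(payload)
# To send: requests.post(f"{BASE_URL}/chat/completions", headers=headers, data=body)
```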
Model Pricing (21)
| Model | Input $/M | Output $/M | Latency | Throughput | Context |
|---|---|---|---|---|---|
| llama-3.2-1b | $0.02 | $0.02 | 0.08s | 350 t/s | 128k |
| llama-3.2-3b | $0.04 | $0.04 | 0.1s | 280 t/s | 128k |
| phi-3-mini-128k | $0.05 | $0.05 | 0.12s | 230 t/s | 128k |
| llama-3.1-8b | $0.06 | $0.06 | 0.15s | 200 t/s | 128k |
| gemma-2-9b | $0.06 | $0.06 | 0.12s | 200 t/s | 8k |
| qwen-2.5-7b | $0.07 | $0.07 | 0.15s | 180 t/s | 32k |
| llama-3.2-11b-vision | $0.12 | $0.12 | 0.2s | 150 t/s | 128k |
| phi-4-14b | $0.12 | $0.12 | 0.15s | 160 t/s | 16k |
| phi-3-medium-128k | $0.14 | $0.14 | 0.2s | 130 t/s | 128k |
| qwen-2.5-32b | $0.18 | $0.20 | 0.25s | 100 t/s | 32k |
| qwen-2.5-coder-32b | $0.18 | $0.20 | 0.25s | 95 t/s | 32k |
| mixtral-8x7b | $0.24 | $0.24 | 0.2s | 120 t/s | 33k |
| deepseek-v3 | $0.30 | $0.30 | 0.3s | 80 t/s | 64k |
| gemma-2-27b | $0.30 | $0.30 | 0.25s | 90 t/s | 8k |
| llama-3.1-70b | $0.35 | $0.40 | 0.3s | 85 t/s | 128k |
| llama-3.3-70b | $0.35 | $0.40 | 0.28s | 90 t/s | 128k |
| qwen-2.5-72b | $0.35 | $0.40 | 0.35s | 75 t/s | 32k |
| mixtral-8x22b | $0.65 | $0.65 | 0.4s | 65 t/s | 66k |
| llama-3.2-90b-vision | $0.65 | $0.65 | 0.5s | 50 t/s | 128k |
| llama-3.1-405b | $1.80 | $1.80 | 0.7s | 35 t/s | 128k |
| deepseek-r1 | $1.50 | $4.00 | 2s | 30 t/s | 64k |
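With pure per-token billing and free egress, the cost of a call is just the two token counts multiplied by the table rates. A minimal estimator, with rates copied from a few rows of the table above:

```python
# Estimate the dollar cost of one request under per-token billing.
# Rates are $ per million tokens (input, output), from the pricing table.
RATES = {
    "llama-3.2-1b":  (0.02, 0.02),
    "llama-3.1-70b": (0.35, 0.40),
    "deepseek-r1":   (1.50, 4.00),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the cost in dollars for a single request."""
    in_rate, out_rate = RATES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# Example: a 10k-token prompt with a 2k-token completion on llama-3.1-70b.
cost = request_cost("llama-3.1-70b", 10_000, 2_000)
print(f"${cost:.4f}")  # $0.0043
```

Note how the asymmetric rows matter: deepseek-r1 charges $4.00/M for output against $1.50/M for input, so completion-heavy workloads cost disproportionately more there.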
Reputation Details
| Category | Score |
|---|---|
| Pricing | 90 |
| Reliability | 90 |
| Features | 75 |
Highlights
- Very competitive token pricing
- 99.9% uptime SLA
- Autoscaling supported
- No cold start
Compare with Others
| Provider | Overall | Pricing | Reliability | Features | Models |
|---|---|---|---|---|---|
| DeepInfra | 86 | 90 | 90 | 75 | 21 |
| Together AI | 78 | 70 | 90 | 75 | 20 |
| Fireworks AI | 78 | 70 | 90 | 75 | 14 |
| Groq | 86 | 90 | 90 | 75 | 10 |
| DeepSeek | 72 | 70 | 70 | 75 | 3 |
Embed Badge
<a href="https://inferencebench.io/providers/deepinfra/"><img src="data:image/svg+xml,%3Csvg%20xmlns%3D%22http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%22%20width%3D%22241%22%20height%3D%2220%22%20role%3D%22img%22%20aria-label%3D%22InferenceBench%20Verified%3A%20DeepInfra%22%3E%0A%20%20%3Ctitle%3EInferenceBench%20Verified%3A%20DeepInfra%3C%2Ftitle%3E%0A%20%20%3ClinearGradient%20id%3D%22s%22%20x2%3D%220%22%20y2%3D%22100%25%22%3E%0A%20%20%20%20%3Cstop%20offset%3D%220%22%20stop-color%3D%22%23bbb%22%20stop-opacity%3D%22.1%22%2F%3E%0A%20%20%20%20%3Cstop%20offset%3D%221%22%20stop-opacity%3D%22.1%22%2F%3E%0A%20%20%3C%2FlinearGradient%3E%0A%20%20%3CclipPath%20id%3D%22r%22%3E%0A%20%20%20%20%3Crect%20width%3D%22241%22%20height%3D%2220%22%20rx%3D%223%22%20fill%3D%22%23fff%22%2F%3E%0A%20%20%3C%2FclipPath%3E%0A%20%20%3Cg%20clip-path%3D%22url(%23r)%22%3E%0A%20%20%20%20%3Crect%20width%3D%22166%22%20height%3D%2220%22%20fill%3D%22%23333%22%2F%3E%0A%20%20%20%20%3Crect%20x%3D%22166%22%20width%3D%2275%22%20height%3D%2220%22%20fill%3D%22%238b5cf6%22%2F%3E%0A%20%20%20%20%3Crect%20width%3D%22241%22%20height%3D%2220%22%20fill%3D%22url(%23s)%22%2F%3E%0A%20%20%3C%2Fg%3E%0A%20%20%3Cg%20fill%3D%22%23fff%22%20text-anchor%3D%22middle%22%20font-family%3D%22Verdana%2CGeneva%2CDejaVu%20Sans%2Csans-serif%22%20text-rendering%3D%22geometricPrecision%22%20font-size%3D%2211%22%3E%0A%20%20%20%20%3Ctext%20aria-hidden%3D%22true%22%20x%3D%2283%22%20y%3D%2214%22%20fill%3D%22%23010101%22%20fill-opacity%3D%22.3%22%3EInferenceBench%20Verified%3C%2Ftext%3E%0A%20%20%20%20%3Ctext%20x%3D%2283%22%20y%3D%2213%22%3EInferenceBench%20Verified%3C%2Ftext%3E%0A%20%20%20%20%3Ctext%20aria-hidden%3D%22true%22%20x%3D%22203.5%22%20y%3D%2214%22%20fill%3D%22%23010101%22%20fill-opacity%3D%22.3%22%3EDeepInfra%3C%2Ftext%3E%0A%20%20%20%20%3Ctext%20x%3D%22203.5%22%20y%3D%2213%22%3EDeepInfra%3C%2Ftext%3E%0A%20%20%3C%2Fg%3E%0A%3C%2Fsvg%3E" alt="InferenceBench Verified — DeepInfra" /></a>