What GPUs does Replicate offer?

Replicate is an inference API provider offering 7 AI model endpoints.

What is Replicate pricing?

Replicate model pricing starts from $0.05/M input tokens and $0.25/M output tokens.
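As a rough illustration of per-token billing, the sketch below estimates the cost of a single request at the entry-level rates quoted above. The function name `request_cost` and the token counts are hypothetical; only the two rates come from this page.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_per_m: float = 0.05, output_per_m: float = 0.25) -> float:
    """Return the estimated cost in USD of one request billed per token.

    The default rates are the entry-level prices listed on this page,
    expressed in USD per million tokens.
    """
    return (input_tokens * input_per_m + output_tokens * output_per_m) / 1_000_000


# Hypothetical request: 2,000 prompt tokens and 500 generated tokens.
cost = request_cost(2_000, 500)
print(f"${cost:.6f}")  # prints $0.000225
```

Output tokens cost five times as much as input tokens at these rates, so generation length has the larger effect on the bill.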

Does Replicate offer autoscaling?

Yes, Replicate supports autoscaling for dynamic workload management. Cold start time is approximately 5000 ms (5 seconds).

Replicate

Inference API Provider

Reputation: 61/100

Get your Replicate API key at replicate.com

Replicate offers 7 model endpoints with output pricing starting at $0.25/million tokens. Compared to the market average of $1.03/million output tokens across inference API providers, Replicate's entry-level pricing is 76% below average.
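The 76% figure can be checked with a short calculation. This is a minimal sketch; the helper name `percent_below` is hypothetical, and the two prices are the ones quoted in the paragraph above.

```python
def percent_below(price: float, average: float) -> float:
    """Return how far `price` sits below `average`, as a percentage of the average."""
    return (average - price) / average * 100


# Entry-level output price vs. the quoted market average (USD per million tokens).
print(round(percent_below(0.25, 1.03)))  # prints 76
```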

Provider Overview

Type	inference
Billing	Per token
Egress	Free
SLA Uptime	99.5%
Autoscaling	Yes
Cold Start	5000ms

Model Pricing (7)

Model	Input $/M	Output $/M	Latency	Throughput	Context
llama-3.1-8b (cheapest)	$0.05	$0.25	0.3s	150 t/s	128k
mixtral-8x7b	$0.30	$1.00	0.3s	80 t/s	33k
llama-3.1-70b	$0.65	$2.75	0.5s	60 t/s	128k
llama-3.3-70b	$0.65	$2.75	0.45s	65 t/s	128k
qwen-2.5-72b	$0.65	$2.75	0.5s	55 t/s	32k
llama-3.1-405b	$1.00	$5.00	1s	25 t/s	128k
deepseek-r1	$1.50	$5.00	3s	20 t/s	64k
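To show how the columns in the table above might be combined, here is a minimal sketch that encodes the rows and estimates cost and response time for a request. It rests on assumptions not stated on this page: the Latency column is treated as the delay before the first token, the Throughput column as a steady output rate, and cold start time is ignored. The names `Model`, `estimate`, and `cheapest_with_context` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Model:
    name: str
    input_per_m: float   # USD per million input tokens
    output_per_m: float  # USD per million output tokens
    latency_s: float     # listed latency, in seconds
    tokens_per_s: float  # listed throughput, in tokens per second
    context_k: int       # context window, in thousands of tokens


# Rows copied from the Model Pricing table.
MODELS = [
    Model("llama-3.1-8b", 0.05, 0.25, 0.3, 150, 128),
    Model("mixtral-8x7b", 0.30, 1.00, 0.3, 80, 33),
    Model("llama-3.1-70b", 0.65, 2.75, 0.5, 60, 128),
    Model("llama-3.3-70b", 0.65, 2.75, 0.45, 65, 128),
    Model("qwen-2.5-72b", 0.65, 2.75, 0.5, 55, 32),
    Model("llama-3.1-405b", 1.00, 5.00, 1.0, 25, 128),
    Model("deepseek-r1", 1.50, 5.00, 3.0, 20, 64),
]


def estimate(model: Model, input_tokens: int, output_tokens: int) -> Tuple[float, float]:
    """Return (cost in USD, approximate seconds) for one warm request."""
    cost = (input_tokens * model.input_per_m
            + output_tokens * model.output_per_m) / 1_000_000
    # Assumption: total time = listed latency + generation time at listed throughput.
    seconds = model.latency_s + output_tokens / model.tokens_per_s
    return cost, seconds


def cheapest_with_context(min_context_k: int) -> Optional[Model]:
    """Return the model with the lowest output price whose context window fits."""
    candidates = [m for m in MODELS if m.context_k >= min_context_k]
    return min(candidates, key=lambda m: m.output_per_m, default=None)


cost, seconds = estimate(MODELS[-1], 1_000, 1_000)
print(f"deepseek-r1: ${cost:.4f}, about {seconds:.0f} s")
print(cheapest_with_context(128).name)
```

Under these assumptions, generation time dominates for the slower models: 1,000 output tokens at 20 t/s take 50 seconds on top of the listed 3-second latency.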

Reputation Details

Pricing: 50/100

Reliability: 70/100

Features: 65/100

Highlights

Autoscaling supported

Compare with Others

Provider	Overall	Pricing	Reliability	Features	Models
Replicate	61	50	70	65	7
Together AI	78	70	90	75	20
Fireworks AI	78	70	90	75	14
Groq	86	90	90	75	10
DeepInfra	86	90	90	75	21

Embed Badge

<a href="https://inferencebench.io/providers/replicate/"><img src="data:image/svg+xml,%3Csvg%20xmlns%3D%22http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%22%20width%3D%22241%22%20height%3D%2220%22%20role%3D%22img%22%20aria-label%3D%22InferenceBench%20Verified%3A%20Replicate%22%3E%0A%20%20%3Ctitle%3EInferenceBench%20Verified%3A%20Replicate%3C%2Ftitle%3E%0A%20%20%3ClinearGradient%20id%3D%22s%22%20x2%3D%220%22%20y2%3D%22100%25%22%3E%0A%20%20%20%20%3Cstop%20offset%3D%220%22%20stop-color%3D%22%23bbb%22%20stop-opacity%3D%22.1%22%2F%3E%0A%20%20%20%20%3Cstop%20offset%3D%221%22%20stop-opacity%3D%22.1%22%2F%3E%0A%20%20%3C%2FlinearGradient%3E%0A%20%20%3CclipPath%20id%3D%22r%22%3E%0A%20%20%20%20%3Crect%20width%3D%22241%22%20height%3D%2220%22%20rx%3D%223%22%20fill%3D%22%23fff%22%2F%3E%0A%20%20%3C%2FclipPath%3E%0A%20%20%3Cg%20clip-path%3D%22url(%23r)%22%3E%0A%20%20%20%20%3Crect%20width%3D%22166%22%20height%3D%2220%22%20fill%3D%22%23333%22%2F%3E%0A%20%20%20%20%3Crect%20x%3D%22166%22%20width%3D%2275%22%20height%3D%2220%22%20fill%3D%22%238b5cf6%22%2F%3E%0A%20%20%20%20%3Crect%20width%3D%22241%22%20height%3D%2220%22%20fill%3D%22url(%23s)%22%2F%3E%0A%20%20%3C%2Fg%3E%0A%20%20%3Cg%20fill%3D%22%23fff%22%20text-anchor%3D%22middle%22%20font-family%3D%22Verdana%2CGeneva%2CDejaVu%20Sans%2Csans-serif%22%20text-rendering%3D%22geometricPrecision%22%20font-size%3D%2211%22%3E%0A%20%20%20%20%3Ctext%20aria-hidden%3D%22true%22%20x%3D%2283%22%20y%3D%2214%22%20fill%3D%22%23010101%22%20fill-opacity%3D%22.3%22%3EInferenceBench%20Verified%3C%2Ftext%3E%0A%20%20%20%20%3Ctext%20x%3D%2283%22%20y%3D%2213%22%3EInferenceBench%20Verified%3C%2Ftext%3E%0A%20%20%20%20%3Ctext%20aria-hidden%3D%22true%22%20x%3D%22203.5%22%20y%3D%2214%22%20fill%3D%22%23010101%22%20fill-opacity%3D%22.3%22%3EReplicate%3C%2Ftext%3E%0A%20%20%20%20%3Ctext%20x%3D%22203.5%22%20y%3D%2213%22%3EReplicate%3C%2Ftext%3E%0A%20%20%3C%2Fg%3E%0A%3C%2Fsvg%3E" alt="InferenceBench Verified — Replicate" /></a>