## Provider Overview

| Attribute | Value |
|---|---|
| Type | Inference |
| Billing | Per token |
| Egress | Free |
| SLA uptime | 99.5% |
| Autoscaling | Yes |
| Cold start | 5,000 ms |
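A 5,000 ms cold start means the first request to a scaled-to-zero deployment pays a large fixed penalty before any tokens stream. A minimal timing sketch, assuming the cold-start delay applies only when scaling from zero, that the table's "Latency" figure is time to first token, and that generation is throughput-bound afterwards (the 300-token request below is illustrative):

```python
# Rough end-to-end latency model for a single request.
# Assumption: total time = optional cold start + time to first
# token + output_tokens / throughput. Real behavior will vary.

def request_seconds(output_tokens: int,
                    ttft_s: float,
                    throughput_tps: float,
                    cold: bool = False,
                    cold_start_s: float = 5.0) -> float:
    """Estimated wall-clock seconds to complete one request."""
    startup = cold_start_s if cold else 0.0
    return startup + ttft_s + output_tokens / throughput_tps

# llama-3.1-8b from the pricing table: 0.3 s latency, 150 t/s
warm = request_seconds(300, ttft_s=0.3, throughput_tps=150)
cold = request_seconds(300, ttft_s=0.3, throughput_tps=150, cold=True)
print(f"warm: {warm:.1f}s, cold: {cold:.1f}s")  # warm: 2.3s, cold: 7.3s
```

For short requests the cold start dominates: here it more than triples the total time, which matters for interactive workloads.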
## Model Pricing (7 models)
| Model | Input ($/M tokens) | Output ($/M tokens) | Latency | Throughput | Context |
|---|---|---|---|---|---|
| llama-3.1-8b | $0.05 | $0.25 | 0.3s | 150 t/s | 128k |
| mixtral-8x7b | $0.30 | $1.00 | 0.3s | 80 t/s | 33k |
| llama-3.1-70b | $0.65 | $2.75 | 0.5s | 60 t/s | 128k |
| llama-3.3-70b | $0.65 | $2.75 | 0.45s | 65 t/s | 128k |
| qwen-2.5-72b | $0.65 | $2.75 | 0.5s | 55 t/s | 32k |
| llama-3.1-405b | $1.00 | $5.00 | 1s | 25 t/s | 128k |
| deepseek-r1 | $1.50 | $5.00 | 3s | 20 t/s | 64k |
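Under per-token billing, a request's cost is input tokens times the input rate plus output tokens times the output rate, with rates quoted per million tokens. A small sketch using a few rows from the table above:

```python
# Cost of one request under per-token billing.
# Prices are USD per million tokens, copied from the table above.

PRICES = {  # model: (input $/M, output $/M)
    "llama-3.1-8b":   (0.05, 0.25),
    "llama-3.1-70b":  (0.65, 2.75),
    "llama-3.1-405b": (1.00, 5.00),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """USD cost of one request given its token counts."""
    inp, out = PRICES[model]
    return (input_tokens * inp + output_tokens * out) / 1_000_000

# 2,000 prompt tokens + 500 completion tokens on the 70B model:
cost = request_cost("llama-3.1-70b", 2_000, 500)
print(f"${cost:.6f}")  # $0.002675
```

Note that output tokens cost roughly 4-5x more than input tokens across the lineup, so completion length drives most of the bill.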
## Reputation Details

| Category | Score |
|---|---|
| Pricing | 50 |
| Reliability | 70 |
| Features | 65 |
## Highlights
- Autoscaling supported
## Compare with Others
| Provider | Overall | Pricing | Reliability | Features | Models |
|---|---|---|---|---|---|
| Replicate | 61 | 50 | 70 | 65 | 7 |
| Together AI | 78 | 70 | 90 | 75 | 20 |
| Fireworks AI | 78 | 70 | 90 | 75 | 14 |
| Groq | 86 | 90 | 90 | 75 | 10 |
| DeepInfra | 86 | 90 | 90 | 75 | 21 |
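The Overall column does not appear to be a plain average of the three sub-scores (Groq's mean would be 85, not 86), so its weighting is the site's own. To re-rank the table by whichever criterion matters most to you, a quick sketch over the data as listed:

```python
# Re-rank the comparison table by a chosen sub-score.
# Rows copied from the table above; the site's Overall
# weighting is unknown, so we sort on raw columns instead.

providers = [
    # (name, overall, pricing, reliability, features, models)
    ("Replicate",    61, 50, 70, 65,  7),
    ("Together AI",  78, 70, 90, 75, 20),
    ("Fireworks AI", 78, 70, 90, 75, 14),
    ("Groq",         86, 90, 90, 75, 10),
    ("DeepInfra",    86, 90, 90, 75, 21),
]

# Sort by pricing score (descending), breaking ties by model count:
by_price = sorted(providers, key=lambda p: (-p[2], -p[5]))
for name, *_ in by_price[:3]:
    print(name)
# DeepInfra
# Groq
# Together AI
```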