Replicate

Inference API Provider

Reputation: 61/100
replicate.com

Provider Overview

Type: inference
Billing: Per token
Egress: Free
SLA Uptime: 99.5%
Autoscaling: Yes
Cold Start: 5000 ms
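
To put the 99.5% SLA figure in concrete terms, the short sketch below converts an uptime percentage into a downtime budget. The page does not state the window over which uptime is measured, so the 30-day month and 365-day year used here are assumptions, and the function name is illustrative only.

```python
# Uptime figure taken from the Provider Overview above.
SLA_UPTIME = 0.995


def allowed_downtime_hours(uptime: float, period_hours: float) -> float:
    """Return the hours of downtime an uptime fraction permits over a period."""
    return (1.0 - uptime) * period_hours


# ASSUMPTION: the measurement window is not given on the page;
# a 30-day month and a 365-day year are used for illustration.
monthly_hours = allowed_downtime_hours(SLA_UPTIME, 30 * 24)
yearly_hours = allowed_downtime_hours(SLA_UPTIME, 365 * 24)

print(f"Per 30-day month: {monthly_hours:.1f} h")
print(f"Per 365-day year: {yearly_hours:.1f} h")
```

Under those assumptions, a 99.5% target leaves room for a few hours of downtime per month, which may matter when comparing against providers with stricter targets.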

Model Pricing (7)

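| Model | Input $/M tokens | Output $/M tokens | Latency | Throughput | Context |
|---|---|---|---|---|---|
| llama-3.1-8b (Cheapest) | $0.05 | $0.25 | 0.3s | 150 t/s | 128k |
| mixtral-8x7b | $0.30 | $1.00 | 0.3s | 80 t/s | 33k |
| llama-3.1-70b | $0.65 | $2.75 | 0.5s | 60 t/s | 128k |
| llama-3.3-70b | $0.65 | $2.75 | 0.45s | 65 t/s | 128k |
| qwen-2.5-72b | $0.65 | $2.75 | 0.5s | 55 t/s | 32k |
| llama-3.1-405b | $1.00 | $5.00 | 1s | 25 t/s | 128k |
| deepseek-r1 | $1.50 | $5.00 | 3s | 20 t/s | 64k |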
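
As a rough illustration of how the per-million-token prices translate into per-request cost, the sketch below copies the figures from the table above and applies them to a hypothetical request of 2,000 input tokens and 500 output tokens. The time estimate assumes that the Latency column is time to first token and that Throughput applies to output tokens; the page does not define either column, so treat that part as an assumption. The function and variable names are illustrative only.

```python
# (input price, output price) in USD per million tokens, from the table above.
PRICES_PER_MILLION = {
    "llama-3.1-8b": (0.05, 0.25),
    "mixtral-8x7b": (0.30, 1.00),
    "llama-3.1-70b": (0.65, 2.75),
    "llama-3.3-70b": (0.65, 2.75),
    "qwen-2.5-72b": (0.65, 2.75),
    "llama-3.1-405b": (1.00, 5.00),
    "deepseek-r1": (1.50, 5.00),
}

# (latency in seconds, throughput in tokens per second), from the same table.
SPEED = {
    "llama-3.1-8b": (0.3, 150),
    "mixtral-8x7b": (0.3, 80),
    "llama-3.1-70b": (0.5, 60),
    "llama-3.3-70b": (0.45, 65),
    "qwen-2.5-72b": (0.5, 55),
    "llama-3.1-405b": (1.0, 25),
    "deepseek-r1": (3.0, 20),
}


def request_cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of one request: tokens times price per token, summed over both directions."""
    input_price, output_price = PRICES_PER_MILLION[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def estimated_seconds(model: str, output_tokens: int) -> float:
    """Rough wall-clock estimate for one response.

    ASSUMPTION: latency is time to first token and throughput is the
    output generation rate; the page does not define these columns.
    """
    latency_s, tokens_per_s = SPEED[model]
    return latency_s + output_tokens / tokens_per_s


# Hypothetical workload: 2,000 input tokens and 500 output tokens per request.
for model in PRICES_PER_MILLION:
    cost = request_cost_usd(model, 2_000, 500)
    seconds = estimated_seconds(model, 500)
    print(f"{model:<16} ${cost:.6f} per request, ~{seconds:.1f}s")
```

Because output tokens are priced several times higher than input tokens for every model listed, output-heavy workloads are dominated by the Output column.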

Reputation Details

Pricing: 50
Reliability: 70
Features: 65

Highlights

  • Autoscaling supported

Compare with Others

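| Provider | Overall | Pricing | Reliability | Features | Models |
|---|---|---|---|---|---|
| Replicate | 61 | 50 | 70 | 65 | 7 |
| Together AI | 78 | 70 | 90 | 75 | 20 |
| Fireworks AI | 78 | 70 | 90 | 75 | 14 |
| Groq | 86 | 90 | 90 | 75 | 10 |
| DeepInfra | 86 | 90 | 90 | 75 | 21 |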