## Provider Overview

| Attribute | Value |
|---|---|
| Type | Inference |
| Billing | Per token |
| Egress | Free |
| SLA uptime | 99.5% |
| Autoscaling | Yes |
| Cold start | 5,000 ms |
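A 5,000 ms cold start means the first request to a scaled-to-zero deployment pays a large fixed penalty before any tokens stream. A minimal timing sketch, assuming the cold-start delay applies only when scaling from zero, that the table's "Latency" figure is time to first token, and that generation is throughput-bound afterwards (the 300-token request below is illustrative):

```python
# Rough end-to-end latency model for a single request.
# Assumption: total time = optional cold start + time to first
# token + output_tokens / throughput. Real behavior will vary.

def request_seconds(output_tokens: int,
                    ttft_s: float,
                    throughput_tps: float,
                    cold: bool = False,
                    cold_start_s: float = 5.0) -> float:
    """Estimated wall-clock seconds to complete one request."""
    startup = cold_start_s if cold else 0.0
    return startup + ttft_s + output_tokens / throughput_tps

# llama-3.1-8b from the pricing table: 0.3 s latency, 150 t/s
warm = request_seconds(300, ttft_s=0.3, throughput_tps=150)
cold = request_seconds(300, ttft_s=0.3, throughput_tps=150, cold=True)
print(f"warm: {warm:.1f}s, cold: {cold:.1f}s")  # warm: 2.3s, cold: 7.3s
```

For short requests the cold start dominates: here it more than triples the total time, which matters for interactive workloads.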
## Model Pricing (7 models)
| Model | Input ($/M tokens) | Output ($/M tokens) | Latency | Throughput | Context |
|---|---|---|---|---|---|
| llama-3.1-8b | $0.05 | $0.25 | 0.3s | 150 t/s | 128k |
| mixtral-8x7b | $0.30 | $1.00 | 0.3s | 80 t/s | 33k |
| llama-3.1-70b | $0.65 | $2.75 | 0.5s | 60 t/s | 128k |
| llama-3.3-70b | $0.65 | $2.75 | 0.45s | 65 t/s | 128k |
| qwen-2.5-72b | $0.65 | $2.75 | 0.5s | 55 t/s | 32k |
| llama-3.1-405b | $1.00 | $5.00 | 1s | 25 t/s | 128k |
| deepseek-r1 | $1.50 | $5.00 | 3s | 20 t/s | 64k |
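Under per-token billing, a request's cost is input tokens times the input rate plus output tokens times the output rate, with rates quoted per million tokens. A small sketch using a few rows from the table above:

```python
# Cost of one request under per-token billing.
# Prices are USD per million tokens, copied from the table above.

PRICES = {  # model: (input $/M, output $/M)
    "llama-3.1-8b":   (0.05, 0.25),
    "llama-3.1-70b":  (0.65, 2.75),
    "llama-3.1-405b": (1.00, 5.00),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """USD cost of one request given its token counts."""
    inp, out = PRICES[model]
    return (input_tokens * inp + output_tokens * out) / 1_000_000

# 2,000 prompt tokens + 500 completion tokens on the 70B model:
cost = request_cost("llama-3.1-70b", 2_000, 500)
print(f"${cost:.6f}")  # $0.002675
```

Note that output tokens cost roughly 4-5x more than input tokens across the lineup, so completion length drives most of the bill.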
## Reputation Details

| Category | Score |
|---|---|
| Pricing | 50 |
| Reliability | 70 |
| Features | 65 |
## Highlights
- Autoscaling supported
## Compare with Others
| Provider | Overall | Pricing | Reliability | Features | Models |
|---|---|---|---|---|---|
| Replicate | 61 | 50 | 70 | 65 | 7 |
| Together AI | 78 | 70 | 90 | 75 | 20 |
| Fireworks AI | 78 | 70 | 90 | 75 | 14 |
| Groq | 86 | 90 | 90 | 75 | 10 |
| DeepInfra | 86 | 90 | 90 | 75 | 21 |
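The Overall column does not appear to be a plain average of the three sub-scores (Groq's mean would be 85, not 86), so its weighting is the site's own. To re-rank the table by whichever criterion matters most to you, a quick sketch over the data as listed:

```python
# Re-rank the comparison table by a chosen sub-score.
# Rows copied from the table above; the site's Overall
# weighting is unknown, so we sort on raw columns instead.

providers = [
    # (name, overall, pricing, reliability, features, models)
    ("Replicate",    61, 50, 70, 65,  7),
    ("Together AI",  78, 70, 90, 75, 20),
    ("Fireworks AI", 78, 70, 90, 75, 14),
    ("Groq",         86, 90, 90, 75, 10),
    ("DeepInfra",    86, 90, 90, 75, 21),
]

# Sort by pricing score (descending), breaking ties by model count:
by_price = sorted(providers, key=lambda p: (-p[2], -p[5]))
for name, *_ in by_price[:3]:
    print(name)
# DeepInfra
# Groq
# Together AI
```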