Provider Overview

| Type | Billing | Egress | SLA Uptime | Autoscaling | Cold Start |
|---|---|---|---|---|---|
| Inference | Per token | Free | 99.9% | Yes | None |
Model Pricing (8)
| Model | Input $/M | Output $/M | Latency | Throughput | Context |
|---|---|---|---|---|---|
| llama-3.1-8b | $0.10 | $0.10 | 0.08s | 1000 t/s | 128k |
| qwen-2.5-7b | $0.10 | $0.10 | 0.06s | 900 t/s | 32k |
| llama-3.1-70b | $0.60 | $0.60 | 0.15s | 400 t/s | 128k |
| llama-3.3-70b | $0.60 | $0.60 | 0.12s | 450 t/s | 128k |
| deepseek-r1-distill-llama-70b | $0.60 | $0.60 | 0.15s | 350 t/s | 128k |
| qwen-2.5-72b | $0.60 | $0.60 | 0.15s | 380 t/s | 32k |
| llama-3.1-405b | $2.50 | $2.50 | 0.3s | 130 t/s | 128k |
| deepseek-r1 | $2.00 | $5.00 | 1.5s | 100 t/s | 64k |
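Since billing is per token at the flat per-million-token rates listed above, the cost of a request is simple arithmetic: tokens divided by one million, times the rate, summed over input and output. A minimal sketch (the `PRICING` dict and `request_cost` helper are illustrative, not a provider API; rates are taken from the table):

```python
# Per-million-token rates from the pricing table above: (input $/M, output $/M)
PRICING = {
    "llama-3.1-8b": (0.10, 0.10),
    "llama-3.1-70b": (0.60, 0.60),
    "deepseek-r1": (2.00, 5.00),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one request at flat per-token billing."""
    in_rate, out_rate = PRICING[model]
    return (input_tokens / 1_000_000) * in_rate + (output_tokens / 1_000_000) * out_rate

# Example: 4,000 prompt tokens + 1,000 completion tokens on deepseek-r1
cost = request_cost("deepseek-r1", 4_000, 1_000)
print(f"${cost:.4f}")  # 4000/1M * $2.00 + 1000/1M * $5.00 = $0.0130
```

Note that deepseek-r1 is the only listed model with asymmetric rates, so input/output token counts must be tracked separately for it.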
Reputation Details

| Pricing | Reliability | Features |
|---|---|---|
| 70 | 90 | 75 |
Highlights
- Good pricing
- 99.9%+ SLA
- Autoscaling supported
- No cold start delay
Compare with Others
| Provider | Overall | Pricing | Reliability | Features | Models |
|---|---|---|---|---|---|
| SambaNova | 78 | 70 | 90 | 75 | 8 |
| Together AI | 78 | 70 | 90 | 75 | 20 |
| Fireworks AI | 78 | 70 | 90 | 75 | 14 |
| Groq | 86 | 90 | 90 | 75 | 10 |
| DeepInfra | 86 | 90 | 90 | 75 | 21 |