Provider Overview

| Attribute | Value |
|---|---|
| Type | Inference |
| Billing | Per token |
| Egress | Free |
| SLA Uptime | 99.5% |
| Autoscaling | Yes |
| Cold Start | 10,000 ms (10 s) |
Model Pricing (9)
| Model | Input ($ per 1M tokens) | Output ($ per 1M tokens) | Latency | Throughput | Context |
|---|---|---|---|---|---|
| llama-3.1-8b | $0.10 | $0.10 | 0.25s | 150 t/s | 128k |
| qwen-2.5-7b | $0.10 | $0.10 | 0.2s | 160 t/s | 32k |
| phi-3.5-mini | $0.10 | $0.10 | 0.15s | 200 t/s | 128k |
| mixtral-8x7b | $0.35 | $0.35 | 0.25s | 100 t/s | 33k |
| gemma-2-27b | $0.50 | $0.50 | 0.35s | 75 t/s | 8k |
| llama-3.1-70b | $0.65 | $0.65 | 0.45s | 65 t/s | 128k |
| llama-3.3-70b | $0.65 | $0.65 | 0.4s | 70 t/s | 128k |
| qwen-2.5-72b | $0.65 | $0.65 | 0.45s | 60 t/s | 32k |
| deepseek-r1 | $2.50 | $7.00 | 3s | 20 t/s | 64k |
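Per-token billing makes the cost of a single request easy to estimate from the table above. A minimal sketch (prices copied from the pricing table; the helper function itself is illustrative, not a provider API):

```python
# (input $/1M tokens, output $/1M tokens), copied from the pricing table above.
PRICES = {
    "llama-3.1-8b": (0.10, 0.10),
    "llama-3.1-70b": (0.65, 0.65),
    "deepseek-r1": (2.50, 7.00),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated dollar cost of one request under per-token billing."""
    in_price, out_price = PRICES[model]
    return input_tokens / 1e6 * in_price + output_tokens / 1e6 * out_price

# Example: 2,000 input + 500 output tokens on deepseek-r1:
# 2000/1e6 * 2.50 + 500/1e6 * 7.00 = 0.005 + 0.0035 = $0.0085
cost = request_cost("deepseek-r1", 2000, 500)
```

Note the asymmetric pricing on deepseek-r1: output tokens cost 2.8x input tokens, so long generations dominate the bill, while the other eight models price input and output identically.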
Reputation Details

| Category | Score |
|---|---|
| Pricing | 70 |
| Reliability | 70 |
| Features | 65 |
Highlights
- Competitive pricing on small models ($0.10 per 1M tokens)
- Autoscaling supported
Compare with Others
| Provider | Overall | Pricing | Reliability | Features | Models |
|---|---|---|---|---|---|
| Hugging Face Inference | 69 | 70 | 70 | 65 | 9 |
| Together AI | 78 | 70 | 90 | 75 | 20 |
| Fireworks AI | 78 | 70 | 90 | 75 | 14 |
| Groq | 86 | 90 | 90 | 75 | 10 |
| DeepInfra | 86 | 90 | 90 | 75 | 21 |
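The comparison table above can also be queried programmatically, e.g. to pick a provider by score. A minimal sketch (rows copied from the table; breaking ties on model count is an assumption, not the page's stated ranking rule):

```python
# (provider, overall score, model count) rows copied from the comparison table.
PROVIDERS = [
    ("Hugging Face Inference", 69, 9),
    ("Together AI", 78, 20),
    ("Fireworks AI", 78, 14),
    ("Groq", 86, 10),
    ("DeepInfra", 86, 21),
]

def rank(providers):
    """Sort providers by overall score, descending; tie-break on model count
    (an assumed tie-break, since Groq and DeepInfra both score 86)."""
    return sorted(providers, key=lambda p: (p[1], p[2]), reverse=True)

best = rank(PROVIDERS)[0][0]  # DeepInfra: 86 overall, 21 models
```

Under this tie-break DeepInfra edges out Groq purely on catalog size; if latency or cold start mattered more, the key function would need those columns instead.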