Provider Overview
| Attribute | Value |
|---|---|
| Type | Inference |
| Billing | Per token |
| Egress | Free |
| SLA Uptime | 99.9% |
| Autoscaling | Yes |
| Cold Start | None |
Model Pricing (20)
| Model | Input $/M | Output $/M | Latency | Throughput | Context |
|---|---|---|---|---|---|
| phi-3-mini-128k | $0.10 | $0.10 | 0.15s | 220 t/s | 128k |
| llama-3.1-8b | $0.18 | $0.18 | 0.2s | 200 t/s | 128k |
| qwen-2.5-7b | $0.20 | $0.20 | 0.2s | 180 t/s | 32k |
| codellama-7b | $0.20 | $0.20 | 0.15s | 200 t/s | 16k |
| codellama-13b | $0.22 | $0.22 | 0.2s | 150 t/s | 16k |
| gemma-2-9b | $0.30 | $0.30 | 0.2s | 160 t/s | 8k |
| phi-4-14b | $0.30 | $0.30 | 0.2s | 140 t/s | 16k |
| deepseek-v3 | $0.50 | $0.50 | 0.4s | 70 t/s | 64k |
| qwen-2.5-32b | $0.50 | $0.50 | 0.3s | 110 t/s | 32k |
| qwen-2.5-coder-32b | $0.50 | $0.50 | 0.3s | 105 t/s | 32k |
| phi-3-medium-128k | $0.50 | $0.50 | 0.25s | 120 t/s | 128k |
| mixtral-8x7b | $0.60 | $0.60 | 0.3s | 100 t/s | 33k |
| codellama-34b | $0.78 | $0.78 | 0.4s | 70 t/s | 16k |
| gemma-2-27b | $0.80 | $0.80 | 0.3s | 85 t/s | 8k |
| llama-3.1-70b | $0.88 | $0.88 | 0.4s | 80 t/s | 128k |
| llama-3.3-70b | $0.88 | $0.88 | 0.35s | 85 t/s | 128k |
| qwen-2.5-72b | $0.90 | $0.90 | 0.4s | 75 t/s | 32k |
| mixtral-8x22b | $1.20 | $1.20 | 0.5s | 60 t/s | 66k |
| deepseek-r1 | $3.00 | $7.50 | 2s | 30 t/s | 64k |
| llama-3.1-405b | $3.50 | $3.50 | 0.8s | 35 t/s | 128k |
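With per-token billing, the cost of a request is simply (input tokens × input rate) + (output tokens × output rate), where rates are quoted per million tokens. A minimal sketch using prices from the table above (the function name and `PRICES` dict are illustrative, not part of any provider API):

```python
# Rates are quoted in $ per million tokens, taken from the pricing table above.
PRICES = {  # model: (input $/M, output $/M)
    "llama-3.1-8b": (0.18, 0.18),
    "llama-3.1-70b": (0.88, 0.88),
    "deepseek-r1": (3.00, 7.50),  # note the asymmetric output rate
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated dollar cost for a single request."""
    in_rate, out_rate = PRICES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# Example: a 2,000-token prompt with a 500-token completion on llama-3.1-70b
# costs (2000 * 0.88 + 500 * 0.88) / 1e6 ≈ $0.0022.
print(f"${request_cost('llama-3.1-70b', 2000, 500):.4f}")
```

Models with symmetric rates (most of the table) make the math trivial; for reasoning models like deepseek-r1, the 2.5× output rate means long completions dominate the bill.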
Reputation Details
| Metric | Score |
|---|---|
| Pricing | 70 |
| Reliability | 90 |
| Features | 75 |
Highlights
- Good pricing
- 99.9%+ SLA
- Autoscaling supported
- No cold start
Compare with Others
| Provider | Overall | Pricing | Reliability | Features | Models |
|---|---|---|---|---|---|
| Together AI | 78 | 70 | 90 | 75 | 20 |
| Fireworks AI | 78 | 70 | 90 | 75 | 14 |
| Groq | 86 | 90 | 90 | 75 | 10 |
| DeepInfra | 86 | 90 | 90 | 75 | 21 |
| DeepSeek | 72 | 70 | 70 | 75 | 3 |