Together AI vs Fireworks AI

Together AI

inference provider · token billing

Fireworks AI

inference provider · token billing

Token Pricing Comparison

All prices are USD per 1M tokens (In = input, Out = output); "—" marks models the provider does not offer.

| Model | Together AI In $/M | Together AI Out $/M | Fireworks AI In $/M | Fireworks AI Out $/M |
| --- | --- | --- | --- | --- |
| phi-3-mini-128k | $0.10 | $0.10 | — | — |
| llama-3.1-8b | $0.18 | $0.18 | $0.20 | $0.20 |
| qwen-2.5-7b | $0.20 | $0.20 | $0.20 | $0.20 |
| codellama-7b | $0.20 | $0.20 | — | — |
| gemma-2-9b | $0.30 | $0.30 | $0.20 | $0.20 |
| codellama-13b | $0.22 | $0.22 | — | — |
| phi-4-14b | $0.30 | $0.30 | — | — |
| deepseek-v3 | $0.50 | $0.50 | $0.50 | $0.50 |
| mixtral-8x7b | $0.60 | $0.60 | $0.50 | $0.50 |
| qwen-2.5-32b | $0.50 | $0.50 | $0.50 | $0.50 |
| qwen-2.5-coder-32b | $0.50 | $0.50 | $0.50 | $0.50 |
| phi-3-medium-128k | $0.50 | $0.50 | — | — |
| codellama-34b | $0.78 | $0.78 | — | — |
| gemma-2-27b | $0.80 | $0.80 | $0.90 | $0.90 |
| llama-3.1-70b | $0.88 | $0.88 | $0.90 | $0.90 |
| llama-3.3-70b | $0.88 | $0.88 | $0.90 | $0.90 |
| qwen-2.5-72b | $0.90 | $0.90 | $0.90 | $0.90 |
| mixtral-8x22b | $1.20 | $1.20 | $1.20 | $1.20 |
| llama-3.1-405b | $3.50 | $3.50 | $3.00 | $3.00 |
| deepseek-r1 | $3.00 | $7.50 | $3.00 | $8.00 |
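
Both providers bill per million tokens, so a request costs in_tokens ÷ 1M × input rate plus out_tokens ÷ 1M × output rate. A minimal Python sketch comparing the two at the llama-3.1-70b rates above (the monthly workload figures are illustrative):

```python
# Cost in USD for a given token volume at per-million-token rates.
# Rates are the llama-3.1-70b row from the table above.
RATES = {
    "Together AI":  {"in": 0.88, "out": 0.88},  # $/M tokens
    "Fireworks AI": {"in": 0.90, "out": 0.90},  # $/M tokens
}

def cost_usd(provider: str, in_tokens: int, out_tokens: int) -> float:
    r = RATES[provider]
    return in_tokens / 1e6 * r["in"] + out_tokens / 1e6 * r["out"]

# Illustrative workload: 200M input + 50M output tokens per month.
for name in RATES:
    print(f"{name}: ${cost_usd(name, 200_000_000, 50_000_000):,.2f}")
# Together AI: $220.00
# Fireworks AI: $225.00
```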

Latency & Throughput

| Model | Together AI Latency | Together AI tok/s | Fireworks AI Latency | Fireworks AI tok/s |
| --- | --- | --- | --- | --- |
| phi-3-mini-128k | 0.15s | 220 | — | — |
| llama-3.1-8b | 0.2s | 200 | 0.15s | 250 |
| qwen-2.5-7b | 0.2s | 180 | 0.15s | 200 |
| codellama-7b | 0.15s | 200 | — | — |
| gemma-2-9b | 0.2s | 160 | 0.15s | 180 |
| codellama-13b | 0.2s | 150 | — | — |
| phi-4-14b | 0.2s | 140 | — | — |
| deepseek-v3 | 0.4s | 70 | 0.35s | 75 |
| mixtral-8x7b | 0.3s | 100 | 0.2s | 120 |
| qwen-2.5-32b | 0.3s | 110 | 0.25s | 110 |
| qwen-2.5-coder-32b | 0.3s | 105 | 0.25s | 105 |
| phi-3-medium-128k | 0.25s | 120 | — | — |
| codellama-34b | 0.4s | 70 | — | — |
| gemma-2-27b | 0.3s | 85 | 0.3s | 85 |
| llama-3.1-70b | 0.4s | 80 | 0.3s | 90 |
| llama-3.3-70b | 0.35s | 85 | 0.28s | 95 |
| qwen-2.5-72b | 0.4s | 75 | 0.35s | 80 |
| mixtral-8x22b | 0.5s | 60 | 0.45s | 65 |
| llama-3.1-405b | 0.8s | 35 | 0.7s | 40 |
| deepseek-r1 | 2s | 30 | 2.5s | 25 |
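
Reading the latency column as time-to-first-token and tok/s as output throughput (an assumption; the table does not define either), the end-to-end time for one completion is roughly latency + output tokens ÷ tok/s. A quick sketch at the llama-3.1-70b figures above:

```python
# Rough end-to-end time for one completion: TTFT + tokens / throughput.
# Figures are the llama-3.1-70b row from the table above.
def response_time_s(ttft_s: float, tok_per_s: float, out_tokens: int) -> float:
    return ttft_s + out_tokens / tok_per_s

OUT_TOKENS = 500  # illustrative completion length
print(f"Together AI:  {response_time_s(0.40, 80, OUT_TOKENS):.2f}s")  # ~6.65s
print(f"Fireworks AI: {response_time_s(0.30, 90, OUT_TOKENS):.2f}s")  # ~5.86s
```

For long completions the throughput term dominates, so the tok/s column matters more than the latency column.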

Feature Comparison

| Feature | Together AI | Fireworks AI |
| --- | --- | --- |
| Provider Type | Inference API | Inference API |
| Billing Granularity | token | token |
| Autoscaling | Yes | Yes |
| SLA Uptime | 99.9% | 99.9% |
| Cold Start | None | None |
| Free Egress | Yes | Yes |
| Storage Cost | N/A | N/A |
| GPU Count | N/A | N/A |
| Models Offered | 20 models | 14 models |
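
Since both are hosted inference APIs with per-token billing, they can be driven through the same OpenAI-compatible client. A minimal sketch using the `openai` Python package; the base URLs and model identifiers below are assumptions to verify against each provider's documentation:

```python
import os
from openai import OpenAI  # pip install openai

# Assumed base URLs and model IDs -- confirm against each provider's docs.
PROVIDERS = {
    "together": {
        "base_url": "https://api.together.xyz/v1",
        "key_env": "TOGETHER_API_KEY",
        "model": "meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo",  # illustrative ID
    },
    "fireworks": {
        "base_url": "https://api.fireworks.ai/inference/v1",
        "key_env": "FIREWORKS_API_KEY",
        "model": "accounts/fireworks/models/llama-v3p1-8b-instruct",  # illustrative ID
    },
}

def complete(provider: str, prompt: str) -> str:
    cfg = PROVIDERS[provider]
    client = OpenAI(base_url=cfg["base_url"], api_key=os.environ[cfg["key_env"]])
    resp = client.chat.completions.create(
        model=cfg["model"],
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

print(complete("together", "Say hello in one word."))
```

Because only the base URL, key, and model ID change, A/B-testing the two providers on the same prompts is a one-line switch.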

Pros & Cons Summary

Together AI

Strengths

  • + Cheaper than Fireworks AI on 5 of the 14 shared models, with 6 price ties (Fireworks AI is cheaper on 3)
  • + Broader model catalog (20 vs 14 models), including the codellama and phi families Fireworks AI does not offer

Fireworks AI

Weaknesses

  • - Pricier than Together AI on 5 of the 14 shared models, cheaper on only 3 (see the tally sketch below)
  • - Smaller model catalog (14 vs 20 models)
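
The pricing claims above can be checked mechanically. A small sketch that tallies, per shared model, which provider's output-token price is lower (data hand-copied from the pricing table):

```python
# Output-token prices ($/M) for the 14 models both providers serve,
# copied from the pricing table above: (Together AI, Fireworks AI).
SHARED = {
    "llama-3.1-8b": (0.18, 0.20),       "qwen-2.5-7b": (0.20, 0.20),
    "gemma-2-9b": (0.30, 0.20),         "deepseek-v3": (0.50, 0.50),
    "mixtral-8x7b": (0.60, 0.50),       "qwen-2.5-32b": (0.50, 0.50),
    "qwen-2.5-coder-32b": (0.50, 0.50), "gemma-2-27b": (0.80, 0.90),
    "llama-3.1-70b": (0.88, 0.90),      "llama-3.3-70b": (0.88, 0.90),
    "qwen-2.5-72b": (0.90, 0.90),       "mixtral-8x22b": (1.20, 1.20),
    "llama-3.1-405b": (3.50, 3.00),     "deepseek-r1": (7.50, 8.00),
}

tally = {"Together AI": 0, "Fireworks AI": 0, "tie": 0}
for together, fireworks in SHARED.values():
    if together < fireworks:
        tally["Together AI"] += 1
    elif fireworks < together:
        tally["Fireworks AI"] += 1
    else:
        tally["tie"] += 1
print(tally)  # {'Together AI': 5, 'Fireworks AI': 3, 'tie': 6}
```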
