Hugging Face Inference

Inference API Provider

Reputation: 69/100
huggingface.co/inference-api

Provider Overview

Type: inference
Billing: Per token
Egress: Free
SLA Uptime: 99.5%
Autoscaling: Yes
Cold Start: 10,000 ms
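
As a rough sketch of what the 99.5% SLA uptime figure allows, the snippet below converts an uptime percentage into a downtime budget. The 30-day measurement window and the function name `downtime_budget_hours` are assumptions made for illustration; the listing does not state how the SLA is measured.

```python
def downtime_budget_hours(uptime_percent: float, period_hours: float = 30 * 24) -> float:
    """Hours of downtime permitted per period at the given uptime percentage.

    Assumption: uptime is measured over a 30-day window (720 hours) by default.
    """
    return (1 - uptime_percent / 100) * period_hours


# SLA uptime taken from the provider overview.
budget = downtime_budget_hours(99.5)
print(f"{budget:.1f} hours of downtime per 30-day period")
```

Under those assumptions, 99.5% uptime corresponds to about 3.6 hours of allowable downtime per 30-day period.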

Model Pricing (9)

Model                    Input ($/M tokens)  Output ($/M tokens)  Latency  Throughput  Context
llama-3.1-8b (cheapest)  $0.10               $0.10                0.25s    150 t/s     128k
qwen-2.5-7b              $0.10               $0.10                0.2s     160 t/s     32k
phi-3.5-mini             $0.10               $0.10                0.15s    200 t/s     128k
mixtral-8x7b             $0.35               $0.35                0.25s    100 t/s     33k
gemma-2-27b              $0.50               $0.50                0.35s    75 t/s      8k
llama-3.1-70b            $0.65               $0.65                0.45s    65 t/s      128k
llama-3.3-70b            $0.65               $0.65                0.4s     70 t/s      128k
qwen-2.5-72b             $0.65               $0.65                0.45s    60 t/s      32k
deepseek-r1              $2.50               $7.00                3s       20 t/s      64k
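
To make the pricing table easier to apply, here is a minimal sketch that estimates the cost and rough response time of a single request. It assumes the prices are in US dollars per million tokens, that the latency column is time to first token, and that throughput is the steady output rate; the listing does not confirm these definitions. The names `ModelListing`, `request_cost_usd`, and `response_time_s` are hypothetical, and only three rows from the table are copied in.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelListing:
    input_usd_per_m: float   # assumed: USD per million input tokens
    output_usd_per_m: float  # assumed: USD per million output tokens
    latency_s: float         # assumed: time to first token, in seconds
    throughput_tps: float    # assumed: output tokens per second


# Three rows copied from the pricing table; the others follow the same pattern.
LISTINGS = {
    "llama-3.1-8b": ModelListing(0.10, 0.10, 0.25, 150),
    "llama-3.3-70b": ModelListing(0.65, 0.65, 0.4, 70),
    "deepseek-r1": ModelListing(2.50, 7.00, 3.0, 20),
}


def request_cost_usd(m: ModelListing, input_tokens: int, output_tokens: int) -> float:
    """Per-token billing: each side is charged at its own per-million rate."""
    return (input_tokens * m.input_usd_per_m
            + output_tokens * m.output_usd_per_m) / 1_000_000


def response_time_s(m: ModelListing, output_tokens: int) -> float:
    """Rough estimate: initial latency plus generation time; ignores cold starts."""
    return m.latency_s + output_tokens / m.throughput_tps


if __name__ == "__main__":
    for name, listing in LISTINGS.items():
        cost = request_cost_usd(listing, input_tokens=2_000, output_tokens=500)
        seconds = response_time_s(listing, output_tokens=500)
        print(f"{name}: ${cost:.6f} per request, about {seconds:.1f}s")
```

The estimate leaves out the cold start figure from the provider overview, which would presumably add to the first request after an idle period.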

Reputation Details

Pricing: 70
Reliability: 70
Features: 65

Highlights

  • Good pricing
  • Autoscaling supported

Compare with Others

Provider                Overall  Pricing  Reliability  Features  Models
Hugging Face Inference  69       70       70           65        9
Together AI             78       70       90           75        20
Fireworks AI            78       70       90           75        14
Groq                    86       90       90           75        10
DeepInfra               86       90       90           75        21