Provider Overview

| Attribute | Value |
|---|---|
| Type | Inference |
| Billing | Per token |
| Egress | Free |
| SLA Uptime | 99.5% |
| Autoscaling | Yes |
| Cold Start | 10,000 ms (10 s) |
Model Pricing (9)
| Model | Input ($ per 1M tokens) | Output ($ per 1M tokens) | Latency | Throughput | Context |
|---|---|---|---|---|---|
| llama-3.1-8b | $0.10 | $0.10 | 0.25s | 150 t/s | 128k |
| qwen-2.5-7b | $0.10 | $0.10 | 0.2s | 160 t/s | 32k |
| phi-3.5-mini | $0.10 | $0.10 | 0.15s | 200 t/s | 128k |
| mixtral-8x7b | $0.35 | $0.35 | 0.25s | 100 t/s | 33k |
| gemma-2-27b | $0.50 | $0.50 | 0.35s | 75 t/s | 8k |
| llama-3.1-70b | $0.65 | $0.65 | 0.45s | 65 t/s | 128k |
| llama-3.3-70b | $0.65 | $0.65 | 0.4s | 70 t/s | 128k |
| qwen-2.5-72b | $0.65 | $0.65 | 0.45s | 60 t/s | 32k |
| deepseek-r1 | $2.50 | $7.00 | 3s | 20 t/s | 64k |
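Per-token billing makes the cost of a single request easy to estimate from the table above. A minimal sketch (prices copied from the pricing table; the helper function itself is illustrative, not a provider API):

```python
# (input $/1M tokens, output $/1M tokens), copied from the pricing table above.
PRICES = {
    "llama-3.1-8b": (0.10, 0.10),
    "llama-3.1-70b": (0.65, 0.65),
    "deepseek-r1": (2.50, 7.00),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated dollar cost of one request under per-token billing."""
    in_price, out_price = PRICES[model]
    return input_tokens / 1e6 * in_price + output_tokens / 1e6 * out_price

# Example: 2,000 input + 500 output tokens on deepseek-r1:
# 2000/1e6 * 2.50 + 500/1e6 * 7.00 = 0.005 + 0.0035 = $0.0085
cost = request_cost("deepseek-r1", 2000, 500)
```

Note the asymmetric pricing on deepseek-r1: output tokens cost 2.8x input tokens, so long generations dominate the bill, while the other eight models price input and output identically.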
Reputation Details

| Category | Score |
|---|---|
| Pricing | 70 |
| Reliability | 70 |
| Features | 65 |
Highlights
- Competitive pricing on small models ($0.10 per 1M tokens)
- Autoscaling supported
Compare with Others
| Provider | Overall | Pricing | Reliability | Features | Models |
|---|---|---|---|---|---|
| Hugging Face Inference | 69 | 70 | 70 | 65 | 9 |
| Together AI | 78 | 70 | 90 | 75 | 20 |
| Fireworks AI | 78 | 70 | 90 | 75 | 14 |
| Groq | 86 | 90 | 90 | 75 | 10 |
| DeepInfra | 86 | 90 | 90 | 75 | 21 |
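The comparison table above can also be queried programmatically, e.g. to pick a provider by score. A minimal sketch (rows copied from the table; breaking ties on model count is an assumption, not the page's stated ranking rule):

```python
# (provider, overall score, model count) rows copied from the comparison table.
PROVIDERS = [
    ("Hugging Face Inference", 69, 9),
    ("Together AI", 78, 20),
    ("Fireworks AI", 78, 14),
    ("Groq", 86, 10),
    ("DeepInfra", 86, 21),
]

def rank(providers):
    """Sort providers by overall score, descending; tie-break on model count
    (an assumed tie-break, since Groq and DeepInfra both score 86)."""
    return sorted(providers, key=lambda p: (p[1], p[2]), reverse=True)

best = rank(PROVIDERS)[0][0]  # DeepInfra: 86 overall, 21 models
```

Under this tie-break DeepInfra edges out Groq purely on catalog size; if latency or cold start mattered more, the key function would need those columns instead.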