What GPUs does Hugging Face Inference offer?

Hugging Face Inference is listed here as an inference API provider offering 9 AI model endpoints, billed per token. No GPU instance types are listed for it.

What is Hugging Face Inference pricing?

Hugging Face Inference model pricing starts at $0.10 per million input tokens and $0.10 per million output tokens.

Does Hugging Face Inference offer autoscaling?

Yes, Hugging Face Inference supports autoscaling for dynamic workload management. Cold start time is approximately 10,000 ms (10 s).

Hugging Face Inference

Inference API Provider

Reputation:

69/100

Get your Hugging Face Inference API key: huggingface.co/inference-api

Hugging Face Inference offers 9 model endpoints with output pricing starting at $0.10/million tokens. Compared to the market average of $1.03/million output tokens across inference API providers, Hugging Face Inference's entry-level pricing is 90% below average.
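The "90% below average" figure can be checked from the two prices quoted above. The following is a minimal sketch of that arithmetic; the function name `percent_below` is hypothetical and chosen only for illustration.

```python
def percent_below(price: float, market_average: float) -> float:
    """Return how far `price` sits below `market_average`, as a percentage."""
    if market_average <= 0:
        raise ValueError("market_average must be positive")
    return (market_average - price) / market_average * 100.0


# Both figures are taken from the paragraph above ($ per million output tokens).
entry_output_price = 0.10
market_average = 1.03

discount = percent_below(entry_output_price, market_average)
print(f"{discount:.1f}% below average")  # prints "90.3% below average"
```

The unrounded result is about 90.3%, which matches the rounded 90% stated above.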

Provider Overview

Type

inference

Billing

Per token

Egress

Free

SLA Uptime

99.5%

Autoscaling

Yes

Cold Start

10,000 ms (10 s)
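To make the SLA and cold-start figures more concrete, the sketch below converts a 99.5% uptime target into an allowed-downtime budget and the cold start into seconds. It assumes a 30-day month as the measurement window, which the listing does not specify; the helper name `downtime_budget_hours` is hypothetical.

```python
def downtime_budget_hours(uptime_percent: float, period_hours: float) -> float:
    """Hours of downtime permitted in a period under a given uptime percentage."""
    if not 0.0 <= uptime_percent <= 100.0:
        raise ValueError("uptime_percent must be between 0 and 100")
    return (1.0 - uptime_percent / 100.0) * period_hours


# Assumption: the SLA window is a 30-day month (not stated in the listing).
HOURS_PER_30_DAY_MONTH = 30 * 24

budget = downtime_budget_hours(99.5, HOURS_PER_30_DAY_MONTH)
cold_start_s = 10_000 / 1000  # listed cold start, converted from ms to seconds

print(f"Allowed downtime: {budget:.1f} h per 30-day month")
print(f"Cold start: {cold_start_s:.0f} s")
```

Under that assumption, 99.5% uptime allows roughly 3.6 hours of downtime per month, and a client timeout for the first request after idle would need to exceed the 10-second cold start.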

Model Pricing (9)

Model	Input $/M	Output $/M	Latency	Throughput	Context
llama-3.1-8b (cheapest)	$0.10	$0.10	0.25s	150 t/s	128k
qwen-2.5-7b	$0.10	$0.10	0.2s	160 t/s	32k
phi-3.5-mini	$0.10	$0.10	0.15s	200 t/s	128k
mixtral-8x7b	$0.35	$0.35	0.25s	100 t/s	33k
gemma-2-27b	$0.50	$0.50	0.35s	75 t/s	8k
llama-3.1-70b	$0.65	$0.65	0.45s	65 t/s	128k
llama-3.3-70b	$0.65	$0.65	0.4s	70 t/s	128k
qwen-2.5-72b	$0.65	$0.65	0.45s	60 t/s	32k
deepseek-r1	$2.50	$7.00	3s	20 t/s	64k
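The table can be turned into rough per-request estimates. The sketch below is illustrative only: it assumes the Latency column is time to first token and the Throughput column is output tokens per second, neither of which the listing defines. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelRow:
    """One row of the pricing table (prices in $ per million tokens)."""
    name: str
    input_usd_per_m: float
    output_usd_per_m: float
    latency_s: float        # assumed: time to first token
    throughput_tps: float   # assumed: output tokens per second


def request_cost_usd(row: ModelRow, input_tokens: int, output_tokens: int) -> float:
    """Estimated cost of one request under per-token billing."""
    total = input_tokens * row.input_usd_per_m + output_tokens * row.output_usd_per_m
    return total / 1_000_000


def request_time_s(row: ModelRow, output_tokens: int, cold_start_s: float = 0.0) -> float:
    """Estimated wall-clock time: optional cold start, then latency, then generation."""
    return cold_start_s + row.latency_s + output_tokens / row.throughput_tps


# Values copied from the table above.
llama_8b = ModelRow("llama-3.1-8b", 0.10, 0.10, 0.25, 150.0)
deepseek_r1 = ModelRow("deepseek-r1", 2.50, 7.00, 3.0, 20.0)

for row in (llama_8b, deepseek_r1):
    cost = request_cost_usd(row, input_tokens=2000, output_tokens=500)
    warm = request_time_s(row, output_tokens=500)
    print(f"{row.name}: ${cost:.5f} per request, ~{warm:.1f} s warm")
```

For a request with 2,000 input and 500 output tokens, this gives about $0.00025 on llama-3.1-8b versus $0.0085 on deepseek-r1. Passing `cold_start_s=10.0` adds the listed cold start to the time estimate.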

Reputation Details

Pricing

70/100

Reliability

70/100

Features

65/100

Highlights

Good pricing
Autoscaling supported

Compare with Others

Provider	Overall	Pricing	Reliability	Features	Models
Hugging Face Inference	69	70	70	65	9
Together AI	78	70	90	75	20
Fireworks AI	78	70	90	75	14
Groq	86	90	90	75	10
DeepInfra	86	90	90	75	21

Embed Badge

<a href="https://inferencebench.io/providers/huggingface/"><img src="data:image/svg+xml,%3Csvg%20xmlns%3D%22http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%22%20width%3D%22325%22%20height%3D%2220%22%20role%3D%22img%22%20aria-label%3D%22InferenceBench%20Verified%3A%20Hugging%20Face%20Inference%22%3E%0A%20%20%3Ctitle%3EInferenceBench%20Verified%3A%20Hugging%20Face%20Inference%3C%2Ftitle%3E%0A%20%20%3ClinearGradient%20id%3D%22s%22%20x2%3D%220%22%20y2%3D%22100%25%22%3E%0A%20%20%20%20%3Cstop%20offset%3D%220%22%20stop-color%3D%22%23bbb%22%20stop-opacity%3D%22.1%22%2F%3E%0A%20%20%20%20%3Cstop%20offset%3D%221%22%20stop-opacity%3D%22.1%22%2F%3E%0A%20%20%3C%2FlinearGradient%3E%0A%20%20%3CclipPath%20id%3D%22r%22%3E%0A%20%20%20%20%3Crect%20width%3D%22325%22%20height%3D%2220%22%20rx%3D%223%22%20fill%3D%22%23fff%22%2F%3E%0A%20%20%3C%2FclipPath%3E%0A%20%20%3Cg%20clip-path%3D%22url(%23r)%22%3E%0A%20%20%20%20%3Crect%20width%3D%22166%22%20height%3D%2220%22%20fill%3D%22%23333%22%2F%3E%0A%20%20%20%20%3Crect%20x%3D%22166%22%20width%3D%22159%22%20height%3D%2220%22%20fill%3D%22%238b5cf6%22%2F%3E%0A%20%20%20%20%3Crect%20width%3D%22325%22%20height%3D%2220%22%20fill%3D%22url(%23s)%22%2F%3E%0A%20%20%3C%2Fg%3E%0A%20%20%3Cg%20fill%3D%22%23fff%22%20text-anchor%3D%22middle%22%20font-family%3D%22Verdana%2CGeneva%2CDejaVu%20Sans%2Csans-serif%22%20text-rendering%3D%22geometricPrecision%22%20font-size%3D%2211%22%3E%0A%20%20%20%20%3Ctext%20aria-hidden%3D%22true%22%20x%3D%2283%22%20y%3D%2214%22%20fill%3D%22%23010101%22%20fill-opacity%3D%22.3%22%3EInferenceBench%20Verified%3C%2Ftext%3E%0A%20%20%20%20%3Ctext%20x%3D%2283%22%20y%3D%2213%22%3EInferenceBench%20Verified%3C%2Ftext%3E%0A%20%20%20%20%3Ctext%20aria-hidden%3D%22true%22%20x%3D%22245.5%22%20y%3D%2214%22%20fill%3D%22%23010101%22%20fill-opacity%3D%22.3%22%3EHugging%20Face%20Inference%3C%2Ftext%3E%0A%20%20%20%20%3Ctext%20x%3D%22245.5%22%20y%3D%2213%22%3EHugging%20Face%20Inference%3C%2Ftext%3E%0A%20%20%3C%2Fg%3E%0A%3C%2Fsvg%3E" alt="InferenceBench Verified — 
Hugging Face Inference" /></a>