Developer API
Access GPU specs, AI model benchmarks, and provider pricing programmatically. Our RESTful API serves JSON data covering 160+ models, 47 GPUs, and 19 cloud providers.
Quick Start
The API is open and requires no authentication. All endpoints return JSON and support CORS.
curl -s "https://inferencebench.io/api/v1/models?limit=5" | jq '.data[].name'
Base URL
https://inferencebench.io/api/v1
All API v1 endpoints are prefixed with /api/v1/. The health endpoint is at /api/health.
Rate Limits
| Limit | Value |
|---|---|
| Requests per minute | 60 |
| Rate limit window | 1 min |
| Cache TTL | 1 hr |
Rate limit headers are included in every response: X-RateLimit-Remaining and X-RateLimit-Reset. If you exceed the limit, you will receive a 429 response with a Retry-After header.
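In a client, these headers can drive the back-off decision directly. A minimal sketch in Python; the `wait_if_rate_limited` helper is an illustrative name, and it assumes `X-RateLimit-Reset` carries a Unix timestamp in seconds, which the API does not currently specify:

```python
import time

def wait_if_rate_limited(status_code, headers):
    """Return how many seconds to sleep before retrying, based on rate-limit headers."""
    if status_code == 429:
        # The server states the exact back-off on rate-limit errors.
        return int(headers.get("Retry-After", 60))
    if int(headers.get("X-RateLimit-Remaining", 1)) == 0:
        # Quota exhausted: wait until the reset time (assumed Unix seconds).
        reset = int(headers.get("X-RateLimit-Reset", 0))
        return max(0, reset - int(time.time()))
    return 0
```

Calling this after every response and sleeping for the returned duration keeps a polling client inside the 60 requests/minute budget.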
Authentication
The API is currently open and does not require authentication. API key support is coming soon for higher rate limits and additional endpoints. Join our newsletter to be notified when API keys become available.
Endpoints
/api/v1/models
List all AI models with architecture details, memory requirements, capabilities, and API pricing.
Query Parameters
| Name | Type | Description |
|---|---|---|
| family | string | Filter by model family (e.g. llama, gpt, claude) |
| type | string | Filter by architecture type: dense, moe, or hybrid |
| limit | number | Max results per page (default 50, max 100) |
| offset | number | Pagination offset (default 0) |
Example Request
curl -s "https://inferencebench.io/api/v1/models" | jq .
Example Response
{
"data": [
{
"id": "meta/llama-3.1-70b",
"name": "Llama 3.1 70B",
"family": "llama",
"architecture": {
"type": "dense",
"total_params_b": 70,
"active_params_b": 70,
"num_layers": 80,
"hidden_dim": 8192,
"num_heads": 64,
"num_kv_heads": 8,
"head_dim": 128,
"vocab_size": 128256
},
"context_window": 131072,
"memory": { ... },
"capabilities": { ... },
"api_pricing": { ... }
}
],
"total": 160,
"limit": 50,
"offset": 0,
"_links": {
"self": "/api/v1/models?limit=50&offset=0",
"next": "/api/v1/models?limit=50&offset=50"
}
}
/api/v1/gpus
List all GPUs with VRAM, bandwidth, compute specs, and cloud pricing from multiple providers.
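Because each page embeds a `_links.next` pointer, paginating is a matter of following that link until it disappears. A minimal Python sketch; `iter_pages` and `http_fetch` are illustrative names, not part of the API:

```python
import json
from urllib.request import urlopen

BASE_URL = "https://inferencebench.io"

def iter_pages(fetch, path):
    """Yield every item across pages by following _links.next until it runs out."""
    while path:
        page = fetch(path)
        yield from page["data"]
        path = page.get("_links", {}).get("next")

def http_fetch(path):
    """Fetch one page of results as parsed JSON."""
    with urlopen(BASE_URL + path) as resp:
        return json.load(resp)

# Iterate all models, 50 at a time:
# for model in iter_pages(http_fetch, "/api/v1/models?limit=50"):
#     print(model["name"])
```

Separating the page-walking logic from the HTTP call keeps the iterator testable and lets you swap in a client that also honors the rate-limit headers described above.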
Query Parameters
| Name | Type | Description |
|---|---|---|
| vendor | string | Filter by vendor: nvidia or amd |
| limit | number | Max results per page (default 50, max 100) |
| offset | number | Pagination offset (default 0) |
Example Request
curl -s "https://inferencebench.io/api/v1/gpus" | jq .
Example Response
{
"data": [
{
"id": "h100-sxm",
"name": "H100 SXM",
"vendor": "nvidia",
"vram_gb": 80,
"memory_bandwidth_gbps": 3350,
"fp16_tflops": 989,
"cloud_pricing": { ... }
}
],
"total": 47,
"limit": 50,
"offset": 0
}
/api/v1/providers
List all cloud GPU providers with supported GPUs, regions, and pricing tiers.
Query Parameters
| Name | Type | Description |
|---|---|---|
| limit | number | Max results per page (default 50, max 100) |
| offset | number | Pagination offset (default 0) |
Example Request
curl -s "https://inferencebench.io/api/v1/providers" | jq .
Example Response
{
"data": [
{
"id": "lambda",
"name": "Lambda",
"type": "cloud",
"supported_gpus": ["h100-sxm", "a100-80gb"],
"regions": ["us-east", "us-west"],
"pricing": { ... }
}
],
"total": 19,
"limit": 50,
"offset": 0
}
/api/health
Health check endpoint. Returns service status and version.
Example Request
curl -s "https://inferencebench.io/api/health" | jq .
Example Response
{
"status": "ok",
"version": "1.0.0",
"timestamp": "2026-04-01T00:00:00.000Z"
}
Error Codes
| Status | Description |
|---|---|
| 200 | Success |
| 400 | Bad request — invalid query parameters |
| 404 | Endpoint not found |
| 429 | Rate limit exceeded — retry after the Retry-After header |
| 500 | Internal server error |
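These codes map naturally onto a retry policy: honor Retry-After on 429, back off exponentially on 500, and treat other 4xx codes as fatal. A sketch of that decision in Python; the `retry_delay` helper is an illustrative name, not part of the API:

```python
def retry_delay(status, headers, attempt, max_attempts=3):
    """Seconds to wait before the next attempt, or None to give up."""
    if attempt >= max_attempts - 1:
        return None  # retry budget exhausted
    if status == 429:
        # Honor the server's Retry-After header on rate-limit errors.
        return int(headers.get("Retry-After", 60))
    if status == 500:
        # Exponential backoff for transient server errors: 1s, 2s, 4s, ...
        return 2 ** attempt
    return None  # 400/404 are client errors and will not succeed on retry
```

A request loop would call `retry_delay` after each failed attempt, sleep for the returned duration, and stop as soon as it returns `None`.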