Developer API
Access GPU specs, AI model benchmarks, and provider pricing programmatically. Our RESTful API serves JSON data covering 160+ models, 47 GPUs, and 19 cloud providers.
Quick Start
The API is open and requires no authentication. All endpoints return JSON and support CORS.
curl -s "https://inferencebench.io/api/v1/models?limit=5" | jq '.data[].name'
Base URL
https://inferencebench.io/api/v1
All API v1 endpoints are prefixed with /api/v1/. The health endpoint is at /api/health.
Rate Limits
| Limit | Value |
|---|---|
| Requests per minute | 60 |
| Rate limit window | 1 min |
| Cache TTL | 1 hr |
Rate limit headers are included in every response: X-RateLimit-Remaining and X-RateLimit-Reset. If you exceed the limit, you will receive a 429 response with a Retry-After header.
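In a client, these headers can drive the back-off decision directly. A minimal sketch in Python; the `wait_if_rate_limited` helper is an illustrative name, and it assumes `X-RateLimit-Reset` carries a Unix timestamp in seconds, which the API does not currently specify:

```python
import time

def wait_if_rate_limited(status_code, headers):
    """Return how many seconds to sleep before retrying, based on rate-limit headers."""
    if status_code == 429:
        # The server states the exact back-off on rate-limit errors.
        return int(headers.get("Retry-After", 60))
    if int(headers.get("X-RateLimit-Remaining", 1)) == 0:
        # Quota exhausted: wait until the reset time (assumed Unix seconds).
        reset = int(headers.get("X-RateLimit-Reset", 0))
        return max(0, reset - int(time.time()))
    return 0
```

Calling this after every response and sleeping for the returned duration keeps a polling client inside the 60 requests/minute budget.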
Authentication
The API is currently open and does not require authentication. API key support is coming soon for higher rate limits and additional endpoints. Join our newsletter to be notified when API keys become available.
Endpoints
/api/v1/models
List all AI models with architecture details, memory requirements, capabilities, and API pricing.
Query Parameters
| Name | Type | Description |
|---|---|---|
| family | string | Filter by model family (e.g. llama, gpt, claude) |
| type | string | Filter by architecture type: dense, moe, or hybrid |
| limit | number | Max results per page (default 50, max 100) |
| offset | number | Pagination offset (default 0) |
Example Request
curl -s "https://inferencebench.io/api/v1/models" | jq .
Example Response
{
"data": [
{
"id": "meta/llama-3.1-70b",
"name": "Llama 3.1 70B",
"family": "llama",
"architecture": {
"type": "dense",
"total_params_b": 70,
"active_params_b": 70,
"num_layers": 80,
"hidden_dim": 8192,
"num_heads": 64,
"num_kv_heads": 8,
"head_dim": 128,
"vocab_size": 128256
},
"context_window": 131072,
"memory": { ... },
"capabilities": { ... },
"api_pricing": { ... }
}
],
"total": 160,
"limit": 50,
"offset": 0,
"_links": {
"self": "/api/v1/models?limit=50&offset=0",
"next": "/api/v1/models?limit=50&offset=50"
}
}
/api/v1/gpus
List all GPUs with VRAM, bandwidth, compute specs, and cloud pricing from multiple providers.
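Because each page embeds a `_links.next` pointer, paginating is a matter of following that link until it disappears. A minimal Python sketch; `iter_pages` and `http_fetch` are illustrative names, not part of the API:

```python
import json
from urllib.request import urlopen

BASE_URL = "https://inferencebench.io"

def iter_pages(fetch, path):
    """Yield every item across pages by following _links.next until it runs out."""
    while path:
        page = fetch(path)
        yield from page["data"]
        path = page.get("_links", {}).get("next")

def http_fetch(path):
    """Fetch one page of results as parsed JSON."""
    with urlopen(BASE_URL + path) as resp:
        return json.load(resp)

# Iterate all models, 50 at a time:
# for model in iter_pages(http_fetch, "/api/v1/models?limit=50"):
#     print(model["name"])
```

Separating the page-walking logic from the HTTP call keeps the iterator testable and lets you swap in a client that also honors the rate-limit headers described above.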
Query Parameters
| Name | Type | Description |
|---|---|---|
| vendor | string | Filter by vendor: nvidia or amd |
| limit | number | Max results per page (default 50, max 100) |
| offset | number | Pagination offset (default 0) |
Example Request
curl -s "https://inferencebench.io/api/v1/gpus" | jq .
Example Response
{
"data": [
{
"id": "h100-sxm",
"name": "H100 SXM",
"vendor": "nvidia",
"vram_gb": 80,
"memory_bandwidth_gbps": 3350,
"fp16_tflops": 989,
"cloud_pricing": { ... }
}
],
"total": 47,
"limit": 50,
"offset": 0
}
/api/v1/providers
List all cloud GPU providers with supported GPUs, regions, and pricing tiers.
Query Parameters
| Name | Type | Description |
|---|---|---|
| limit | number | Max results per page (default 50, max 100) |
| offset | number | Pagination offset (default 0) |
Example Request
curl -s "https://inferencebench.io/api/v1/providers" | jq .
Example Response
{
"data": [
{
"id": "lambda",
"name": "Lambda",
"type": "cloud",
"supported_gpus": ["h100-sxm", "a100-80gb"],
"regions": ["us-east", "us-west"],
"pricing": { ... }
}
],
"total": 19,
"limit": 50,
"offset": 0
}
/api/health
Health check endpoint. Returns service status and version.
Example Request
curl -s "https://inferencebench.io/api/health" | jq .
Example Response
{
"status": "ok",
"version": "1.0.0",
"timestamp": "2026-04-01T00:00:00.000Z"
}
Error Codes
| Status | Description |
|---|---|
| 200 | Success |
| 400 | Bad request — invalid query parameters |
| 404 | Endpoint not found |
| 429 | Rate limit exceeded — retry after the Retry-After header |
| 500 | Internal server error |
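These codes map naturally onto a retry policy: honor Retry-After on 429, back off exponentially on 500, and treat other 4xx codes as fatal. A sketch of that decision in Python; the `retry_delay` helper is an illustrative name, not part of the API:

```python
def retry_delay(status, headers, attempt, max_attempts=3):
    """Seconds to wait before the next attempt, or None to give up."""
    if attempt >= max_attempts - 1:
        return None  # retry budget exhausted
    if status == 429:
        # Honor the server's Retry-After header on rate-limit errors.
        return int(headers.get("Retry-After", 60))
    if status == 500:
        # Exponential backoff for transient server errors: 1s, 2s, 4s, ...
        return 2 ** attempt
    return None  # 400/404 are client errors and will not succeed on retry
```

A request loop would call `retry_delay` after each failed attempt, sleep for the returned duration, and stop as soon as it returns `None`.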