
Developer API

Access GPU specs, AI model benchmarks, and provider pricing programmatically. Our RESTful API serves JSON data covering 160+ models, 47 GPUs, and 19 cloud providers.

Quick Start

The API is open and requires no authentication. All endpoints return JSON and support CORS.

```bash
curl -s "https://inferencebench.io/api/v1/models?limit=5" | jq '.data[].name'
```

Base URL

https://inferencebench.io/api/v1/

All API v1 endpoints are prefixed with /api/v1/. The health endpoint is at /api/health.

Rate Limits

- Requests per minute: 60
- Rate limit window: 1 min
- Cache TTL: 1 hr

Rate limit headers are included in every response: X-RateLimit-Remaining and X-RateLimit-Reset. If you exceed the limit, you will receive a 429 response with a Retry-After header.
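A polite client sleeps out the window instead of retrying immediately. A minimal sketch in Python (standard library only; `fetch_with_retry` and `retry_after_seconds` are illustrative names, and the parser assumes `Retry-After` carries delay-seconds rather than an HTTP date):

```python
import time
import urllib.error
import urllib.request

def retry_after_seconds(headers, default=60):
    """Parse the Retry-After header as delay-seconds; fall back to one window."""
    try:
        return max(0, int(headers.get("Retry-After", default)))
    except (TypeError, ValueError):
        return default

def fetch_with_retry(url, max_attempts=3):
    """GET `url`, sleeping out the rate-limit window on a 429 response."""
    for attempt in range(max_attempts):
        try:
            with urllib.request.urlopen(url) as resp:
                return resp.read()
        except urllib.error.HTTPError as err:
            # Only a 429 is worth retrying; anything else propagates.
            if err.code != 429 or attempt == max_attempts - 1:
                raise
            time.sleep(retry_after_seconds(err.headers))
```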

Authentication

The API is currently open and does not require authentication. API key support is coming soon for higher rate limits and additional endpoints. Join our newsletter to be notified when API keys become available.

Endpoints

GET /api/v1/models

List all AI models with architecture details, memory requirements, capabilities, and API pricing.

Query Parameters

| Name | Type | Description |
| --- | --- | --- |
| family | string | Filter by model family (e.g. llama, gpt, claude) |
| type | string | Filter by architecture type: dense, moe, or hybrid |
| limit | number | Max results per page (default 50, max 100) |
| offset | number | Pagination offset (default 0) |
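The filters combine freely in one request. A small helper for building the query string (a sketch in Python; `models_url` is an illustrative name, and the parameter names come from the table above):

```python
from urllib.parse import urlencode

BASE = "https://inferencebench.io/api/v1"

def models_url(family=None, model_type=None, limit=50, offset=0):
    """Build a /models URL, dropping any filter that is not set."""
    params = {"family": family, "type": model_type,
              "limit": limit, "offset": offset}
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return f"{BASE}/models?{query}"

print(models_url(family="llama", model_type="dense", limit=10))
# https://inferencebench.io/api/v1/models?family=llama&type=dense&limit=10&offset=0
```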

Example Request

```bash
curl -s "https://inferencebench.io/api/v1/models" | jq .
```

Example Response

```json
{
  "data": [
    {
      "id": "meta/llama-3.1-70b",
      "name": "Llama 3.1 70B",
      "family": "llama",
      "architecture": {
        "type": "dense",
        "total_params_b": 70,
        "active_params_b": 70,
        "num_layers": 80,
        "hidden_dim": 8192,
        "num_heads": 64,
        "num_kv_heads": 8,
        "head_dim": 128,
        "vocab_size": 128256
      },
      "context_window": 131072,
      "memory": { ... },
      "capabilities": { ... },
      "api_pricing": { ... }
    }
  ],
  "total": 160,
  "limit": 50,
  "offset": 0,
  "_links": {
    "self": "/api/v1/models?limit=50&offset=0",
    "next": "/api/v1/models?limit=50&offset=50"
  }
}
```
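Rather than computing offsets by hand, a client can walk `_links.next` until it disappears. A sketch (Python; `iter_pages` is an illustrative name, it assumes the final page omits `next`, and the `fetch` callable is injected so the loop stays transport-agnostic):

```python
def iter_pages(fetch, path="/api/v1/models?limit=50&offset=0"):
    """Yield every item in a paginated collection by following _links.next.

    `fetch` takes a path string and returns the decoded JSON body as a dict,
    e.g. lambda p: json.loads(urllib.request.urlopen(HOST + p).read()).
    """
    while path:
        page = fetch(path)
        yield from page["data"]
        # Stop when the page carries no "next" link.
        path = page.get("_links", {}).get("next")
```

The same loop works for the GPU and provider collections, since they share the envelope.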
GET /api/v1/gpus

List all GPUs with VRAM, bandwidth, compute specs, and cloud pricing from multiple providers.

Query Parameters

| Name | Type | Description |
| --- | --- | --- |
| vendor | string | Filter by vendor: nvidia or amd |
| limit | number | Max results per page (default 50, max 100) |
| offset | number | Pagination offset (default 0) |

Example Request

```bash
curl -s "https://inferencebench.io/api/v1/gpus" | jq .
```

Example Response

```json
{
  "data": [
    {
      "id": "h100-sxm",
      "name": "H100 SXM",
      "vendor": "nvidia",
      "vram_gb": 80,
      "memory_bandwidth_gbps": 3350,
      "fp16_tflops": 989,
      "cloud_pricing": { ... }
    }
  ],
  "total": 47,
  "limit": 50,
  "offset": 0
}
```
GET /api/v1/providers

List all cloud GPU providers with supported GPUs, regions, and pricing tiers.

Query Parameters

| Name | Type | Description |
| --- | --- | --- |
| limit | number | Max results per page (default 50, max 100) |
| offset | number | Pagination offset (default 0) |

Example Request

```bash
curl -s "https://inferencebench.io/api/v1/providers" | jq .
```

Example Response

```json
{
  "data": [
    {
      "id": "lambda",
      "name": "Lambda",
      "type": "cloud",
      "supported_gpus": ["h100-sxm", "a100-80gb"],
      "regions": ["us-east", "us-west"],
      "pricing": { ... }
    }
  ],
  "total": 19,
  "limit": 50,
  "offset": 0
}
```
GET /api/health

Health check endpoint. Returns service status and version.

Example Request

```bash
curl -s "https://inferencebench.io/api/health" | jq .
```

Example Response

```json
{
  "status": "ok",
  "version": "1.0.0",
  "timestamp": "2026-04-01T00:00:00.000Z"
}
```

Error Codes

| Status | Description |
| --- | --- |
| 200 | Success |
| 400 | Bad request (invalid query parameters) |
| 404 | Endpoint not found |
| 429 | Rate limit exceeded; retry after the Retry-After header |
| 500 | Internal server error |
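A client can map these codes onto exceptions so call sites only handle the cases they care about. A sketch (Python; the exception names are my own, not part of the API):

```python
class APIError(Exception):
    """Base error for any non-2xx response from the API."""
    def __init__(self, status, message):
        super().__init__(f"{status}: {message}")
        self.status = status

class RateLimited(APIError):
    """429: the caller should wait out Retry-After and try again."""

_MESSAGES = {
    400: "bad request (invalid query parameters)",
    404: "endpoint not found",
    429: "rate limit exceeded",
    500: "internal server error",
}

def raise_for_status(status):
    """Raise the matching exception for an error status; no-op on 2xx."""
    if 200 <= status < 300:
        return
    cls = RateLimited if status == 429 else APIError
    raise cls(status, _MESSAGES.get(status, "unexpected status"))
```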