Select a GPU first to see available providers
Infrastructure
Scaling Mode
Splits model layers across GPUs. Best for large models that don't fit in a single GPU's memory. Requires a fast interconnect such as NVLink.
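A minimal sketch of how splitting layers across GPUs might be computed, assuming a contiguous-block assignment; the function name and block strategy are illustrative, not this tool's actual implementation:

```python
# Hypothetical helper: assign contiguous blocks of layer indices to
# each GPU, spreading any remainder over the first GPUs.
def split_layers(n_layers: int, n_gpus: int) -> list[list[int]]:
    base, extra = divmod(n_layers, n_gpus)
    assignment, start = [], 0
    for gpu in range(n_gpus):
        count = base + (1 if gpu < extra else 0)
        assignment.append(list(range(start, start + count)))
        start += count
    return assignment

print(split_layers(8, 3))  # [[0, 1, 2], [3, 4, 5], [6, 7]]
```

Contiguous blocks keep cross-GPU traffic to one activation hand-off per boundary, which is why a fast interconnect matters.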
Inference Framework
The selected framework affects throughput predictions through its CUDA optimization factor.
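A sketch of how such a per-framework factor could scale a raw throughput estimate; the framework names, factor values, and function are assumptions for illustration, not the tool's real model:

```python
# Hypothetical per-framework efficiency factors (made-up values).
FRAMEWORK_FACTORS = {"vLLM": 0.85, "TensorRT-LLM": 0.95, "transformers": 0.45}

def predicted_throughput(peak_tokens_per_s: float, framework: str) -> float:
    # Fall back to a conservative factor for unknown frameworks.
    return peak_tokens_per_s * FRAMEWORK_FACTORS.get(framework, 0.5)

print(predicted_throughput(1000.0, "vLLM"))  # 850.0
```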
GPU Memory
Select model & GPU