Select a GPU first to see available providers
Infrastructure
Scaling Mode
Splits model layers across GPUs. Best for large models that don't fit in a single GPU's memory. Requires a fast interconnect such as NVLink.
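A minimal sketch of how splitting layers across GPUs might be computed, assuming a contiguous-block assignment; the function name and block strategy are illustrative, not this tool's actual implementation:

```python
# Hypothetical helper: assign contiguous blocks of layer indices to
# each GPU, spreading any remainder over the first GPUs.
def split_layers(n_layers: int, n_gpus: int) -> list[list[int]]:
    base, extra = divmod(n_layers, n_gpus)
    assignment, start = [], 0
    for gpu in range(n_gpus):
        count = base + (1 if gpu < extra else 0)
        assignment.append(list(range(start, start + count)))
        start += count
    return assignment

print(split_layers(8, 3))  # [[0, 1, 2], [3, 4, 5], [6, 7]]
```

Contiguous blocks keep cross-GPU traffic to one activation hand-off per boundary, which is why a fast interconnect matters.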
Inference Framework
The selected framework affects throughput predictions through its CUDA optimization factor.
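A sketch of how such a per-framework factor could scale a raw throughput estimate; the framework names, factor values, and function are assumptions for illustration, not the tool's real model:

```python
# Hypothetical per-framework efficiency factors (made-up values).
FRAMEWORK_FACTORS = {"vLLM": 0.85, "TensorRT-LLM": 0.95, "transformers": 0.45}

def predicted_throughput(peak_tokens_per_s: float, framework: str) -> float:
    # Fall back to a conservative factor for unknown frameworks.
    return peak_tokens_per_s * FRAMEWORK_FACTORS.get(framework, 0.5)

print(predicted_throughput(1000.0, "vLLM"))  # 850.0
```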
GPU Memory
Select model & GPU