Updated minutes ago · Sources: GPU Pricing, API Token Pricing, Model Registry

L40S

NVIDIA · Ada Lovelace · 48 GB GDDR6 · 350 W TDP

VRAM	48 GB
BF16 TFLOPS	362
Bandwidth	864 GB/s
Starting price	$0.85/hr


Spec Sheet

Spec	Value
VRAM	48 GB GDDR6
Memory Bandwidth	864 GB/s
BF16 TFLOPS	362
FP16 TFLOPS	362
FP8 TFLOPS	733
INT8 TOPS	733
TDP	350 W
Interconnect	PCIe
PCIe	Gen4
Max per Node	8
CUDA Compute Capability	8.9
Tensor Cores	Yes

Pricing by Provider

Provider	On-Demand	Reserved	Spot	Badge
fluidstack	$1.09/hr	-	$0.85/hr	Cheapest
tensordock	$1.19/hr	-	$0.89/hr
vast_ai	$1.29/hr	-	$0.95/hr
lambda	$1.59/hr	$1.19/hr	-
coreweave	$1.84/hr	$1.34/hr	-
runpod	$1.90/hr	-	$1.49/hr
gcp	$2.45/hr	$1.62/hr	-
aws	$2.56/hr	$1.69/hr	-
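
To compare providers programmatically, the table above can be loaded into a small lookup. The sketch below copies the listed prices as-is; the helper names `cheapest` and `job_cost` are illustrative, and the 8-GPU, 24-hour job is only an assumed example (8 is the "Max per Node" value from the spec sheet).

```python
# USD per GPU-hour, copied from the provider table; None means no listed price.
PRICES = {
    "fluidstack": {"on_demand": 1.09, "reserved": None, "spot": 0.85},
    "tensordock": {"on_demand": 1.19, "reserved": None, "spot": 0.89},
    "vast_ai":    {"on_demand": 1.29, "reserved": None, "spot": 0.95},
    "lambda":     {"on_demand": 1.59, "reserved": 1.19, "spot": None},
    "coreweave":  {"on_demand": 1.84, "reserved": 1.34, "spot": None},
    "runpod":     {"on_demand": 1.90, "reserved": None, "spot": 1.49},
    "gcp":        {"on_demand": 2.45, "reserved": 1.62, "spot": None},
    "aws":        {"on_demand": 2.56, "reserved": 1.69, "spot": None},
}

def cheapest(tier):
    """Return (provider, price) for the lowest listed price in a tier."""
    offers = {p: t[tier] for p, t in PRICES.items() if t[tier] is not None}
    provider = min(offers, key=offers.get)
    return provider, offers[provider]

def job_cost(rate_per_hour, gpus, hours):
    """Total cost in USD for a job, rounded to cents."""
    return round(rate_per_hour * gpus * hours, 2)

for tier in ("on_demand", "reserved", "spot"):
    print(tier, cheapest(tier))

# Assumed example: one full 8-GPU node for 24 hours at the cheapest on-demand rate.
print(job_cost(cheapest("on_demand")[1], gpus=8, hours=24))
```

Spot prices are usually interruptible, so the lowest hourly rate is not necessarily the lowest cost for a long-running job.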

Pricing History

Provider	Current	Change	Period	Low	High
runpod	$1.49/hr	0.0%	2024-01-01 to 2025-03-01	$1.49	$2.19
lambda	$1.19/hr	0.0%	2024-01-01 to 2025-03-01	$1.19	$1.89

Compatible Models (239)

Multi-GPU (60 models)

Model	GPUs	Precision
Qwen 2.5 72B	2	FP8
Qwen 2.5 Math 72B	2	FP8
Qwen 2.5 VL 72B	2	FP8
Dolphin 2.9 72B	2	FP8
DeepSeek R1 Distill 70B	2	FP8
Llama 3 70B 1M Context	2	FP8
Llama 3 70B	2	FP8
Llama 3.1 70B	2	FP8
Llama 3.3 70B	2	FP8
Hermes 3 70B	2	FP8
HelpSteer2 Llama 3.1 70B	2	FP8
Llama 3.1 Nemotron 70B Instruct	2	FP8
Llama 3.1 Nemotron 70B Reward	2	FP8
Nemotron 70B	2	FP8
Llama 3.1 70B Turbo	2	FP8

+45 more

Training Capabilities

Estimated GPU count for full fine-tuning (AdamW, BF16) and QLoRA

Model Size	Full Fine-Tune	QLoRA
7B model	3 GPUs	1 GPU
13B model	6 GPUs	1 GPU
70B model	28 GPUs	1 GPU
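
The page does not state how these counts are derived. As a rough sketch, they are consistent with a common rule of thumb: about 16 bytes per parameter for mixed-precision AdamW fine-tuning, about 0.5 bytes per parameter for 4-bit QLoRA base weights, and roughly 20% headroom for activations and buffers. All three constants are assumptions chosen for illustration, and `gpus_needed` is a hypothetical helper, not the site's calculator.

```python
from fractions import Fraction
from math import ceil

VRAM_GB = 48  # L40S VRAM, from the spec sheet

# Assumed rules of thumb (not stated by the source):
FULL_FT_BYTES = 16            # BF16 weights + grads (2+2), FP32 master weights + two AdamW moments (4+4+4)
QLORA_BYTES = Fraction(1, 2)  # 4-bit quantized base weights; adapter cost folded into overhead
OVERHEAD = Fraction(6, 5)     # assumed 20% headroom for activations and buffers

def gpus_needed(params_billion, bytes_per_param, vram_gb=VRAM_GB):
    """Smallest GPU count whose combined VRAM covers the estimated footprint.

    Exact fractions avoid float rounding pushing a boundary case up by one GPU.
    """
    need_gb = Fraction(params_billion) * bytes_per_param * OVERHEAD
    return max(1, ceil(need_gb / vram_gb))

for size in (7, 13, 70):
    print(f"{size}B: full fine-tune {gpus_needed(size, FULL_FT_BYTES)} GPUs, "
          f"QLoRA {gpus_needed(size, QLORA_BYTES)} GPU(s)")
```

Under these assumptions the sketch reproduces the table (3, 6, and 28 GPUs for full fine-tuning; 1 GPU for QLoRA). Real requirements vary with sequence length, batch size, and sharding strategy.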


Energy Efficiency

Estimated tokens/second per Watt for popular models
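
One way to read the per-watt figures in this section: multiplying by the card's 350 W TDP gives an implied tokens/second, and taking the reciprocal gives joules per token. The sketch below assumes the figures are normalized against full TDP, which the page does not confirm; both function names are illustrative.

```python
TDP_WATTS = 350  # from the spec sheet

def implied_tokens_per_second(tokens_per_sec_per_watt, watts=TDP_WATTS):
    """Throughput implied by an efficiency figure at a given power draw."""
    return tokens_per_sec_per_watt * watts

def joules_per_token(tokens_per_sec_per_watt):
    """Energy per token; 1 W = 1 J/s, so J/token is the reciprocal of t/s/W."""
    return 1.0 / tokens_per_sec_per_watt

# Example with the Mistral 7B figure listed in this section (0.34 t/s/W).
print(round(implied_tokens_per_second(0.34), 1), "tokens/s at TDP")
print(round(joules_per_token(0.34), 2), "J per token")
```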

Mistral 7B

0.34 t/s/W (FP8)

Qwen 2.5 7B

0.32 t/s/W (FP8)

Llama 3.1 8B

0.31 t/s/W (FP8)

Llama 3.1 70B

0.03 t/s/W (FP8)

Qwen 2.5 72B