H100 SXM
NVIDIA · Hopper · 80 GB HBM3 · 700 W TDP
VRAM: 80 GB · BF16 TFLOPS: 990 · Bandwidth: 3350 GB/s · From $1.89/hr
Spec Sheet
| Spec | Value |
|---|---|
| VRAM | 80 GB HBM3 |
| Memory Bandwidth | 3350 GB/s |
| BF16 TFLOPS | 990 |
| FP16 TFLOPS | 990 |
| FP8 TFLOPS | 1979 |
| INT8 TOPS | 1979 |
| TDP | 700 W |
| Interconnect | NVLink |
| NVLink Bandwidth | 900 GB/s |
| Max per Node | 8 |
| PCIe | Gen5 |
| CUDA Compute Capability | 9.0 |
| Tensor Cores | Yes |
Pricing by Provider
| Provider | On-Demand | Reserved | Spot | Badge |
|---|---|---|---|---|
| lambda | $2.49/hr | $1.89/hr | - | Cheapest |
| fluidstack | $2.85/hr | - | $2.10/hr | |
| tensordock | $3.29/hr | - | $2.49/hr | |
| vast_ai | $3.40/hr | - | $2.50/hr | |
| coreweave | $3.79/hr | $2.57/hr | - | |
| runpod | $4.18/hr | - | $3.29/hr | |
| gcp | $4.85/hr | $3.40/hr | - | |
| azure | $4.98/hr | $3.49/hr | - | |
| aws | $5.12/hr | $3.59/hr | - | |
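The reserved rates in the table translate into sizeable discounts over on-demand. A quick sketch of the percentage savings, using the snapshot rates above (pricing changes frequently, so treat these as illustrative):

```python
# Reserved-vs-on-demand savings computed from the pricing table above.
rates = {
    "lambda": (2.49, 1.89),
    "coreweave": (3.79, 2.57),
    "gcp": (4.85, 3.40),
    "azure": (4.98, 3.49),
    "aws": (5.12, 3.59),
}

def savings_pct(on_demand: float, reserved: float) -> float:
    """Percent discount of the reserved rate relative to on-demand."""
    return round(100 * (1 - reserved / on_demand), 1)

for provider, (od, res) in rates.items():
    print(f"{provider}: {savings_pct(od, res)}% off on-demand")
# lambda comes out around 24% off; aws around 30% off.
```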
Pricing History
| Provider | Current | Change | Period | Low | High |
|---|---|---|---|---|---|
| runpod | $4.18/hr | 0.0% | 2024-01-01 to 2025-03-01 | $4.18 | $5.50 |
| lambda | $2.49/hr | 0.0% | 2024-01-01 to 2025-03-01 | $2.49 | $3.29 |
| coreweave | $3.85/hr | 0.0% | 2024-01-01 to 2025-03-01 | $3.85 | $4.76 |
Compatible Models (249)
Single GPU (186 models)
DeepSeek LLM 67B (67B, FP8) · Jamba 1.5 Mini (52B, FP8) · Llama 3.1 Nemotron 51B (51B, FP8) · Amazon Nova Pro (50B, FP8) · Mixtral 8x7B (46.7B, FP8) · Mixtral 8x7B Instruct (46.7B, FP8) · Phi 3.5 MoE (41.9B, FP8) · Falcon 40B (40B, FP8) · VILA 1.5 40B (40B, FP8) · Aya 23 35B (35B, FP8) · Command R (35B, FP8) · Command R (August 2024) (35B, FP8) · Yi 1.5 34B (34.4B, FP8) · Code Llama 34B (34B, FP8) · DeepSeek Coder 33B (33B, FP8) · Vicuna 33B (33B, FP8) · WizardCoder 33B (33B, FP8) · DeepSeek R1 Distill 32B (32.8B, FP8) · Qwen 3 32B (32.8B, FP8) · Qwen 2.5 32B (32.5B, FP8) · +166 more
Multi-GPU (63 models)
DBRX Base (x2, FP8) · DBRX Instruct (x2, FP8) · Mistral Large 2411 (x2, FP8) · Mistral Large 2 (x2, FP8) · Llama 4 Scout (x2, FP8) · Command R+ (x2, FP8) · Yi-Large (x2, FP8) · YaLM 100B (x2, FP8) · Llama 3.2 90B Vision (x2, FP8) · Llama 3.2 90B Vision Instruct (x2, FP8) · Qwen 2.5 72B (x2, FP8) · Qwen 2.5 Math 72B (x2, FP8) · Qwen 2.5 VL 72B (x2, FP8) · Dolphin 2.9 72B (x2, FP8) · DeepSeek R1 Distill 70B (x2, FP8) · +48 more
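The single-GPU/multi-GPU split above is consistent with a simple VRAM-fit rule: FP8 weights take roughly 1 byte per parameter, plus headroom for KV cache and runtime buffers. A rough sketch of such a check (the ~15% overhead factor is an assumption that happens to reproduce the 67B-fits / 70B-doesn't boundary, not the site's documented rule):

```python
import math

VRAM_GB = 80  # per-GPU HBM3 capacity from the spec sheet

def gpus_needed(params_b: float, bytes_per_param: float = 1.0,
                overhead: float = 1.15) -> int:
    """Minimum GPUs to hold the model, assuming FP8 weights
    (~1 byte/param) with ~15% overhead and a naive even split."""
    footprint_gb = params_b * bytes_per_param * overhead
    return math.ceil(footprint_gb / VRAM_GB)

print(gpus_needed(67))   # DeepSeek LLM 67B -> fits on a single H100
print(gpus_needed(70))   # 70B-class models -> spill onto a second GPU
```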
Training Capabilities
Estimated GPU count for full fine-tuning (AdamW, BF16) and QLoRA
| Model Size | Full Fine-Tune | QLoRA |
|---|---|---|
| 7B model | 2 GPUs | 1 GPU |
| 13B model | 4 GPUs | 1 GPU |
| 70B model | 17 GPUs | 1 GPU |
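The counts in the table are consistent with a back-of-envelope memory model. Under one plausible accounting (an assumption, not the site's stated formula), full fine-tuning with AdamW in BF16 costs about 19 bytes per parameter: 2 for weights, 2 for gradients, 12 for FP32 optimizer state and master weights, and roughly 3 for activations; QLoRA costs around 1 byte per parameter (4-bit base weights plus adapters and activations):

```python
import math

VRAM_GB = 80  # per-GPU capacity

def gpus(params_b: float, bytes_per_param: float) -> int:
    """GPUs needed to hold the training footprint (even split)."""
    return math.ceil(params_b * bytes_per_param / VRAM_GB)

# Assumed costs: ~19 bytes/param for full fine-tuning, ~1 for QLoRA.
for size in (7, 13, 70):
    print(f"{size}B: full={gpus(size, 19)} GPUs, qlora={gpus(size, 1.0)} GPU(s)")
```

With these assumptions the sketch reproduces the table: 2/4/17 GPUs for full fine-tuning and a single GPU for QLoRA at every size shown.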
Energy Efficiency
Estimated tokens/second per Watt for popular models
| Model | Efficiency | Precision |
|---|---|---|
| Mistral 7B | 0.66 t/s/W | FP8 |
| Qwen 2.5 7B | 0.63 t/s/W | FP8 |
| Llama 3.1 8B | 0.60 t/s/W | FP8 |
| Llama 3.1 70B | 0.07 t/s/W | FP8 |
| Qwen 2.5 72B | 0.07 t/s/W | FP8 |
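These figures are close to what a memory-bandwidth-bound roofline predicts for single-stream decode: each generated token streams the full FP8 weight set from HBM, so throughput is roughly bandwidth divided by model size. A rough sketch under that assumption (not the site's stated methodology):

```python
# Bandwidth-bound decode estimate using the spec-sheet numbers.
BANDWIDTH_GBS = 3350  # HBM3 memory bandwidth
TDP_W = 700           # board power

def tokens_per_sec_per_watt(params_b: float) -> float:
    """Upper-bound single-stream decode efficiency at FP8
    (1 byte/param, so model size in GB == params in billions)."""
    tokens_per_sec = BANDWIDTH_GBS / params_b
    return tokens_per_sec / TDP_W

print(round(tokens_per_sec_per_watt(7), 2))   # 7B-class  -> ~0.68 t/s/W
print(round(tokens_per_sec_per_watt(70), 2))  # 70B-class -> ~0.07 t/s/W
```

The estimate lands within a few percent of the listed 7B figures and matches the 70B figures, which suggests the published numbers reflect bandwidth-limited, batch-1 decoding rather than batched serving.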