A40
NVIDIA · Ampere · 48 GB GDDR6 · 300 W TDP
VRAM: 48 GB · BF16 TFLOPS: 37.4 · Bandwidth: 696 GB/s · From: $0.42/hr
Spec Sheet
| Spec | Value |
|---|---|
| VRAM | 48 GB GDDR6 |
| Memory Bandwidth | 696 GB/s |
| BF16 TFLOPS | 37.4 |
| FP16 TFLOPS | 37.4 |
| FP8 TFLOPS | 37.4 |
| INT8 TOPS | 74.8 |
| TDP | 300 W |
| Interconnect | PCIe |
| Max per Node | 8 |
| PCIe Generation | 4 |
| CUDA Compute Capability | 8.6 |
| Tensor Cores | Yes |
Pricing by Provider
| Provider | On-Demand | Reserved | Spot | Badge |
|---|---|---|---|---|
| tensordock | $0.59/hr | - | $0.42/hr | Cheapest |
| vast_ai | $0.65/hr | - | $0.44/hr | |
| runpod | $0.89/hr | - | $0.65/hr | |
| lambda | $0.79/hr | - | - | |
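The table above can be turned into a quick job-cost estimate. A minimal sketch, using the listed rates (spot capacity is preemptible and not guaranteed, so spot totals are best-case):

```python
# Provider rates from the pricing table above ($/hr for one A40).
providers = {
    "tensordock": {"on_demand": 0.59, "spot": 0.42},
    "vast_ai":    {"on_demand": 0.65, "spot": 0.44},
    "runpod":     {"on_demand": 0.89, "spot": 0.65},
    "lambda":     {"on_demand": 0.79, "spot": None},  # no spot tier listed
}

def job_cost(hours: float, tier: str = "on_demand") -> dict:
    """Total cost per provider for a job of the given length."""
    return {
        name: round(rates[tier] * hours, 2)
        for name, rates in providers.items()
        if rates[tier] is not None
    }

print(job_cost(24))          # on-demand: tensordock cheapest at $14.16
print(job_cost(24, "spot"))  # spot: tensordock $10.08
```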
Compatible Models (239)
Single GPU (179 models)
Falcon 40B (40B, FP8), VILA 1.5 40B (40B, FP8), Aya 23 35B (35B, FP8), Command R (35B, FP8), Command R (August 2024) (35B, FP8), Yi 1.5 34B (34.4B, FP8), Code Llama 34B (34B, FP8), DeepSeek Coder 33B (33B, FP8), Vicuna 33B (33B, FP8), WizardCoder 33B (33B, FP8), DeepSeek R1 Distill 32B (32.8B, FP8), Qwen 3 32B (32.8B, FP8), Qwen 2.5 32B (32.5B, FP8), Qwen 2.5 Coder 32B (32.5B, FP8), Qwen 3 30B-A3B (30.5B, FP8), JAIS 30B (30B, FP8), MPT 30B (30B, FP8), Gemma 2 27B (27B, FP8), Gemma 3 27B (27B, FP8), InternVL2 26B (26B, FP8), +159 more
Multi-GPU (60 models)
Qwen 2.5 72B (×2, FP8), Qwen 2.5 Math 72B (×2, FP8), Qwen 2.5 VL 72B (×2, FP8), Dolphin 2.9 72B (×2, FP8), DeepSeek R1 Distill 70B (×2, FP8), Llama 3 70B 1M Context (×2, FP8), Llama 3 70B (×2, FP8), Llama 3.1 70B (×2, FP8), Llama 3.3 70B (×2, FP8), Hermes 3 70B (×2, FP8), HelpSteer2 Llama 3.1 70B (×2, FP8), Llama 3.1 Nemotron 70B Instruct (×2, FP8), Llama 3.1 Nemotron 70B Reward (×2, FP8), Nemotron 70B (×2, FP8), Llama 3.1 70B Turbo (×2, FP8), +45 more
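The single-GPU vs multi-GPU split follows from a simple capacity check. A rough sketch, assuming ~1 byte per parameter at FP8 with ~20% headroom for KV cache, activations, and CUDA context (the overhead factor is our assumption, not the site's exact rule):

```python
import math

VRAM_GB = 48  # A40

def gpus_needed(params_b: float, bytes_per_param: float = 1.0,
                overhead: float = 1.2) -> int:
    """Rough GPU count: FP8 weights (~1 byte/param) plus ~20%
    headroom for KV cache, activations, and CUDA context."""
    needed_gb = params_b * bytes_per_param * overhead
    return math.ceil(needed_gb / VRAM_GB)

print(gpus_needed(40))  # Falcon 40B  -> 1 GPU (fits in 48 GB)
print(gpus_needed(70))  # Llama 3 70B -> 2 GPUs, matching the list
```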
Training Capabilities
Estimated GPU count for full fine-tuning (AdamW, BF16) and QLoRA
| Model Size | Full Fine-Tune | QLoRA |
|---|---|---|
| 7B model | 3 GPUs | 1 GPU |
| 13B model | 6 GPUs | 1 GPU |
| 70B model | 28 GPUs | 1 GPU |
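The GPU counts above are consistent with the standard memory accounting for mixed-precision AdamW: roughly 16 bytes per parameter (BF16 weights and gradients, plus FP32 master weights and the two Adam moments), while QLoRA keeps 4-bit base weights at ~0.5 byte per parameter. A sketch that reproduces the table under an assumed ~20% activation/framework overhead:

```python
import math

VRAM_GB = 48  # A40

def gpus_full_ft(params_b: float) -> int:
    """Full fine-tune with AdamW in mixed BF16: ~16 bytes/param
    (2 weights + 2 grads + 4 master weights + 4 Adam m + 4 Adam v),
    plus ~20% for activations and framework overhead. The overhead
    factor is an assumption chosen to match the table, not an
    exact simulator."""
    return math.ceil(params_b * 16 * 1.2 / VRAM_GB)

def gpus_qlora(params_b: float) -> int:
    """QLoRA: 4-bit base weights (~0.5 byte/param) plus small
    adapters and optimizer state, with the same 20% headroom."""
    return math.ceil(params_b * 0.5 * 1.2 / VRAM_GB)

for size in (7, 13, 70):
    print(f"{size}B: full={gpus_full_ft(size)} GPUs, qlora={gpus_qlora(size)} GPU")
```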
Energy Efficiency
Estimated tokens/second per Watt for popular models
| Model | Tokens/s per Watt | Precision |
|---|---|---|
| Mistral 7B | 0.32 | FP8 |
| Qwen 2.5 7B | 0.31 | FP8 |
| Llama 3.1 8B | 0.29 | FP8 |
| Llama 3.1 70B | 0.03 | FP8 |
| Qwen 2.5 72B | 0.03 | FP8 |
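These figures are close to what a simple bandwidth-roofline model predicts: single-stream decoding is memory-bound, so tokens/s ≈ bandwidth ÷ weight bytes, and efficiency divides by TDP. A sketch under assumed FP8 weights (~1 byte/param) and the 70B-class models split across 2× A40, both drawing full power; interconnect and KV-cache traffic are ignored:

```python
BANDWIDTH_GBPS = 696  # A40 memory bandwidth
TDP_W = 300           # per GPU

def tps_per_watt(params_b: float, n_gpus: int = 1,
                 bytes_per_param: float = 1.0) -> float:
    """Roofline estimate: every weight byte is read once per token,
    so tokens/s = aggregate bandwidth / model bytes, then / total power."""
    tokens_per_s = (BANDWIDTH_GBPS * n_gpus) / (params_b * bytes_per_param)
    return round(tokens_per_s / (TDP_W * n_gpus), 2)

print(tps_per_watt(7))             # Mistral 7B    -> ~0.33 (table: 0.32)
print(tps_per_watt(70, n_gpus=2))  # Llama 3.1 70B -> ~0.03 (table: 0.03)
```

Note that the GPU count cancels in this model: doubling GPUs doubles both aggregate bandwidth and power draw, which is why the 70B rows land so far below the 7B ones.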
Similar GPUs
| GPU | VRAM | BF16 TFLOPS | BW (GB/s) | From |
|---|---|---|---|---|
| RTX A6000 | 48 GB | 38.7 | 768 | $0.49/hr |
| A100 40GB SXM | 40 GB | 312 | 1555 | $0.85/hr |
| A100 40GB PCIe | 40 GB | 312 | 1555 | $0.69/hr |
| A16 | 64 GB | 16.8 | 232 | $0.72/hr |
| A10G | 24 GB | 35 | 600 | $0.30/hr |