A4000
NVIDIA · Ampere · 16 GB GDDR6 · 140W TDP
VRAM 16 GB · BF16 76 TFLOPS · Bandwidth 448 GB/s · From $0.17/hr
Spec Sheet
| Spec | Value |
|---|---|
| VRAM | 16 GB GDDR6 |
| Memory Bandwidth | 448 GB/s |
| BF16 TFLOPS | 76 |
| FP16 TFLOPS | 76 |
| FP8 TFLOPS | 76 |
| INT8 TOPS | 76 |
| TDP | 140W |
| Interconnect | PCIe |
| Max per Node | 8 |
| PCIe Gen | 4 |
| CUDA Compute Capability | 8.6 |
| Tensor Cores | Yes |
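Reading the roofline off these numbers: 76 BF16 TFLOPS against 448 GB/s puts the ridge point near 170 FLOPs per byte, so single-stream LLM decode (roughly 2 FLOPs per weight byte at FP8) is firmly bandwidth-bound on this card. A minimal sketch of the arithmetic, using only the spec-sheet constants above:

```python
# Roofline sketch for the A4000 from the spec sheet above.
# Ridge point = peak compute / peak bandwidth; kernels with lower
# arithmetic intensity are memory-bound, higher ones compute-bound.

PEAK_BF16_TFLOPS = 76   # dense tensor-core throughput
PEAK_BW_GBS = 448       # GDDR6 memory bandwidth

ridge = (PEAK_BF16_TFLOPS * 1e12) / (PEAK_BW_GBS * 1e9)
print(f"ridge point: {ridge:.0f} FLOPs/byte")   # ~170

# Single-token decode of a dense LLM does ~2 FLOPs per FP8 weight byte,
# far below the ridge, so inference throughput here is set by the
# 448 GB/s bus, not the 76 TFLOPS.
decode_intensity = 2.0
print(f"decode intensity ~{decode_intensity:.0f} FLOPs/byte -> memory-bound")
```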
Pricing by Provider
| Provider | On-Demand | Reserved | Spot | Notes |
|---|---|---|---|---|
| TensorDock | $0.25/hr | - | $0.17/hr | Cheapest |
| Vast.ai | $0.30/hr | - | $0.18/hr | - |
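One way to compare these rates is dollars per million generated tokens. The sketch below assumes a hypothetical single-stream throughput of ~64 tok/s (bandwidth-bound FP8 decode of a 7B model on one A4000); real serving with batching would come out far cheaper per token:

```python
# Rough $/1M-token comparison from the pricing table above.
# TOKENS_PER_SEC is a hypothetical single-stream estimate:
# 448 GB/s of bandwidth streaming ~7 GB of FP8 weights per token.

TOKENS_PER_SEC = 448 / 7.0   # ~64 tok/s

def usd_per_million_tokens(usd_per_hour: float) -> float:
    tokens_per_hour = TOKENS_PER_SEC * 3600
    return usd_per_hour / tokens_per_hour * 1e6

for provider, rate in [("TensorDock spot", 0.17), ("TensorDock on-demand", 0.25),
                       ("Vast.ai spot", 0.18), ("Vast.ai on-demand", 0.30)]:
    print(f"{provider:>22}: ${usd_per_million_tokens(rate):.2f}/1M tokens")
```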
Compatible Models (222)
Single GPU (134 models)
OLMo 2 13B (13B, FP8) · Baichuan 2 13B (13B, FP8) · Vicuna 13B (13B, FP8) · Code Llama 13B (13B, FP8) · Llama 2 13B (13B, FP8) · Orca 2 13B (13B, FP8) · VILA 1.5 13B (13B, FP8) · ELYZA 13B (13B, FP8) · Cerebras GPT 13B (13B, FP8) · KULLM 12.8B (12.8B, FP8) · StableLM 2 12B (12.1B, FP8) · Amazon Nova Lite (12B, FP8) · Gemma 3 12B (12B, FP8) · Mistral Nemo 12B (12B, FP8) · Pixtral 12B (12B, FP8) · FLUX.1 Dev (12B, FP8) · Llama 3.2 11B Vision (11B, FP8) · Falcon 11B (11B, FP8) · SOLAR 10.7B (10.7B, FP8) · GLM-4 9B (9.4B, FP8) · +114 more
Multi-GPU (88 models)
Gemma 2 27B (×2, FP8) · Gemma 3 27B (×2, FP8) · InternVL2 26B (×2, FP8) · Mistral Small 24B (×2, FP8) · Mistral Small 3.1 24B (×2, FP8) · Codestral 22B (×2, FP8) · Solar Pro 22B (×2, FP8) · GigaChat 20B (×2, FP8) · InternLM 20B (×2, FP8) · InternLM 2.5 20B (×2, FP8) · CogVLM2 19B (×2, FP8) · DeepSeek MoE 16B (×2, FP8) · CodeGen2 16B (×2, FP8) · DeepSeek V2 Lite (×2, FP8) · OctoCoder 15B (×2, FP8) · +73 more
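The single/multi-GPU split above follows from a simple capacity check: FP8 weights take roughly 1 GB per billion parameters, and some VRAM has to stay free for KV cache and activations. A sketch assuming a hypothetical 10% reserve; it reproduces the cutoffs in the lists above:

```python
import math

VRAM_GB = 16
RESERVE = 0.90   # assume ~10% of VRAM kept for KV cache/activations (hypothetical)

def gpus_needed_fp8(params_billions: float) -> int:
    """FP8-quantized weights ~= 1 byte/param, so ~1 GB per billion params."""
    weight_gb = params_billions * 1.0
    return math.ceil(weight_gb / (VRAM_GB * RESERVE))

for name, b in [("OLMo 2 13B", 13), ("Gemma 3 12B", 12),
                ("Gemma 2 27B", 27), ("Mistral Small 24B", 24)]:
    n = gpus_needed_fp8(b)
    print(f"{name}: {n} GPU{'s' if n > 1 else ''}")
# -> 13B and 12B fit on a single A4000; 27B and 24B need x2
```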
Training Capabilities
Estimated GPU count for full fine-tuning (AdamW, BF16) and QLoRA
| Model Size | Full Fine-Tune | QLoRA |
|---|---|---|
| 7B model | 9 GPUs | 1 GPU |
| 13B model | 16 GPUs | 1 GPU |
| 70B model | 83 GPUs | 3 GPUs |
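These counts are consistent with the usual mixed-precision AdamW rule of thumb (about 16 bytes per parameter for weights, gradients, and optimizer states) plus activation overhead, and with 4-bit base weights for QLoRA. A sketch with a hypothetical 20% activation margin; it lands within one GPU of the table:

```python
import math

VRAM_GB = 16

def gpus_full_finetune(params_b: float) -> int:
    """AdamW mixed precision: BF16 weights + grads (4 B/param) plus FP32
    master weights and two optimizer moments (12 B/param) = 16 B/param,
    with a hypothetical +20% for activations and fragmentation."""
    mem_gb = params_b * 16 * 1.20
    return math.ceil(mem_gb / VRAM_GB)

def gpus_qlora(params_b: float) -> int:
    """QLoRA: 4-bit base weights (~0.5 B/param); adapters and their
    optimizer states are comparatively tiny. Assume ~10% VRAM reserved."""
    mem_gb = params_b * 0.5
    return math.ceil(mem_gb / (VRAM_GB * 0.90))

for b in (7, 13, 70):
    print(f"{b}B: full={gpus_full_finetune(b)}, qlora={gpus_qlora(b)}")
# -> 7B: full=9 qlora=1; 13B: full=16 qlora=1; 70B: full=84 qlora=3
# (the table lists 83 GPUs for the 70B full fine-tune; its overhead
#  assumptions evidently differ slightly from the 20% used here)
```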
Energy Efficiency
Estimated tokens/second per Watt for popular models
| Model | Efficiency | Precision |
|---|---|---|
| Mistral 7B | 0.44 t/s/W | FP8 |
| Qwen 2.5 7B | 0.42 t/s/W | FP8 |
| Llama 3.1 8B | 0.40 t/s/W | FP8 |
| Llama 3.1 70B | 0.05 t/s/W | FP8 |
| Qwen 2.5 72B | 0.04 t/s/W | FP8 |
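These figures match a simple bandwidth-bound decode estimate: tokens per second is roughly memory bandwidth divided by the FP8 weight footprint, and dividing by the 140W TDP gives t/s/W. A sketch using published parameter counts (assumed here, not taken from the listing):

```python
# Reproduce the efficiency figures above from first principles:
# single-stream FP8 decode is memory-bound, so
#   tok/s ~= bandwidth / weight_bytes,   t/s/W = tok/s / TDP.
# FP8 weight footprint ~= 1 GB per billion parameters.

BW_GBS, TDP_W = 448, 140

def tps_per_watt(params_b: float) -> float:
    toks_per_sec = BW_GBS / params_b
    return toks_per_sec / TDP_W

for name, b in [("Mistral 7B", 7.25), ("Qwen 2.5 7B", 7.62),
                ("Llama 3.1 8B", 8.03), ("Llama 3.1 70B", 70.6),
                ("Qwen 2.5 72B", 72.7)]:
    print(f"{name}: {tps_per_watt(b):.2f} t/s/W")
# -> 0.44, 0.42, 0.40, 0.05, 0.04 (matching the table above)
```

For the 70B-class models the same ratio holds per GPU under tensor parallelism, since aggregate bandwidth and aggregate power scale together with the GPU count.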