RTX 4070 Ti
NVIDIA · Ada Lovelace · 12 GB GDDR6X · 285W TDP
VRAM: 12 GB · BF16 TFLOPS: 93 · Bandwidth: 504 GB/s · From $0.25/hr
Spec Sheet
| Spec | Value |
|---|---|
| VRAM | 12 GB GDDR6X |
| Memory Bandwidth | 504 GB/s |
| BF16 TFLOPS | 93 |
| FP16 TFLOPS | 93 |
| FP8 TFLOPS | 186 |
| INT8 TOPS | 186 |
| TDP | 285W |
| Interconnect | PCIe |
| Max per Node | 4 |
| PCIe Generation | 4 |
| CUDA Compute Capability | 8.9 |
| Tensor Cores | Yes |
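One way to read the compute and bandwidth figures together is the roofline balance point: peak FLOPs divided by memory bandwidth gives the arithmetic intensity (FLOPs per byte) a kernel needs to exceed before it becomes compute-bound rather than bandwidth-bound. A minimal sketch using the spec-sheet numbers above (treating 93 TFLOPS as the usable BF16 peak is an assumption; batch-1 LLM decoding sits far below this balance point, which is why bandwidth dominates inference throughput):

```python
# Roofline balance point for the RTX 4070 Ti, from the spec sheet above
bf16_flops = 93e12   # 93 BF16 TFLOPS (assumed usable peak)
bandwidth = 504e9    # 504 GB/s memory bandwidth

# FLOPs per byte a kernel must sustain to be compute-bound on this card
balance = bf16_flops / bandwidth
print(f"balance point: ~{balance:.0f} FLOPs/byte")
```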
Pricing by Provider
| Provider | On-Demand | Reserved | Spot | Badge |
|---|---|---|---|---|
| tensordock | $0.39/hr | - | $0.25/hr | Cheapest |
| vast_ai | $0.45/hr | - | $0.28/hr | - |
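The spot discount implied by the table is easy to quantify. A small sketch with the prices hard-coded from the rows above (listed prices drift, so treat the figures as a snapshot):

```python
# (on-demand, spot) prices in $/hr, copied from the pricing table above
offers = {
    "tensordock": (0.39, 0.25),
    "vast_ai": (0.45, 0.28),
}

for provider, (on_demand, spot) in offers.items():
    discount = 1 - spot / on_demand
    print(f"{provider}: spot saves {discount:.0%}")
```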
Compatible Models (205)
Single GPU (115 models)
GLM-4 9B (9.4B, FP8) · ChatGLM4 9B (9.4B, FP8) · Gemma 2 9B (9.2B, FP8) · Yi 1.5 9B (8.83B, FP8) · Yi Coder 9B (8.8B, FP8) · CodeGemma 7B (8.5B, FP8) · Qwen 3 8B (8.2B, FP8) · Llama 3.1 8B (8.03B, FP8) · Hermes 3 8B (8.03B, FP8) · Aya 23 8B (8B, FP8) · DeepSeek R1 Distill 8B (8B, FP8) · Llama 3 8B (8B, FP8) · Llama Guard 3 8B (8B, FP8) · Ministral 8B (8B, FP8) · Minitron 8B (8B, FP8) · Llama 3.3 8B (8B, FP8) · NV Embed v2 (7.85B, FP8) · InternLM 2.5 7B (7.74B, FP8) · GTE Qwen2 7B (7.6B, FP8) · Marco O1 (7.6B, FP8) · +95 more
Multi-GPU (90 models)
GigaChat 20B (×2, FP8) · InternLM 20B (×2, FP8) · InternLM 2.5 20B (×2, FP8) · CogVLM2 19B (×2, FP8) · DeepSeek MoE 16B (×2, FP8) · CodeGen2 16B (×2, FP8) · DeepSeek V2 Lite (×2, FP8) · OctoCoder 15B (×2, FP8) · StarCoder2 15B (×2, FP8) · Nemotron 15B (×2, FP8) · Qwen 2.5 14B (×2, FP8) · DeepSeek R1 Distill 14B (×2, FP8) · Phi-4 (×2, FP8) · Qwen 2.5 Coder 14B (×2, FP8) · Qwen 1.5 MoE A2.7B (×2, FP8) · +75 more
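The single- vs multi-GPU split above is consistent with a simple rule of thumb: FP8 weights take roughly 1 byte per parameter, and some VRAM must be held back for KV cache and activations. A sketch of that heuristic, assuming ~10% headroom (the exact overhead the listing uses is not stated, so the constant is an assumption):

```python
import math

VRAM_GB = 12     # RTX 4070 Ti
HEADROOM = 0.9   # assume ~10% of VRAM reserved for KV cache/activations

def gpus_for_inference(params_b: float, bytes_per_param: float = 1.0) -> int:
    """FP8 weights take ~1 byte/param; count GPUs needed to hold them."""
    weight_gb = params_b * bytes_per_param
    return math.ceil(weight_gb / (VRAM_GB * HEADROOM))

print(gpus_for_inference(9.4))  # GLM-4 9B, the largest single-GPU entry
print(gpus_for_inference(20))   # 20B-class models land in the x2 list
```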
Training Capabilities
Estimated GPU count for full fine-tuning (AdamW, BF16) and QLoRA
| Model Size | Full Fine-Tune | QLoRA |
|---|---|---|
| 7B model | 11 GPUs | 1 GPU |
| 13B model | 21 GPUs | 1 GPU |
| 70B model | 110 GPUs | 4 GPUs |
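The table matches a standard memory rule of thumb: full AdamW fine-tuning in BF16 costs on the order of 16 bytes per parameter (weights + gradients + two optimizer moments plus overhead), while QLoRA holds the frozen base in 4-bit, well under 1 byte per parameter. A sketch that reproduces the table, assuming ~85% of the 12 GB is usable; both constants are assumptions, not figures published by the site:

```python
import math

VRAM_GB = 12
USABLE = 0.85   # assume ~85% of VRAM usable; rest for activations/overhead

def gpus_needed(params_b: float, bytes_per_param: float) -> int:
    """Memory-only estimate: params (billions) x bytes/param vs usable VRAM."""
    return math.ceil(params_b * bytes_per_param / (VRAM_GB * USABLE))

FULL_FT = 16    # BF16 weights + grads + two AdamW moments + overhead (assumed)
QLORA = 0.58    # 4-bit base weights + adapters + quantization state (assumed)

for size in (7, 13, 70):
    print(f"{size}B: full={gpus_needed(size, FULL_FT)}, "
          f"qlora={gpus_needed(size, QLORA)}")
```

Activation memory scales with batch size and sequence length, so a memory-only estimate like this is a floor, not a guarantee.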
Energy Efficiency
Estimated tokens/second per Watt for popular models
| Model | Efficiency | Precision |
|---|---|---|
| Mistral 7B | 0.24 t/s/W | FP8 |
| Qwen 2.5 7B | 0.23 t/s/W | FP8 |
| Llama 3.1 8B | 0.22 t/s/W | FP8 |
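Multiplying these per-watt figures by the card's 285W TDP gives a rough absolute throughput, assuming the GPU actually draws full board power during decoding (real draw is often lower, so treat these as ballpark numbers):

```python
TDP_W = 285  # RTX 4070 Ti board power, from the spec sheet

# tokens/s per watt (FP8) from the efficiency figures above
efficiency = {
    "Mistral 7B": 0.24,
    "Qwen 2.5 7B": 0.23,
    "Llama 3.1 8B": 0.22,
}

for model, tsw in efficiency.items():
    print(f"{model}: ~{tsw * TDP_W:.0f} tokens/s at full TDP")
```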
Similar GPUs
| GPU | VRAM | BF16 TFLOPS | BW (GB/s) | From |
|---|---|---|---|---|
| RTX 4070 Super | 12 GB | 55 | 504 | $0.22/hr |
| RTX 4080 | 16 GB | 97 | 717 | $0.32/hr |
| RTX 4060 Ti 16GB | 16 GB | 44 | 288 | $0.30/hr |
| RTX 4060 | 8 GB | 30 | 272 | $0.22/hr |
| L4 | 24 GB | 121 | 300 | $0.29/hr |
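One way to use this comparison table is to rank the cards by BF16 TFLOPS per dollar-hour at the listed "From" price. A sketch with the rows hard-coded from the table above (these are the lowest listed prices and will drift, and raw TFLOPS-per-dollar ignores VRAM and bandwidth, which often matter more for LLM inference):

```python
# (name, BF16 TFLOPS, "From" price in $/hr) from the Similar GPUs table
gpus = [
    ("RTX 4070 Super", 55, 0.22),
    ("RTX 4080", 97, 0.32),
    ("RTX 4060 Ti 16GB", 44, 0.30),
    ("RTX 4060", 30, 0.22),
    ("L4", 121, 0.29),
]

ranked = sorted(gpus, key=lambda g: g[1] / g[2], reverse=True)
for name, tflops, price in ranked:
    print(f"{name}: {tflops / price:.0f} TFLOPS per $/hr")
```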