Use-case guide · Summarisation
Should you pick RTX 3070 for summarisation?
RTX 3070 has 8 GB VRAM. Whether it's the right fit for summarisation depends on your model size, expected QPS, and budget. Below is what we're seeing in production.
VRAM + model fit
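With 8 GB of VRAM, RTX 3070 comfortably fits models up to ~3B parameters in BF16 (2 bytes per parameter, so about 6 GB of weights) with room for the KV cache. A ~6B model needs about 12 GB for its weights in BF16, so it fits only when quantised to 8-bit or lower. For summarisation specifically, you'll want to leave headroom for context-length growth, because the KV cache grows linearly with input length.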
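As a rough check of model fit, the sketch below adds up weight memory, KV-cache memory, and a fixed reserve, then compares the total with the card's 8 GB. The function names, the 1 GB reserve, and the model shape (28 layers, 8 KV heads, head dimension 128) are illustrative assumptions, not measurements from any specific model or framework.

```python
# Back-of-envelope VRAM sizing. All names and defaults here are
# illustrative assumptions, not values from a real framework.

def weight_gb(params_billion: float, bytes_per_param: float) -> float:
    """Weight memory in GB (1 GB = 1e9 bytes): 1e9 params * bytes / 1e9."""
    return params_billion * bytes_per_param


def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                context_tokens: int, batch: int = 1,
                bytes_per_value: int = 2) -> float:
    """KV-cache memory in GB: one K and one V tensor per layer."""
    values = 2 * layers * kv_heads * head_dim * context_tokens * batch
    return values * bytes_per_value / 1e9


def fits_in_vram(params_billion: float, bytes_per_param: float,
                 kv_gb: float, vram_gb: float = 8.0,
                 reserve_gb: float = 1.0) -> bool:
    """True if weights + KV cache + a reserve for activations fit in VRAM."""
    return weight_gb(params_billion, bytes_per_param) + kv_gb + reserve_gb <= vram_gb


if __name__ == "__main__":
    # Hypothetical model shape with an 8k-token context, batch size 1.
    kv = kv_cache_gb(layers=28, kv_heads=8, head_dim=128, context_tokens=8192)
    print(f"KV cache: {kv:.2f} GB")
    print("3B @ BF16 fits: ", fits_in_vram(3.0, 2.0, kv))
    print("6B @ 8-bit fits:", fits_in_vram(6.0, 1.0, kv))
```

Doubling `context_tokens` or `batch` doubles the KV-cache term, which is why long-document summarisation eats into headroom quickly.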
Pricing
Live RTX 3070 pricing across all providers is on the GPU detail page, which has a sortable list.
Throughput
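On summarisation workloads, a rough estimate of RTX 3070 throughput is its published FP16 peak multiplied by the serving framework's model FLOPs utilisation (MFU): ≈ 85% for vLLM and ≈ 70% for TGI. This estimate applies to the compute-bound prefill phase, which tends to dominate summarisation because inputs are long and outputs are short. Token-by-token decoding is usually limited by memory bandwidth instead, so treat the figure as a ceiling for that phase.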
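The arithmetic can be sketched as follows. The MFU values are the ones quoted in this section. The peak figure is an input you take from the spec sheet, and the 100 TFLOPS in the usage example is a placeholder, not the RTX 3070's actual spec. The conversion to tokens per second assumes the common approximation of about 2 FLOPs per parameter per token and ignores attention cost, so it is an optimistic estimate for prefill only.

```python
# Rough prefill-throughput estimate from peak FLOPS and MFU.
# Function names are illustrative; MFU figures come from the text above.

MFU = {"vllm": 0.85, "tgi": 0.70}


def effective_tflops(peak_fp16_tflops: float, framework: str) -> float:
    """Peak FP16 throughput scaled by the framework's MFU."""
    return peak_fp16_tflops * MFU[framework]


def prefill_tokens_per_second(peak_fp16_tflops: float, framework: str,
                              params_billion: float) -> float:
    """Estimated prefill tokens/s, assuming ~2 FLOPs per parameter per token."""
    flops_per_token = 2.0 * params_billion * 1e9
    return effective_tflops(peak_fp16_tflops, framework) * 1e12 / flops_per_token


if __name__ == "__main__":
    peak = 100.0  # placeholder value: substitute the published FP16 spec
    for fw in MFU:
        tps = prefill_tokens_per_second(peak, fw, params_billion=3.0)
        print(f"{fw}: {effective_tflops(peak, fw):.1f} TFLOPS effective, "
              f"~{tps:,.0f} prefill tokens/s for a 3B model")
```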
Try the calculator to size the hardware for your specific model, or see all GPUs on the InferenceScore leaderboard.