
Should you pick Gaudi 3 HL-325L for text-to-speech?

Gaudi 3 HL-325L has 128 GB of on-package HBM memory (listed here as VRAM). Whether it is the right fit for text-to-speech depends on your model size, expected QPS, and budget. The sections below cover memory fit, pricing, and throughput.

VRAM + model fit

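Gaudi 3 HL-325L fits models up to roughly 50B parameters in BF16 with room for KV cache. BF16 weights take 2 bytes per parameter, so 128 GB caps out near 64B parameters before any cache or activation memory is counted. Larger models, up to ~90B, need 8-bit weights (for example FP8) or more than one card. For text-to-speech specifically, leave headroom for longer input text and longer generated audio sequences.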
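
As a back-of-the-envelope check on model fit, the sketch below computes weight memory from parameter count and bytes per parameter, then compares it against the 128 GB capacity. The function names and the 20% headroom fraction are illustrative assumptions, not measured values; real KV-cache and activation memory depend on your model, batch size, and sequence length.

```python
# Rough memory-fit check. All names and the 20% headroom are assumptions
# chosen for illustration; tune them for your own model and batch size.

CAPACITY_GB = 128.0  # memory capacity stated in this guide


def weights_gb(params_billion: float, bytes_per_param: float = 2.0) -> float:
    """Weight memory in GB (1 GB = 1e9 bytes). BF16/FP16 use 2 bytes per parameter."""
    return params_billion * bytes_per_param


def max_params_billion(
    capacity_gb: float = CAPACITY_GB,
    headroom_fraction: float = 0.2,  # assumed share reserved for KV cache and activations
    bytes_per_param: float = 2.0,
) -> float:
    """Largest model (in billions of parameters) whose weights fit in the remaining budget."""
    return capacity_gb * (1.0 - headroom_fraction) / bytes_per_param


def fits(params_billion: float, bytes_per_param: float = 2.0) -> bool:
    """True if the weights fit within capacity minus the assumed headroom."""
    return params_billion <= max_params_billion(bytes_per_param=bytes_per_param)


if __name__ == "__main__":
    for size in (8, 50, 90):
        print(f"{size}B BF16: {weights_gb(size):.0f} GB of weights, fits={fits(size)}")
    print(f"90B at 1 byte/param (8-bit weights): fits={fits(90, bytes_per_param=1.0)}")
```

Weight memory is only the floor. If you expect long utterances or large batches, raise the headroom fraction rather than sizing to the limit.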

Pricing

Live pricing for Gaudi 3 HL-325L across all providers is on the GPU detail page, which includes a sortable list.

Throughput

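On text-to-speech workloads, a first-order throughput estimate for Gaudi 3 HL-325L is its published peak FP16 throughput scaled by the serving framework's model FLOPs utilization (MFU): roughly 85% for vLLM and 70% for TGI. Treat these as rough planning figures. Measured utilization varies with model architecture, batch size, and sequence length, so benchmark your own model before committing to a hardware count.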
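
The spec-times-MFU estimate can be sketched as follows, assuming a transformer-style autoregressive speech model that costs about 2 FLOPs per parameter per generated token and is compute-bound. The peak TFLOPS value, model size, and tokens-per-request figures in the usage section are placeholders, not published specs or measurements; substitute the vendor's number and your own traffic profile. The function names are hypothetical.

```python
import math

# First-order throughput sizing. Assumes a dense transformer forward pass costs
# ~2 FLOPs per parameter per generated token and that the workload is
# compute-bound. Both are simplifications; memory-bound decoding will be slower.


def estimated_tokens_per_second(
    peak_tflops: float, mfu: float, params_billion: float
) -> float:
    """Peak throughput scaled by MFU, divided by the FLOPs needed per token."""
    flops_per_token = 2.0 * params_billion * 1e9
    return peak_tflops * 1e12 * mfu / flops_per_token


def cards_needed(
    target_qps: float, tokens_per_request: float, tokens_per_second_per_card: float
) -> int:
    """Cards required to sustain the target request rate, rounded up."""
    required_tokens_per_second = target_qps * tokens_per_request
    return math.ceil(required_tokens_per_second / tokens_per_second_per_card)


if __name__ == "__main__":
    PEAK_TFLOPS = 1000.0       # placeholder, NOT the real HL-325L spec
    PARAMS_BILLION = 1.0       # hypothetical text-to-speech model size
    TOKENS_PER_REQUEST = 1500  # hypothetical audio tokens per utterance

    for framework, mfu in (("vLLM", 0.85), ("TGI", 0.70)):  # MFU figures from this guide
        tps = estimated_tokens_per_second(PEAK_TFLOPS, mfu, PARAMS_BILLION)
        cards = cards_needed(1000, TOKENS_PER_REQUEST, tps)
        print(f"{framework}: ~{tps:,.0f} tokens/s per card, {cards} card(s) for 1000 QPS")
```

Because the result scales linearly with MFU, a measured utilization well below the assumed figure raises the card count proportionally, which is why a short benchmark on the target model is worth running before sizing.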

Try the calculator to size the hardware for your specific model, or see all GPUs on the InferenceScore leaderboard.