Riva HiFi-GAN (en-US)
Speech / TTS
NVIDIA · Riva TTS · v1.18 · released
About
HiFi-GAN is the vocoder half of NVIDIA's en-US Riva TTS stack: it converts mel-spectrograms produced by FastPitch into 22.05 kHz waveform audio. It is known for high-fidelity synthesis that runs faster than real time on consumer GPUs.
Intended use: Vocoder paired with a mel-spectrogram acoustic model (FastPitch, Tacotron 2) to produce waveform speech. Suitable for streaming TTS, since synthesis takes well under 1× the duration of the audio produced (a real-time factor below 1).
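The "under 1× real-time" claim can be read as a real-time factor (RTF): compute time divided by the duration of the audio generated. The sketch below only illustrates that ratio. The function names and the timing figures are hypothetical and are not measurements of this model.

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """Return compute time divided by the duration of the audio produced."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def keeps_up_with_playback(rtf: float) -> bool:
    """A streaming vocoder stays ahead of playback only when RTF < 1."""
    return rtf < 1.0


# Hypothetical figures for illustration: 0.5 s of compute for 10 s of audio.
rtf = real_time_factor(0.5, 10.0)
print(rtf, keeps_up_with_playback(rtf))
```

An RTF near 1 leaves no headroom for the acoustic model or network latency, which is why a streaming deployment would want the vocoder well below it.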
Architecture
- Type: gan
- Parameters: 14M
GAN-based neural vocoder. The generator is a fully convolutional decoder that upsamples 80-band mel-spectrograms to a 22.05 kHz waveform using transposed convolutions and multi-receptive-field fusion (MRF) modules. It is trained adversarially against multi-period and multi-scale discriminators. Pairs with FastPitch (the mel acoustic model) to complete the en-US TTS pipeline.
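To make the upsampling step concrete, the sketch below works out how many waveform samples the generator emits per mel frame. It assumes the upsample rates of the original HiFi-GAN V1 configuration (8, 8, 2, 2); the Riva checkpoint may use different values, so treat the numbers as illustrative only.

```python
from math import prod

SAMPLE_RATE_HZ = 22_050        # from the model card
N_MELS = 80                    # mel bands per input frame, from the model card
UPSAMPLE_RATES = (8, 8, 2, 2)  # assumption: original HiFi-GAN V1 config

# Each transposed-convolution stage multiplies the time resolution,
# so the total upsampling factor is the product of the stage rates.
HOP_LENGTH = prod(UPSAMPLE_RATES)


def samples_for_frames(n_frames: int) -> int:
    """Waveform samples produced for a given number of mel frames."""
    return n_frames * HOP_LENGTH


def duration_seconds(n_frames: int) -> float:
    """Audio duration corresponding to a given number of mel frames."""
    return samples_for_frames(n_frames) / SAMPLE_RATE_HZ


print(HOP_LENGTH)               # 256 samples per mel frame
print(samples_for_frames(100))  # 25600 samples
```

Under these assumed rates, one second of audio corresponds to roughly 86 mel frames (22,050 / 256).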
Memory
- Weights (BF16): 0.03 GB
- Activation estimate: 0.04 GB
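The weights figure is consistent with simple arithmetic on the parameter count: 14M parameters at 2 bytes each in BF16. A minimal check, assuming 1 GB = 10^9 bytes and ignoring any non-parameter buffers:

```python
PARAMETERS = 14_000_000  # "14M" from the model card
BYTES_PER_BF16 = 2       # bfloat16 is a 16-bit format


def weights_gb(n_params: int, bytes_per_param: int) -> float:
    """Approximate weight storage in GB (assumes 1 GB = 1e9 bytes)."""
    return n_params * bytes_per_param / 1e9


print(round(weights_gb(PARAMETERS, BYTES_PER_BF16), 2))  # 0.03
```

The activation estimate depends on batch size and utterance length, so it cannot be derived from the parameter count alone.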
Pricing
Free — open weights
Self-host on your own GPU. The calculator surfaces GPU-hours cost on the hardware page instead of an API price.
Provenance
- Source: catalog.ngc.nvidia.com
- License: cc-by-4.0
- Hugging Face: nvidia/tts_hifigan
- Last verified: 2026-06-25