Riva HiFi-GAN (en-US)

Speech / TTS

NVIDIA · Riva TTS · v1.18 · released 2024-04-01

About

HiFi-GAN is the vocoder half of NVIDIA's en-US Riva TTS stack: it converts mel-spectrograms produced by FastPitch into 22.05 kHz waveform audio. Known for high-fidelity, faster-than-real-time synthesis on consumer GPUs.

Intended use: Vocoder paired with a mel-spectrogram acoustic model (FastPitch, Tacotron 2) to produce waveform speech. Suitable for streaming TTS at a real-time factor well below 1, meaning synthesis finishes faster than the audio plays back.
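The real-time claim above can be checked on your own hardware with a real-time factor (RTF): compute time divided by the duration of the audio produced. The sketch below is illustrative only. The function name and the timing numbers are hypothetical; only the 22.05 kHz sample rate comes from this card.

```python
def real_time_factor(synthesis_seconds: float,
                     num_samples: int,
                     sample_rate_hz: int = 22050) -> float:
    """Return compute time divided by audio duration.

    Values below 1.0 mean synthesis runs faster than playback.
    """
    if num_samples <= 0 or sample_rate_hz <= 0:
        raise ValueError("num_samples and sample_rate_hz must be positive")
    audio_seconds = num_samples / sample_rate_hz
    return synthesis_seconds / audio_seconds


# Hypothetical measurement: 0.05 s of compute for 44,100 samples (2 s of audio).
rtf = real_time_factor(synthesis_seconds=0.05, num_samples=44100)
print(f"RTF = {rtf:.3f}")
```

For streaming, measure RTF per chunk rather than per utterance, since the first chunk's latency is what the listener notices.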

Architecture

Type: gan
Parameters: 14M

GAN-based neural vocoder. Generator is a fully-convolutional decoder that upsamples 80-band mel-spectrograms to 22.05 kHz waveform via transposed convolutions and multi-receptive-field fusion (MRF) modules. Trained adversarially against multi-period and multi-scale discriminators. Pairs with FastPitch (mel acoustic model) to complete the en-US TTS pipeline.
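To make the upsampling step concrete, the sketch below traces how many waveform samples a stack of transposed convolutions produces from a given number of mel frames. The upsample rates and kernel sizes are an assumption taken from the published HiFi-GAN V1 configuration; this card does not state the values used by the Riva checkpoint. Only the 22.05 kHz sample rate is from the card.

```python
from math import prod

SAMPLE_RATE_HZ = 22050               # from this card
UPSAMPLE_RATES = (8, 8, 2, 2)        # assumption: HiFi-GAN V1 strides
UPSAMPLE_KERNELS = (16, 16, 4, 4)    # assumption: HiFi-GAN V1 kernel sizes


def conv_transpose_out_len(length: int, kernel: int, stride: int, padding: int) -> int:
    """Output length of a 1-D transposed convolution (dilation 1, no output padding)."""
    return (length - 1) * stride - 2 * padding + kernel


def waveform_length(mel_frames: int) -> int:
    """Push a frame count through each upsampling stage in turn."""
    length = mel_frames
    for kernel, stride in zip(UPSAMPLE_KERNELS, UPSAMPLE_RATES):
        # Padding of (kernel - stride) // 2 makes each stage multiply length by stride.
        length = conv_transpose_out_len(length, kernel, stride, (kernel - stride) // 2)
    return length


hop = prod(UPSAMPLE_RATES)           # samples generated per mel frame
frames = 100
samples = waveform_length(frames)
print(hop, samples, round(samples / SAMPLE_RATE_HZ, 3))
```

Under these assumed rates, each mel frame expands to 256 samples, so the acoustic model and the vocoder must agree on a 256-sample hop for the output duration to match.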

Memory

Weights (BF16): 0.03 GB
Activation estimate: 0.04 GB
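The weights figure follows from the parameter count: 14M parameters at 2 bytes each in BF16. A quick check, assuming decimal gigabytes (1 GB = 10^9 bytes); the activation estimate depends on batch size and sequence length and is not reproduced here.

```python
PARAMS = 14_000_000          # "Parameters: 14M" from this card
BYTES_PER_PARAM_BF16 = 2     # BF16 stores each value in 16 bits

weights_bytes = PARAMS * BYTES_PER_PARAM_BF16
weights_gb = weights_bytes / 1e9   # assumption: decimal GB

print(f"{weights_gb:.3f} GB")      # rounds to the 0.03 GB listed above
```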

Pricing

Free — open weights

Self-host on your own GPU. The calculator surfaces GPU-hours cost on the hardware page instead of an API price.

Provenance

Source: catalog.ngc.nvidia.com
License: cc-by-4.0
Hugging Face: nvidia/tts_hifigan
Last verified: 2026-06-25

tts · vocoder · gan · waveform-synthesis · open-weight · english · real-time