Riva HiFi-GAN (en-US)

Speech / TTS

NVIDIA · Riva TTS · v1.18 · released

About

HiFi-GAN is the vocoder half of NVIDIA's en-US Riva TTS stack: it converts mel-spectrograms produced by FastPitch into 22.05 kHz waveform audio. It is known for high-fidelity, faster-than-real-time synthesis on consumer GPUs.

Intended use: Vocoder paired with a mel-spectrogram acoustic model (FastPitch, Tacotron 2) to produce waveform speech. Suitable for streaming TTS at a real-time factor well under 1×, meaning synthesis takes less time than the audio it produces.
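
For streaming use, the figure that matters is the real-time factor (RTF): synthesis time divided by the duration of the audio produced. The sketch below is a minimal illustration, not a benchmark of this model. The function name `real_time_factor` and the timing inputs are hypothetical; only the 22.05 kHz sample rate comes from this page.

```python
SAMPLE_RATE_HZ = 22_050  # output sample rate stated on this page


def real_time_factor(synth_seconds: float, num_samples: int,
                     sample_rate_hz: int = SAMPLE_RATE_HZ) -> float:
    """Return synthesis time divided by the duration of the audio produced."""
    if num_samples <= 0:
        raise ValueError("num_samples must be positive")
    audio_seconds = num_samples / sample_rate_hz
    return synth_seconds / audio_seconds


# Illustrative inputs only (not measured): 0.5 s of compute for 2 s of audio.
rtf = real_time_factor(synth_seconds=0.5, num_samples=2 * SAMPLE_RATE_HZ)
print(f"RTF = {rtf:.2f}")  # a value below 1.0 means faster than real time
```

To apply this to a real deployment, time the vocoder call with `time.perf_counter()` and pass in the number of samples it returned.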

Architecture

Type
gan
Parameters
14M

GAN-based neural vocoder. The generator is a fully convolutional decoder that upsamples 80-band mel-spectrograms to a 22.05 kHz waveform via transposed convolutions and multi-receptive-field fusion (MRF) modules. It is trained adversarially against multi-period and multi-scale discriminators. It pairs with FastPitch (the mel acoustic model) to complete the en-US TTS pipeline.
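
The upsampling step can be sketched as simple arithmetic: each transposed-convolution stage multiplies the time resolution by its stride, so one mel frame becomes as many samples as the product of the strides. The strides below, (8, 8, 2, 2), are an assumption borrowed from the commonly used HiFi-GAN V1 configuration; this page does not state the strides or hop length of this checkpoint, so check the model config before relying on them.

```python
import math

SAMPLE_RATE_HZ = 22_050  # stated on this page
# ASSUMPTION: strides from the common HiFi-GAN V1 setup, not confirmed here.
UPSAMPLE_RATES = (8, 8, 2, 2)


def samples_per_frame(upsample_rates=UPSAMPLE_RATES) -> int:
    """Total upsampling factor: waveform samples produced per mel frame."""
    return math.prod(upsample_rates)


def waveform_length(num_mel_frames: int, upsample_rates=UPSAMPLE_RATES) -> int:
    """Number of waveform samples generated from a mel-spectrogram."""
    if num_mel_frames < 0:
        raise ValueError("num_mel_frames must be non-negative")
    return num_mel_frames * samples_per_frame(upsample_rates)


frames = 100
samples = waveform_length(frames)
print(samples_per_frame())                 # 256 under the assumed strides
print(samples)                             # 25600
print(f"{samples / SAMPLE_RATE_HZ:.3f} s")  # 1.161 s of audio
```

Under these assumed strides, 100 mel frames correspond to a little over one second of audio at 22.05 kHz.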

Memory

Weights (BF16)
0.03 GB
Activation estimate
0.04 GB
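
The weights figure is consistent with a simple estimate: parameter count times bytes per parameter. The sketch below reproduces it from the 14M parameter count on this page, assuming 2 bytes per BF16 value and decimal gigabytes (10^9 bytes); the helper name `weight_memory_gb` is hypothetical. The activation estimate depends on batch size and sequence length and is not derived here.

```python
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp16": 2}


def weight_memory_gb(num_params: int, dtype: str = "bf16") -> float:
    """Estimate weight storage in decimal GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


params = 14_000_000  # "14M" as listed on this page
print(f"{weight_memory_gb(params, 'bf16'):.2f} GB")  # 0.03 GB
print(f"{weight_memory_gb(params, 'fp32'):.2f} GB")  # 0.06 GB
```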

Pricing

Free — open weights

Self-host on your own GPU. The calculator surfaces GPU-hours cost on the hardware page instead of an API price.

Provenance

License
cc-by-4.0
Hugging Face
nvidia/tts_hifigan
Last verified
2026-06-25
tts · vocoder · gan · waveform-synthesis · open-weight · english · real-time