Riva FastPitch (en-US)

Speech / TTS

NVIDIA · Riva TTS · v1.18 · released

About

FastPitch is a parallel, transformer-based mel-spectrogram generator with explicit control over the pitch and duration of speech. It is packaged in NVIDIA NeMo / Riva as the front-end of the en-US neural TTS pipeline; the back-end vocoder is HiFi-GAN. The model was trained on the LJSpeech corpus.

Intended use: Real-time text-to-speech for conversational AI, accessibility, and voice interfaces. Pair with HiFi-GAN for full waveform synthesis.
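One common way to check whether a deployment meets the "real-time" goal is the real-time factor (RTF): synthesis time divided by the duration of the audio produced. The sketch below is only an illustration; the function name and the timings in the usage lines are hypothetical and are not measurements of this model.

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """Return synthesis time divided by audio duration.

    Values below 1.0 mean audio is produced faster than it plays back.
    """
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    if synthesis_seconds < 0:
        raise ValueError("synthesis_seconds must not be negative")
    return synthesis_seconds / audio_seconds


# Hypothetical timings, chosen only to show the arithmetic:
# 0.05 s of compute for 2.0 s of speech.
rtf = real_time_factor(synthesis_seconds=0.05, audio_seconds=2.0)
is_real_time = rtf < 1.0
```

For a two-stage pipeline, the synthesis time should cover both the mel-spectrogram generator and the vocoder, since the listener waits for the waveform.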

Architecture

Type: encoder-decoder
Parameters: 45M
Layers: 6
Hidden dim: 384

Mel-spectrogram acoustic model: a transformer text encoder, duration and pitch predictors, and a transformer decoder that predicts 80-band mel-spectrogram frames. It is designed to pair with a separate vocoder (HiFi-GAN) that converts mel-spectrograms to waveforms. The model is non-autoregressive: it predicts all frames in parallel, which allows faster-than-real-time inference.
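The description above does not say how the predicted durations are used. A minimal sketch, assuming the usual length-regulation approach in which each encoded token is repeated for its predicted number of frames, might look like the following. The function name and toy values are hypothetical, and a real model would operate on batched tensors and would also add pitch information before decoding.

```python
from typing import List, Sequence


def length_regulate(
    token_vectors: Sequence[Sequence[float]],
    durations: Sequence[int],
) -> List[List[float]]:
    """Repeat each token vector durations[i] times to form the frame sequence."""
    if len(token_vectors) != len(durations):
        raise ValueError("one duration per token is required")
    frames: List[List[float]] = []
    for vector, n_frames in zip(token_vectors, durations):
        if n_frames < 0:
            raise ValueError("durations must be non-negative")
        # Copy the vector for every frame so frames do not share storage.
        frames.extend(list(vector) for _ in range(n_frames))
    return frames


# Toy example: three encoded tokens with 2-dimensional vectors.
tokens = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
durations = [2, 0, 3]  # frames per token; a token may receive zero frames
frames = length_regulate(tokens, durations)
total_frames = len(frames)  # equals sum(durations)
```

Because the total frame count is fixed by the durations before decoding starts, a decoder can process every frame position at once instead of generating frames one after another.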

Memory

Weights (BF16): 0.09 GB
Activation estimate: 0.05 GB
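The weight figure follows from the parameter count: BF16 stores each parameter in 2 bytes, so 45M parameters take about 90 MB. The sketch below reproduces that arithmetic, assuming decimal gigabytes (1 GB = 10^9 bytes); the function name is hypothetical. The activation estimate depends on input length and batch size and is not derived here.

```python
def weight_memory_gb(num_params: int, bytes_per_param: int = 2) -> float:
    """Estimate weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    if num_params < 0 or bytes_per_param <= 0:
        raise ValueError("num_params must be >= 0 and bytes_per_param > 0")
    return num_params * bytes_per_param / 1e9


bf16_gb = weight_memory_gb(45_000_000)                      # 2 bytes per parameter
fp32_gb = weight_memory_gb(45_000_000, bytes_per_param=4)   # for comparison
print(bf16_gb)  # 0.09
```

Storing the same weights in FP32 would double the figure to 0.18 GB.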

Pricing

Free — open weights

Self-host on your own GPU. Because there is no API price, the calculator reports cost in GPU-hours on the hardware page.

Provenance

License: cc-by-4.0
Last verified: 2026-06-25
Tags: tts, speech-synthesis, mel-spectrogram, open-weight, english, real-time