FastSpeech Series by Microsoft
FastSpeech is a family of non-autoregressive text-to-speech models developed by Microsoft. The second iteration, FastSpeech 2, introduced enhancements in voice quality, training efficiency, and controllability.
Utilizes a feed-forward Transformer architecture for efficient parallel mel-spectrogram generation
Incorporates a duration predictor, plus pitch and energy predictors in FastSpeech 2, to improve prosody and expressiveness
Achieves faster training and inference compared to autoregressive models
Supports multi-speaker and multilingual synthesis with appropriate training data
Open-source implementations available in PyTorch and TensorFlow
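As a rough illustration of how predicted durations enable parallel generation, here is a minimal sketch of a length regulator: each phoneme-level vector is repeated for its predicted number of frames, so the whole frame sequence exists up front and can be decoded at once. The function name `length_regulate`, the `alpha` speed factor, and the toy values are assumptions for illustration, not code from any particular FastSpeech implementation.

```python
from typing import List, Sequence


def length_regulate(
    hidden: Sequence[Sequence[float]],
    durations: Sequence[int],
    alpha: float = 1.0,
) -> List[List[float]]:
    """Expand phoneme-level vectors to frame level.

    hidden:    one feature vector per phoneme (toy stand-in for encoder output)
    durations: predicted number of mel frames per phoneme
    alpha:     hypothetical speed factor; >1 lengthens (slower speech), <1 shortens
    """
    if len(hidden) != len(durations):
        raise ValueError("hidden and durations must have the same length")
    if alpha <= 0:
        raise ValueError("alpha must be positive")

    frames: List[List[float]] = []
    for vec, dur in zip(hidden, durations):
        # Scale the duration, round to a whole frame count, never go below zero.
        n_frames = max(0, int(round(dur * alpha)))
        # Copy the vector once per frame so the frames are independent lists.
        frames.extend(list(vec) for _ in range(n_frames))
    return frames


if __name__ == "__main__":
    phonemes = [[0.1], [0.2], [0.3]]  # three phonemes, 1-dimensional features
    frames = length_regulate(phonemes, [2, 0, 3])
    print(len(frames))  # 5 frames: two for the first phoneme, three for the last
```

In a real model the durations come from the duration predictor at inference time, and pitch and energy are predicted and added to the representation in a similar frame-level or phoneme-level fashion; this sketch covers only the expansion step.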


