Welcome to Persian-TTS-Zoo, a collection of Persian Text-to-Speech models that I adapted from popular architectures and trained for the Persian language. This repository links to the individual models, each with its own features and capabilities.
- Persian-Tacotron2
- Persian-FastSpeech2
- Persian-FastPitch
- Persian-MultiSpeaker-Tacotron2
- Persian-AdaSpeech2
- Persian-GST-Tacotron
- Persian-HiFiGAN
- License
- Description: Tacotron2 is an end-to-end TTS model that synthesizes natural-sounding speech from text. This version is adapted for Persian, keeping the smooth, natural prosody of the original while handling Persian-specific phonemes.
- Features: Single-speaker synthesis with high-quality, expressive speech; trained on Persian datasets for natural intonation.
- Repository: Persian-Tacotron2 Repository
- Description: An extension of Tacotron2 for multi-speaker TTS, trained on a diverse multi-speaker Persian dataset so that the model can generate voices with different speaker characteristics.
- Features: Multi-speaker synthesis with speaker embedding support for customizable voice generation.
- Repository: Persian-MultiSpeaker-Tacotron2 Repository
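
To illustrate what speaker-embedding support typically means in a multi-speaker Tacotron2, the sketch below looks up a per-speaker vector and appends it to every encoder frame before decoding. It is a minimal, framework-free sketch and not the repository's code: the names `SPEAKER_TABLE` and `condition_on_speaker`, the speaker IDs, and the toy dimensions are all hypothetical.

```python
from typing import Dict, List

# Hypothetical lookup table: speaker ID -> fixed-size embedding vector.
# In a trained model these vectors are learned; here they are made up.
SPEAKER_TABLE: Dict[str, List[float]] = {
    "speaker_a": [0.1, 0.9],
    "speaker_b": [0.8, 0.2],
}


def condition_on_speaker(
    encoder_frames: List[List[float]], speaker_id: str
) -> List[List[float]]:
    """Append the chosen speaker's embedding to every encoder frame."""
    embedding = SPEAKER_TABLE[speaker_id]
    # List concatenation builds new frames, so the input is left unchanged.
    return [frame + embedding for frame in encoder_frames]


# Toy encoder output: 3 text positions with 4 features each (assumed sizes).
frames = [[0.0, 0.1, 0.2, 0.3] for _ in range(3)]
conditioned = condition_on_speaker(frames, "speaker_b")
print(len(conditioned), len(conditioned[0]))  # 3 6
```

Switching `speaker_id` changes only the appended vector, which is why one set of model weights can serve several voices.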
- Description: FastSpeech2 is a non-autoregressive TTS model that generates speech faster and more robustly than autoregressive models such as Tacotron2. This model has been adapted for Persian to produce high-quality output with shorter inference times.
- Features: Fast, stable synthesis suitable for real-time applications, with support for both single-speaker and multi-speaker configurations.
- Repository: Persian-FastSpeech2 Repository
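
Part of the speed-up comes from FastSpeech2 being non-autoregressive: it predicts a duration for each phoneme and expands the phoneme sequence to frame level in one step (the length regulator), so all frames can be decoded in parallel. A minimal sketch of that expansion follows; the function name `length_regulate`, the `speed` parameter, and the toy inputs are illustrative assumptions rather than this repository's API.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def length_regulate(
    phoneme_states: Sequence[T], durations: Sequence[int], speed: float = 1.0
) -> List[T]:
    """Repeat each phoneme state for its predicted number of frames.

    speed > 1.0 shortens durations (faster speech); speed < 1.0 lengthens them.
    """
    if len(phoneme_states) != len(durations):
        raise ValueError("need exactly one duration per phoneme")
    expanded: List[T] = []
    for state, duration in zip(phoneme_states, durations):
        n_frames = max(0, round(duration / speed))
        expanded.extend([state] * n_frames)
    return expanded


# Toy example: three phonemes with predicted durations of 2, 1 and 3 frames.
frames = length_regulate(["s", "a", "l"], [2, 1, 3])
print(frames)  # ['s', 's', 'a', 'l', 'l', 'l']
```

Here strings stand in for the hidden vectors a real model would repeat; the expansion logic is the same.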
- Description: FastPitch is a pitch-controllable variant of FastSpeech. This version is adapted for Persian, allowing precise control over intonation, pitch, and speaking style.
- Features: Pitch control, improved stability, and high-quality synthesis with efficient inference.
- Repository: Persian-FastPitch Repository
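
Pitch control in FastPitch-style models usually works by editing the predicted pitch (F0) contour before it is passed to the decoder. The sketch below shows two such edits on a plain list of per-frame F0 values in Hz. The function names, the convention that 0 Hz marks an unvoiced frame, and the example contour are assumptions made for illustration, not this repository's interface.

```python
from typing import List


def shift_pitch(f0_hz: List[float], semitones: float) -> List[float]:
    """Shift voiced frames by a number of semitones; unvoiced frames stay 0."""
    factor = 2.0 ** (semitones / 12.0)
    return [f0 * factor if f0 > 0 else 0.0 for f0 in f0_hz]


def scale_pitch_range(f0_hz: List[float], scale: float) -> List[float]:
    """Widen (scale > 1) or flatten (scale < 1) the contour around its voiced mean.

    Intended for moderate scales; very large values could push frames below 0 Hz.
    """
    voiced = [f0 for f0 in f0_hz if f0 > 0]
    if not voiced:
        return list(f0_hz)
    mean = sum(voiced) / len(voiced)
    return [mean + (f0 - mean) * scale if f0 > 0 else 0.0 for f0 in f0_hz]


# Hypothetical predicted contour: unvoiced, three voiced frames, unvoiced.
contour = [0.0, 110.0, 120.0, 130.0, 0.0]
print(shift_pitch(contour, 12))         # one octave up
print(scale_pitch_range(contour, 0.0))  # fully flattened (monotone)
```

Shifting changes how high the voice sounds overall, while scaling the range changes how lively or flat the intonation is.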
- Description: AdaSpeech2 is an adaptive TTS model that customizes speech synthesis for different speaker styles and prosody. This version is tailored for the Persian language, focusing on adaptability and nuanced speech generation.
- Features: Flexible voice style adaptation, suited for generating unique voices or imitating particular speech patterns.
- Repository: Persian-AdaSpeech2 Repository
- Description: GST-Tacotron is a Tacotron variant with Global Style Tokens, which allow style control in speech generation. Persian-GST-Tacotron enables different speaking styles within Persian, such as formal and casual tones.
- Features: Style control, expressive speech generation, and support for single-speaker and multi-speaker synthesis.
- Repository: Persian-GST-Tacotron Repository
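
The idea behind Global Style Tokens can be sketched as a weighted sum: a small bank of learned token vectors is combined with attention weights into a single style embedding that conditions the decoder. The code below is a simplified, single-head sketch under that assumption. The token values, the scores, and the function names are hypothetical, and a real GST layer uses learned multi-head attention rather than hand-set scores.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    """Turn raw scores into weights that are positive and sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def style_embedding(tokens: List[List[float]], scores: List[float]) -> List[float]:
    """Combine style tokens into one embedding using softmax weights."""
    weights = softmax(scores)
    dim = len(tokens[0])
    return [sum(w * tok[d] for w, tok in zip(weights, tokens)) for d in range(dim)]


# Hypothetical bank of three 2-dimensional style tokens.
STYLE_TOKENS = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

# Equal scores blend all tokens evenly; a dominant score selects one style.
blended = style_embedding(STYLE_TOKENS, [0.0, 0.0, 0.0])
mostly_first = style_embedding(STYLE_TOKENS, [10.0, 0.0, 0.0])
print(blended, mostly_first)
```

At inference time the weights can come from a reference recording or be set by hand, which is what makes switching between, for example, a formal and a casual tone possible.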
- Description: HiFiGAN is a GAN-based vocoder for high-quality waveform generation from mel-spectrograms. The Persian-HiFiGAN version is trained to convert Persian TTS spectrograms into realistic audio.
- Features: High-quality, natural-sounding audio output from spectrograms; supports single and multi-speaker TTS.
- Repositories:
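
Because HiFiGAN is a vocoder, it forms the second stage of a two-stage pipeline: an acoustic model (for example Tacotron2 or FastSpeech2) turns text into a mel-spectrogram, and the vocoder turns that spectrogram into audio samples. The sketch below shows only the shape of that pipeline with stand-in functions. `synthesize`, the dummy models, `HOP_LENGTH = 256`, and `N_MELS = 80` are illustrative assumptions, not values taken from these repositories.

```python
from typing import Callable, List

Mel = List[List[float]]  # frames x mel bins
Wave = List[float]       # audio samples

HOP_LENGTH = 256  # assumed number of audio samples per mel frame
N_MELS = 80       # assumed number of mel bins


def synthesize(
    text: str,
    acoustic_model: Callable[[str], Mel],
    vocoder: Callable[[Mel], Wave],
) -> Wave:
    """Two-stage TTS: text -> mel-spectrogram -> waveform."""
    mel = acoustic_model(text)
    return vocoder(mel)


def dummy_acoustic_model(text: str) -> Mel:
    """Stand-in for a trained model: one silent frame per character."""
    return [[0.0] * N_MELS for _ in text]


def dummy_vocoder(mel: Mel) -> Wave:
    """Stand-in for a trained vocoder: HOP_LENGTH samples per frame."""
    return [0.0] * (len(mel) * HOP_LENGTH)


audio = synthesize("سلام", dummy_acoustic_model, dummy_vocoder)
print(len(audio))  # 4 frames * 256 samples = 1024
```

Keeping the two stages behind separate callables reflects why one vocoder can serve several of the acoustic models listed here, as long as they agree on the mel-spectrogram settings.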
This repository links to various open-source projects; check individual repositories for their respective licenses.