Neural HMMs are all you need (for high-quality attention-free TTS)

Shivam Mehta, Éva Székely, Jonas Beskow, and Gustav Eje Henter

This is the official code repository for the paper "Neural HMMs are all you need (for high-quality attention-free TTS)". For audio examples, visit our demo page. A pre-trained model (female) and a pre-trained model (male) are also available.

Setup and training using LJ Speech

Download and extract the LJ Speech dataset. Place it in the data folder so that the dataset directory is data/LJSpeech-1.1; otherwise, update the filelists in data/filelists accordingly.
Clone this repository: git clone https://github.com/shivammehta25/Neural-HMM.git
- If you are using a single GPU, check out the gradient_checkpointing branch; it helps fit a larger batch size during training.
- To clone only that branch, use git clone --single-branch -b gradient_checkpointing https://github.com/shivammehta25/Neural-HMM.git
Initialise the submodules: git submodule init; git submodule update
Make sure you have Docker installed and running.
- Using Docker is recommended (it manages the CUDA runtime libraries and the Python dependencies specified in the Dockerfile).
- Alternatively, if you do not intend to use Docker, you can install the dependencies with pip install -r requirements.txt
Run bash start.sh; it will install all the dependencies and run the container.
Check src/hparams.py for hyperparameters and set GPUs.
1. For multi-GPU training, set GPUs to [0, 1, ...]
2. For CPU training (not recommended), set GPUs to an empty list []
3. Check the location of transcriptions
Once your filelists and hparams are updated, run python generate_data_properties.py to generate data_parameters.pt for your dataset (a default data_parameters.pt for LJ Speech is included in the repository).
Run python train.py to train the model.
1. Checkpoints will be saved in the hparams.checkpoint_dir.
2. Tensorboard logs will be saved in the hparams.tensorboard_log_dir.
To resume training, run python train.py -c <CHECKPOINT_PATH>
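
Before generating data properties or training, it can be worth confirming that every audio path listed in a filelist actually exists on disk. The sketch below assumes each filelist line has the form audio_path|transcript (the pipe-separated layout used by the Tacotron 2 filelists on which this code is based); the function name missing_audio_files is hypothetical and not part of this repository.

```python
from pathlib import Path


def missing_audio_files(filelist_path, root="."):
    """Return the audio paths listed in a filelist that do not exist on disk.

    Assumes one entry per line in the form "audio_path|transcript".
    Paths are resolved relative to `root`.
    """
    missing = []
    with open(filelist_path, encoding="utf-8") as filelist:
        for line in filelist:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            # Everything before the first "|" is taken to be the audio path.
            audio_path = line.split("|", 1)[0]
            if not (Path(root) / audio_path).is_file():
                missing.append(audio_path)
    return missing
```

Run from the repository root on one of the files in data/filelists, an empty result would suggest that the dataset location and the filelists agree.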

Synthesis

Download our pre-trained LJ Speech model. (This is the exact same model as system NH2 in the paper, but with training continued until reaching 200k updates total.)
- Alternatively, you can also use a pre-trained RyanSpeech model (trained for 150k updates).
Download the pre-trained HiFi-GAN model.
- We recommend using the version fine-tuned on Tacotron 2 if you cannot fine-tune one on Neural-HMM.
Run jupyter notebook and open synthesis.ipynb.

Miscellaneous

Mixed-precision training or full-precision training

In src/hparams.py, change hparams.precision to 16 for mixed precision or 32 for full precision.

Multi-GPU training or single-GPU training

Since the code uses PyTorch Lightning, providing more than one element in the list of GPUs enables multi-GPU training. Set hparams.gpus to, for example, [0, 1, 2] for multi-GPU training, or to the single-element list [0] for single-GPU training.
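
As a rough illustration of how the GPU list and the precision setting are typically consumed, the sketch below turns them into keyword arguments of the kind a PyTorch Lightning 1.x Trainer accepts (gpus and precision). The helper names trainer_kwargs and describe are hypothetical, and the actual train.py may construct its Trainer differently; treat src/hparams.py and train.py as the reference.

```python
def trainer_kwargs(gpus, precision):
    """Build Trainer keyword arguments from hparams-style values (hypothetical helper)."""
    if precision not in (16, 32):
        raise ValueError("precision must be 16 (mixed) or 32 (full)")
    if len(set(gpus)) != len(gpus):
        raise ValueError("GPU indices must be unique")
    # An empty list means CPU training; gpus=None is the Lightning 1.x
    # default and selects the CPU.
    return {"gpus": list(gpus) if gpus else None, "precision": precision}


def describe(gpus):
    """Name the training mode implied by the GPU list."""
    if not gpus:
        return "cpu"
    return "single-gpu" if len(gpus) == 1 else "multi-gpu"
```

With Lightning 1.x installed, the result could be passed on as pytorch_lightning.Trainer(**trainer_kwargs([0, 1], 16)); later Lightning releases changed these argument names, so this applies only under that assumption.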

Known issues/warnings

PyTorch dataloader

If you encounter the warning [W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool), this is a known issue in the PyTorch DataLoader.
It will be fixed when PyTorch releases a new Docker container image with an updated version of Torch. If you are not using Docker, the warning can be removed by installing torch > 1.9.1.
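
To see whether an installed PyTorch is newer than 1.9.1, its version string can be compared numerically. The following is a minimal sketch using only the standard library; the helper names are hypothetical, and the comparison deliberately ignores pre-release and local-build suffixes such as a0 or +cu113.

```python
import re


def release_tuple(version):
    """Extract the leading numeric release, e.g. "1.11.0a0+b6df043" -> (1, 11, 0)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    # A missing patch component is treated as 0.
    return tuple(int(part or 0) for part in match.groups())


def is_newer_than(version, minimum=(1, 9, 1)):
    """True if the numeric release is strictly greater than `minimum`."""
    return release_tuple(version) > minimum
```

In an environment where PyTorch is installed, is_newer_than(torch.__version__) performs the check after import torch.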

Torchmetric error on RTX 3090

If you encounter the error message ImportError: cannot import name 'get_num_classes' from 'torchmetrics.utilities.data' (/opt/conda/lib/python3.8/site-packages/torchmetrics/utilities/data.py),
update the requirements.txt file with these requirements:

torch==1.11.0a0+b6df043
--extra-index-url https://download.pytorch.org/whl/cu113
torchmetrics==0.6.0

Support

If you have any questions or comments, please open an issue on our GitHub repository.

Citation information

If you use or build on our method or code for your research, please cite our paper:

@inproceedings{mehta2022neural,
  title={Neural {HMM}s are all you need (for high-quality attention-free {TTS})},
  author={Mehta, Shivam and Sz{\'e}kely, {\'E}va and Beskow, Jonas and Henter, Gustav Eje},
  booktitle={Proc. ICASSP},
  year={2022}
}

Acknowledgements

The code implementation is based on Nvidia's implementation of Tacotron 2 and uses PyTorch Lightning for boilerplate-free code.
