This repository provides an UNOFFICIAL implementation of CycleVAE-VC in PyTorch.
Combine it with your own vocoder to get great converted speech!
Source of the figure: https://arxiv.org/pdf/1907.10185.pdf
The goal of this repository is to provide a VC model trained on completely non-parallel data. It also provides a many-to-many conversion model.
I modified the model from @patrickltobing's implementation as follows. The original model uses an autoregressive (AR) structure for the ConvRNN network, which takes quite a long time to train, so I used an RNN-based model to train faster.
- 2020/06/11 [NEW!] Support ParallelWaveGAN in vocoder branch.
- 2020/06/02 Support one-to-one conversion model.
This repository was tested on Ubuntu 19.10 with an RTX 2080 Ti in the following environment.
- Python 3.7+
- CUDA 10.2
- cuDNN 7+
You can set up this repository with the following commands.
$ cd tools
$ make
Please check that the venv directory has been created under the tools directory.
Before training the model, be sure to place your wav files under a dedicated directory. I assume that the wav directory has the following structure:
wav
├── train
│   ├── jvs001
│   └── jvs002
└── val
    ├── jvs001
    └── jvs002
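If you want to double-check the layout before running anything, something like the following sketch may help. It is not part of this repository: the function name `check_wav_layout` is hypothetical, and it assumes that each speaker directory holds `.wav` files directly and that train and val should contain the same speakers.

```python
from pathlib import Path


def check_wav_layout(root, splits=("train", "val")):
    """Return a list of problems found under `root` (empty if the layout looks fine)."""
    root = Path(root)
    problems = []
    speakers = {}
    for split in splits:
        split_dir = root / split
        if not split_dir.is_dir():
            problems.append(f"missing directory: {split_dir}")
            continue
        # Each sub-directory of a split is treated as one speaker (e.g. jvs001).
        speakers[split] = sorted(p.name for p in split_dir.iterdir() if p.is_dir())
        for spk in speakers[split]:
            if not any((split_dir / spk).glob("*.wav")):
                problems.append(f"no wav files in: {split_dir / spk}")
    # Assumption: every split should cover the same set of speakers.
    if len(speakers) == len(splits) and len({tuple(v) for v in speakers.values()}) > 1:
        problems.append(f"speaker sets differ between splits: {speakers}")
    return problems
```

Calling `check_wav_layout("wav")` returns an empty list when nothing looks wrong, so you can simply print whatever it returns.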
- This script is not designed for servers that use slurm.
- If you are using slurm or you have multiple GPUs, you have to add the required environment variables in path.sh
- To set the environment variables and activate the virtual environment, run
  . path.sh
- Run the next command to generate figures
  . run.sh --stage 0
  and the figures will be generated in the ./figure directory.
- If you don't have a speaker config file in ./config/speaker, then you have to do the following:
  - Copy ./config/speaker/default.conf to ./config/speaker/<spk_name>.conf
  - Set the speaker-dependent variables there. The structure of the config file is:
    <minf0> <maxf0> <npow>
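To sanity-check a speaker config file before training, a small parser along these lines could be used. This is only a sketch and not code from this repository: `SpeakerConf` and `parse_speaker_conf` are hypothetical names, and it assumes the file holds the three values as whitespace-separated numbers on one line, with minf0 smaller than maxf0.

```python
from typing import NamedTuple


class SpeakerConf(NamedTuple):
    minf0: float
    maxf0: float
    npow: float


def parse_speaker_conf(text):
    """Parse '<minf0> <maxf0> <npow>' from the first non-empty line of `text`."""
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 3:
            raise ValueError(f"expected 3 fields, got {len(fields)}: {line!r}")
        conf = SpeakerConf(*(float(f) for f in fields))
        # Assumption: minf0 and maxf0 are lower and upper bounds, so they must be ordered.
        if conf.minf0 >= conf.maxf0:
            raise ValueError(f"minf0 must be smaller than maxf0: {line!r}")
        return conf
    raise ValueError("empty speaker config")
```

For example, `parse_speaker_conf(open("./config/speaker/<spk_name>.conf").read())` would raise a `ValueError` if a field is missing or the first two values are swapped.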
- Run the next command to extract features and train the model.
  . run.sh --stage 12
  - Stage 1: Feature extraction
  - Stage 2: Training
  Flags in training stage
  - conf_path : Path to the training config file. Default: ./config/vc.conf
  - model_name : Name of the saved model. The model will be saved as <model_name>.<num_iter>.pt
  - log_name : Directory in which to save the TensorBoard event files.
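Since models are saved as `<model_name>.<num_iter>.pt`, the most recent checkpoint can be picked out by comparing the iteration numbers. The helper below is a hedged sketch rather than part of this repository: `latest_checkpoint` is a hypothetical name, the directory holding the checkpoints is whatever you pass in, and it assumes `<num_iter>` is a plain non-negative integer.

```python
import re
from pathlib import Path


def latest_checkpoint(model_dir, model_name):
    """Return the checkpoint with the largest <num_iter>, or None if there is none."""
    pattern = re.compile(rf"^{re.escape(model_name)}\.(\d+)\.pt$")
    best_iter, best_path = -1, None
    for path in Path(model_dir).iterdir():
        match = pattern.match(path.name)
        # Compare iteration counts numerically so that 10000 ranks above 9000.
        if match and int(match.group(1)) > best_iter:
            best_iter, best_path = int(match.group(1)), path
    return best_path
```

Comparing the numbers as integers matters here, because a plain string sort would put `9000` after `10000`.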
- Run the next command to convert voice.
  . run.sh --stage 3
  Flags in conversion stage
  - test_dir : Directory where the source wav files are stored.
  - exp_dir : Directory in which to save the converted wav files.
  - checkpoint : Path to the trained model.
  - log_name : Name of the log file.
- Training steps
- Sounds
  - Demo wav files are from https://voice-statistics.github.io/
  - You can find the converted wav files in ./for_readme/wav
- Support gin-config
The author would like to thank Patrick Lumban Tobing for his repository.
Someki Masao (@Masao-Someki)
e-mail : [email protected]