LAFMA

Official implementation of the paper "LAFMA: A Latent Flow Matching Model for Text-to-Audio Generation" (INTERSPEECH 2024). Paper Link and Demo Page.

Checkpoints

VAEGAN Model: The VAEGAN model is the audio VAE that compresses the audio mel-spectrogram into an audio latent.

LAFMA Model: The LAFMA model is the latent flow matching model for text-guided audio generation.

We use the HiFi-GAN vocoder checkpoint provided by AudioLDM.

Inference

# install dependencies
pip install -r requirements.txt

# infer
# first download the Hugging Face flan-t5-large model into the huggingface/flan-t5-large directory
# then replace checkpoint_path in the .sh file with the path to your own checkpoint
cd LAFMA
sh egs/tta/audiolfm/run_inference.sh
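
The inference script expects the text encoder to be on disk before it runs, so a quick pre-flight check can save a failed launch. The sketch below is not part of this repository: the `t5_ready` helper and the `T5_DIR` variable are hypothetical names, and it assumes that the `huggingface/flan-t5-large` directory is relative to the repository root and that a usable download contains a `config.json` file. The `google/flan-t5-large` repository id and the `huggingface-cli download` command are one possible way to fetch the model, not necessarily the one the authors used.

```shell
# Hypothetical pre-flight helper; not part of the LAFMA repository.
# Assumption: the text encoder lives at huggingface/flan-t5-large,
# relative to the directory this is run from (override with T5_DIR).
T5_DIR="${T5_DIR:-huggingface/flan-t5-large}"

# Succeeds only if the directory exists and contains a config.json file.
# Assumption: config.json is a reasonable marker of a complete download.
t5_ready() {
    [ -d "$1" ] && [ -f "$1/config.json" ]
}

if t5_ready "$T5_DIR"; then
    echo "found text encoder in $T5_DIR"
else
    echo "missing text encoder in $T5_DIR" >&2
    # One possible way to fetch it (requires the huggingface_hub CLI):
    echo "try: huggingface-cli download google/flan-t5-large --local-dir $T5_DIR" >&2
fi
```

Run it from the repository root before `sh egs/tta/audiolfm/run_inference.sh`. It only reports status and never starts a download on its own.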

Acknowledgements

Citation

@misc{guan2024lafma,
      title={LAFMA: A Latent Flow Matching Model for Text-to-Audio Generation}, 
      author={Wenhao Guan and Kaidi Wang and Wangjin Zhou and Yang Wang and Feng Deng and Hui Wang and Lin Li and Qingyang Hong and Yong Qin},
      year={2024},
      eprint={2406.08203},
      archivePrefix={arXiv},
      primaryClass={eess.AS}
}
