This is a beginner recipe that demonstrates how to implement interactive visualization for classic audio, music, and speech generative models. It is also the official implementation of the paper "SingVisio: Visual Analytics of the Diffusion Model for Singing Voice Conversion", which can be accessed via arXiv or Computers & Graphics. SingVisio can be tried online here.
As a unique feature of Amphion, visualization provides interactive visual analysis of some classic models for educational purposes, helping newcomers understand their inner workings.
Amphion currently supports the visualization tool for the following models:
- SVC:
  - MultipleContentsSVC: A diffusion-based model for singing voice conversion.
- TTS:
  - FastSpeech 2 (👨‍💻 developing): A typical transformer-based TTS model.
  - VITS (👨‍💻 developing): A typical flow-based end-to-end TTS model.