Add files via upload

rkuo2000 · Dec 18, 2023 · 110e814 · 110e814
1 parent c02896d
commit 110e814
Showing 1 changed file with 127 additions and 0 deletions.
diff --git a/_posts/2023-12-08-Voice.md b/_posts/2023-12-08-Voice.md
@@ -0,0 +1,127 @@
+---
+layout: post
+title: AI Music & Voice
+author: [Richard Kuo]
+category: [Lecture]
+tags: [jekyll, ai]
+---
+
+This introduction includes Music Seperationm, Deep Singer, Voice Conversion, Voice Cloning, etc.
+
+---
+## Music Seperation
+### Spleeter
+**Paper:** [Spleeter: A FAST AND STATE-OF-THE ART MUSIC SOURCE
+SEPARATION TOOL WITH PRE-TRAINED MODELS](https://archives.ismir.net/ismir2019/latebreaking/000036.pdf)<br>
+**Code:** [deezer/spleeter](https://github.com/deezer/spleeter)<br>
+
+---
+### Wave-U-Net
+**Paper:** [Wave-U-Net: A Multi-Scale Neural Network for End-to-End Audio Source Separation](https://arxiv.org/abs/1806.03185)<br>
+**Code:** [f90/Wave-U-Net](https://github.com/f90/Wave-U-Net)<br>
+
+![](https://github.com/f90/Wave-U-Net/blob/master/waveunet.png?raw=true)
+
+---
+### Hyper Wave-U-Net
+**Paper:** [Improving singing voice separation with the Wave-U-Net using Minimum Hyperspherical Energy](https://arxiv.org/abs/1910.10071)<br>
+**Code:** [jperezlapillo/hyper-wave-u-net](https://github.com/jperezlapillo/hyper-wave-u-net)<br>
+**MHE regularisation:**<br>
+![](https://github.com/jperezlapillo/Hyper-Wave-U-Net/blob/master/diagram_v2.JPG?raw=true)
+
+---
+### Demucs
+**Paper:** [Music Source Separation in the Waveform Domain](https://arxiv.org/abs/1911.13254)<br>
+**Code:** [facebookresearch/demucs](https://github.com/facebookresearch/demucs)<br>
+
+![](https://github.com/facebookresearch/demucs/blob/main/demucs.png?raw=true)
+
+---
+## Deep Singer
+
+### [OpenAI Jukebox](https://jukebox.openai.com/)
+**Blog:** [Jukebox](https://openai.com/blog/jukebox/)<br>
+model modified from **VQ-VAE-2**
+**Paper:** [Jukebox: A Generative Model for Music](https://arxiv.org/abs/2005.00341)<br>
+**Colab:** [Interacting with Jukebox](https://colab.research.google.com/github/openai/jukebox/blob/master/jukebox/Interacting_with_Jukebox.ipynb)<br>
+
+---
+### DeepSinger
+**Blog:** [Microsoft’s AI generates voices that sing in Chinese and English](https://venturebeat.com/2020/07/13/microsofts-ai-generates-voices-that-sing-in-chinese-and-english/)<br>
+**Paper:** [DeepSinger: Singing Voice Synthesis with Data Mined From the Web](https://arxiv.org/abs/2007.04590)<br>
+**Demo:** [DeepSinger: Singing Voice Synthesis with Data Mined From the Web](https://speechresearch.github.io/deepsinger/)<br>
+
+![](https://lfs.aminer.cn/upload/pdf_image/5ecf/e0e/5ecfae0e9e795eb20a615049img-002.png)
+<p align="center">The alignment model based on the architecture of automatic speech recognition</p>
+
+![](https://lfs.aminer.cn/upload/pdf_image/5ecf/e0e/5ecfae0e9e795eb20a615049img-004.png)
+<p align="center">The architecture of the singing model</p>
+
+![](https://lfs.aminer.cn/upload/pdf_image/5ecf/e0e/5ecfae0e9e795eb20a615049img-005.png)
+<p align="center">The inference process of singing voice synthesis</p>
+
+---
+## Voice Conversion
+**Paper:** [An Overview of Voice Conversion and its Challenges: From Statistical Modeling to Deep Learning](https://arxiv.org/abs/2008.03648)<br>
+
+![](https://d3i71xaburhd42.cloudfront.net/4e1f36855442b761729dad4507513e23ca66206c/7-Figure2-1.png)
+
+**Blog:** [Voice Cloning Using Deep Learning](https://medium.com/the-research-nest/voice-cloning-using-deep-learning-166f1b8d8595)<br>
+
+---
+### Deep Voice 3
+**Blog:** [Deep Voice 3: Scaling Text to Speech with Convolutional Sequence Learning](https://medium.com/a-paper-a-day-will-have-you-screaming-hurray/day-6-deep-voice-3-scaling-text-to-speech-with-convolutional-sequence-learning-16c3e8be4eda)<br>
+**Paper:** [Deep Voice 3: Scaling Text-to-Speech with Convolutional Sequence Learning](https://arxiv.org/abs/1710.07654)<br>
+**Code:** [r9y9/deepvoice3_pytorch](https://github.com/r9y9/deepvoice3_pytorch)<br>
+**Code:** [Kyubyong/deepvoice3](https://github.com/Kyubyong/deepvoice3)<br>
+
+![](https://miro.medium.com/max/700/1*06JbKxq2eS9G8yO-fhELWg.png)
+
+---
+### Neural Voice Cloning
+**Paper:** [Neural Voice Cloning with a Few Samples](https://arxiv.org/abs/1802.06006)<br>
+**Code:** [SforAiDl/Neural-Voice-Cloning-With-Few-Samples](https://github.com/SforAiDl/Neural-Voice-Cloning-With-Few-Samples)<br>
+
+![](https://ai2-s2-public.s3.amazonaws.com/figures/2017-08-08/cb3e87412d2fa52441e40bff3db2135dec9de3b9/4-Figure1-1.png)
+
+---
+### SV2TTS
+**Blog:** [Voice Cloning: Corentin's Improvisation On SV2TTS](https://www.datasciencecentral.com/profiles/blogs/voice-cloning-corentin-s-improvisation-on-sv2tts)<br>
+**Paper:** [Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis](https://arxiv.org/abs/1806.04558)<br>
+**Code:** [CorentinJ/Real-Time-Voice-Cloning](https://github.com/CorentinJ/Real-Time-Voice-Cloning)<br>
+
+![](https://storage.ning.com/topology/rest/1.0/file/get/8569870256)
+
+**Synthesizer** : The synthesizer is Tacotron2 without Wavenet<br>
+![](https://storage.ning.com/topology/rest/1.0/file/get/8569881687)
+
+**SV2TTS Toolbox**<br>
+<iframe width="1148" height="646" src="https://www.youtube.com/embed/-O_hYhToKoA" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
+
+---
+### MelGAN-VC
+**Paper:** [MelGAN-VC: Voice Conversion and Audio Style Transfer on arbitrarily long samples using Spectrograms](https://arxiv.org/abs/1910.03713)<br>
+**Code:** [marcoppasini/MelGAN-VC](https://github.com/marcoppasini/MelGAN-VC)<br>
+
+![](https://d3i71xaburhd42.cloudfront.net/498cdaa589a17bf9a28f85005617088f39685fc2/2-Figure1-1.png)
+
+---
+### Vocoder-free End-to-End Voice Conversion
+**Paper:** [Vocoder-free End-to-End Voice Conversion with Transformer Network](https://arxiv.org/abs/2002.03808)<br>
+**Code:** [kaen2891/kaen2891.github.io](https://github.com/kaen2891/kaen2891.github.io)<br>
+
+![](https://d3i71xaburhd42.cloudfront.net/f0d09082d4ea2b201c992c93eec1d32e7ff166ea/3-Figure1-1.png)
+
+---
+### ConVoice
+**Paper:** [ConVoice: Real-Time Zero-Shot Voice Style Transfer with Convolutional Network](https://arxiv.org/abs/2005.07815)<br>
+**Demo:** [ConVoice: Real-Time Zero-Shot Voice Style Transfer](https://rebryk.github.io/convoice-demo/)<br>
+
+![](https://d3i71xaburhd42.cloudfront.net/c142f05e1577048f712eb3a240baab50ea862301/2-Figure1-1.png)
+
+<br>
+<br>
+
+*This site was last updated {{ site.time | date: "%B %d, %Y" }}.*
+
+