Dev no emo (yl4579#123)
* Create emo_gen.py

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update server.py, fix bugs in func get_text() and infer(). (yl4579#52)

* Extract get_text() and infer() from webui.py. (yl4579#53)

* Extract get_text() and infer() from webui.py.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* add emo emb

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* init emo gen

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* init emo

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* init emo

* Delete bert/bert-base-japanese-v3 directory

* Create .gitkeep

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Create add_punc.py

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix bug in bert_gen.py (yl4579#54)

* Update README.md

* fix bug in models.py (yl4579#56)

* Update models.py

* Fix japanese cleaner (yl4579#61)

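* First pass; going to bed, will keep writing tomorrow (

* Well, well, well, committed to the wrong branch. Staying up late is a big no-no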

* [pre-commit.ci] pre-commit autoupdate (yl4579#55)

* [pre-commit.ci] pre-commit autoupdate

updates:
- [github.com/pre-commit/pre-commit-hooks: v4.4.0 → v4.5.0](pre-commit/pre-commit-hooks@v4.4.0...v4.5.0)

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Create tokenizer_config.json

* update preprocess_text.py: filter out cases where one audio file matches multiple texts (yl4579#57)

* update preprocess_text.py: filter out cases where the audio file does not exist (yl4579#58)

* Fix Japanese cleaner and bert

* better

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: Stardust·减 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Sora <[email protected]>

* Apply Code Formatter Change

* Add config.yml for global configuration. (yl4579#62)

* Add config.yml for global configuration.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix bug in webui.py.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Rename config.yml to default_config.yml. Add ./config.yml to gitignore.

* Add config.py to parse config.yml

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Update webui.py (yl4579#65)

* Update webui.py:
1. Add auto translation from Chinese to Japanese.
2. Start to use config.py in webui.py to set config instead of using the command line.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Fix (yl4579#68)

* Add ー

* fix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Update infer.py and webui.py. Support loading and running inference with version 1.1.1 models. (yl4579#66)

* Update infer.py and webui.py. Support loading and running inference with version 1.1.1 models.

* Update config.json

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Fix bug in translate.py (yl4579#69)

* Support loading and running inference with models of versions 1.1, 1.0.1, and 1.0. (yl4579#70)

* Support loading and running inference with models of versions 1.1, 1.0.1, and 1.0.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Delete useless file in OldVersion

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Update japanese.py (yl4579#71)

Handling JA long pronunciations

* Use the config file to configure bert_gen.py, preprocess_text.py, resample.py (yl4579#72)

* Update bert_gen.py, preprocess_text.py, resample.py. Support using config.yml in these files.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update bert_gen.py

* Update bert_gen.py, fix bug.

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Delete bert/bert-base-japanese-v3 directory

* Create config.json

* Create tokenizer_config.json

* Create vocab.txt

* Update server.py. Support multiple versions and multiple models (yl4579#76)

* Update server.py. Support multiple versions and multiple models

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Dev webui (yl4579#77)

* Open PR (yl4579#75)

* 2023/10/11 update

UI optimization

* Update webui.py

Translate the English UI pages into Chinese

* Update train_ms.py

Single-GPU training

* Add images

* Update extern_subprocess.py

* Update asr_transcript.py

* Update asr_transcript.py

* Update asr_transcript.py

* Update extern_subprocess.py

* Update asr_transcript.py

* Update asr_transcript.py

* Update asr_transcript.py

* Update all_process.py

* Update extern_subprocess.py

* Update all_process.py

* Update all_process.py

* Update asr_transcript.py

* Update extern_subprocess.py

* Update webui.py

* Create re_matching.py

* Update webui.py

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update all_process.py

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update all_process.py

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update all_process.py

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update asr_transcript.py

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Pack 'update' functions into a module

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update all_process.py

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update asr_transcript.py

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update extern_subprocess.py

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update all_process.py

* Update asr_transcript.py

* Update webui.py

* Add files via upload

* Update extern_subprocess.py

* Update all_process.py

* Update asr_transcript.py

* Update bert_gen.py

* Update extern_subprocess.py

* Update preprocess_text.py

* Update re_matching.py

* Update resample.py

* Update update_status.py

* Update update_status.py

* Update webui.py

* Update all_process.py

* Update preprocess_text.py

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update train_ms.py

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Stardust·减 <[email protected]>
Co-authored-by: innnky <[email protected]>

* Delete all_process.py

* Delete asr_transcript.py

* Delete extern_subprocess.py

---------

Co-authored-by: spicysama <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: innnky <[email protected]>

* Create config.json

* Create preprocessor_config.json

* Create vocab.json

* Delete emotional/wav2vec2-large-robust-12-ft-emotion-msp-dim/.gitkeep

* Update emo_gen.py

* Delete add_punc.py

* add emotion_clustering.i

* Apply Code Formatter Change

* Update models.py

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update preprocess_text.py (yl4579#78)

* Update preprocess_text.py. Detect duplicate and nonexistent audio files (yl4579#79)

* Handle Japanese long pronunciations (yl4579#80)

* Handle Japanese long pronunciations

* Update japanese.py

* Update japanese.py

* Use unified phonemes for Japanese long vowel (yl4579#82)

* Use a unified phoneme for Japanese long vowel

`symbol.py` has not been updated to ensure compatibility with older version models.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Add a button that, when clicked, splits the text by sentence by inserting “|” (yl4579#81)

* Update re_matching.py

* Update webui.py

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
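yl4579#81 adds a button that splits the input text by sentence by inserting “|”. The actual rule lives in re_matching.py and is not shown in this commit. A minimal sketch, assuming a hypothetical set of sentence-final marks (`。！？!?`) and grouping runs of consecutive marks with the sentence they end:

```python
import re

# Hypothetical sentence terminators; the real re_matching.py may use another set.
_TERMINATORS = "。！？!?"
# A sentence is either text followed by one or more terminators,
# or trailing text that has no terminator at all.
_SENTENCE = re.compile(rf"[^{_TERMINATORS}]*[{_TERMINATORS}]+|[^{_TERMINATORS}]+")


def split_by_sentence(text: str) -> str:
    """Join the sentences of `text` with '|' separators."""
    sentences = (s.strip() for s in _SENTENCE.findall(text))
    return "|".join(s for s in sentences if s)
```

Keeping `？？` attached to its sentence avoids emitting a segment that contains only punctuation.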

* Fix phonemer bug (yl4579#83)

* Fix phonemer bug

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Fix long vowel handler bug (yl4579#84)

* Fix long vowel handler bug

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

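* Add a feature from the all-in-one package manager: long-text synthesis can use custom pauses between sentences and between paragraphs (yl4579#85)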

* Update webui.py

* Update re_matching.py

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
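yl4579#85 lets long-text synthesis use custom pauses between sentences and between paragraphs. One way to picture this, as a sketch only: synthesize each sentence separately, then splice in silence of a configurable length. The function name and default durations are hypothetical, and plain float lists stand in for audio arrays.

```python
from typing import List, Sequence


def join_with_pauses(paragraphs: Sequence[Sequence[Sequence[float]]], sr: int,
                     sentence_pause: float = 0.2,
                     paragraph_pause: float = 0.6) -> List[float]:
    """Concatenate per-sentence audio, inserting silence between units.

    paragraphs[p][s] holds the samples of sentence s in paragraph p.
    Pause lengths are in seconds; sr is the sample rate in Hz.
    """
    sentence_gap = [0.0] * int(round(sentence_pause * sr))
    paragraph_gap = [0.0] * int(round(paragraph_pause * sr))
    out: List[float] = []
    for p_idx, sentences in enumerate(paragraphs):
        if p_idx:
            out.extend(paragraph_gap)  # silence before every paragraph but the first
        for s_idx, audio in enumerate(sentences):
            if s_idx:
                out.extend(sentence_gap)  # silence between sentences of one paragraph
            out.extend(audio)
    return out
```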

* Update train_ms.py

* fix

* Update cleaner.py

* add en

* add en

* Update english.py

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add en

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add en

* add en

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add en

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update README.md

* Update README.md

* Update README.md

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Change phonemer to pyopenjtalk (yl4579#86)

* Change phonemer to pyopenjtalk

* Switch to openjtalk for easier installation

---------

Co-authored-by: Stardust·减 <[email protected]>

* Update english.py

* Fix english_bert_mock.py. (yl4579#87)

* Add punctuation exceptions (yl4579#88)

* Add punctuation exceptions

* Ellipses exceptions

* remove get bert

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix bug in oldVersion. (yl4579#89)

* Update requirements.txt

* change to large

* rollback requirements.txt

* Feat: Enable 1.1.1 models using fix-ver infer. (yl4579#91)

* Feat: Enable 1.1.1 models using fix-ver infer.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Add Japanese accent (high-low) (yl4579#90)

* Add punctuation exceptions

* Ellipses exceptions

* Add Japanese accent

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Do not replace iteration mark (yl4579#92)

* Add punctuation exceptions

* Ellipses exceptions

* Add Japanese accent

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Do not replace iteration mark

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Fix: fix import error in oldVersion (yl4579#93)

* Refactor: reusing model loading in webui.py and server.py. (yl4579#94)

* Feat: Enable using config.yml in train_ms.py (yl4579#96)

* Update emo_gen.py

* Change emo_gen.py (yl4579#97)

* Fix emo_gen bugs

* Add multiprocess

* Fix queue (yl4579#98)

* Fix emo_gen bugs

* Add multiprocess

* Del var

* Fix queue

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Fix training bugs (yl4579#99)

* Update cluster notebook

* Fix train

* Fix filename

* Update infer.py (yl4579#100)

* Update infer.py

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Add reference audio (yl4579#101)

* Add reference audio

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update

* Update

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Stardust·减 <[email protected]>

* Fix: fix 1.1.1-fix (yl4579#102)

* Fix infer bug (yl4579#103)

* Feat: Add server_fastapi.py. (yl4579#104)

* Feat: Add server_fastapi.py.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix: Update requirements.txt.

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Fix: requirements.txt. (yl4579#105)

* Switch to deberta-v3-large (yl4579#106)

* Switch to deberta-v3-large

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Feat: Update config.py. (yl4579#107)

* Feat: Update config.py.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Dev fix (yl4579#108)

* fix bugs when deploying

* fix bugs when deploying

* fix bugs when deploying

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Revert "Dev fix (yl4579#108)" (yl4579#109)

This reverts commit 685e18a10498d602b1a9a26079340d11925646f0.

* Dev fix (yl4579#110)

* fix bugs when deploying

* fix bugs when deploying

* fix bugs when deploying

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix fixed bugs

* fix fixed bugs

* fix fixed bug 3

* fix fixed bug 4

* fix fixed bug 5

* fix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Add emo vec quantizer (yl4579#111)

Co-authored-by: Stardust·减 <[email protected]>

* Clean req and gitignore (yl4579#112)

* Clean req and gitignore

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Switch to deberta-v2-large-japanese (yl4579#113)

* Switch to deberta-v2-large-japanese

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Fix emo bugs (yl4579#114)

* Fix english (yl4579#115)

* Remove emo (yl4579#117)

* Don't train codebook

* Remove emo

* Update

* Update

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Merge dev into no-emo (yl4579#122)

* [pre-commit.ci] pre-commit autoupdate (yl4579#95)

* [pre-commit.ci] pre-commit autoupdate

updates:
- [github.com/astral-sh/ruff-pre-commit: v0.0.292 → v0.1.1](astral-sh/ruff-pre-commit@v0.0.292...v0.1.1)
- [github.com/psf/black: 23.9.1 → 23.10.0](psf/black@23.9.1...23.10.0)

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Don't train codebook (yl4579#116)

* Update requirements.txt

* Update english_bert_mock.py

* Fix: server_fastapi.py (yl4579#118)

* Fix: server_fastapi.py

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Fix: don't print debug logging. (yl4579#119)

* Fix: don't print debug logging.

* Feat: support emo_gen config

* Fix config

* Apply Code Formatter Change

* Update, fix bugs (yl4579#121)

* Feat: Update infer.py preprocess_text.py server_fastapi.py.

* Fix resample.py. Maintain same directory structure in out_dir as in_dir.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update resample.py

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Update server_fastapi.py to no-emo ver

* Update config.py, no emo config

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: OedoSoldier <[email protected]>
Co-authored-by: Stardust·减 <[email protected]>
Co-authored-by: Stardust-minus <[email protected]>

* Update train_ms.py

* Update latest version info (yl4579#124)

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: jiangyuxiaoxiao <[email protected]>
Co-authored-by: AkitoLiu <[email protected]>
Co-authored-by: Stardust-minus <[email protected]>
Co-authored-by: OedoSoldier <[email protected]>
Co-authored-by: spicysama <[email protected]>
Co-authored-by: innnky <[email protected]>
Co-authored-by: YYuX-1145 <[email protected]>
9 people authored Oct 28, 2023
1 parent c1ba4c7 commit 82ae8f6
Showing 102 changed files with 175,570 additions and 1,058 deletions.
12 changes: 12 additions & 0 deletions .gitignore
@@ -166,3 +166,15 @@ cython_debug/
filelists/*
!/filelists/esd.list
data/*
/config.yml
/Web/
/emotional/*/*.bin
/bert/*/*.bin
/bert/*/*.h5
/bert/*/*.model
/bert/*/*.safetensors
/bert/*/*.msgpack
asr_transcript.py
extract_list.py
/Data
Data/*
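The new .gitignore entries keep large model weights and local data out of version control. Git's rules are subtle: a pattern containing a slash (other than a trailing one) is anchored to the repository root, `*` does not cross `/`, and a trailing `/` matches directories only. The sketch below approximates those rules for the patterns added here; it is not a full gitignore implementation (no negation, no `**`), and `git check-ignore -v <path>` remains the authoritative check.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Patterns added to .gitignore in this commit.
NEW_PATTERNS = [
    "/config.yml", "/Web/", "/emotional/*/*.bin",
    "/bert/*/*.bin", "/bert/*/*.h5", "/bert/*/*.model",
    "/bert/*/*.safetensors", "/bert/*/*.msgpack",
    "asr_transcript.py", "extract_list.py", "/Data", "Data/*",
]


def _matches(path: PurePosixPath, pattern: str) -> bool:
    """Approximate gitignore matching of one pattern against a repo-relative file path."""
    dir_only = pattern.endswith("/")
    pattern = pattern.rstrip("/")
    anchored = "/" in pattern  # leading or middle slash anchors to the repo root
    if anchored:
        pattern = "/" + pattern.lstrip("/")
    full = PurePosixPath("/") / path
    # Ignoring a directory ignores everything beneath it, so test every parent too.
    candidates = list(full.parents)[:-1]  # drop the bare root "/"
    if not dir_only:
        candidates.insert(0, full)
    for cand in candidates:
        if anchored:
            # Absolute patterns must match the whole path, one component per "*".
            if cand.match(pattern):
                return True
        elif fnmatchcase(cand.name, pattern):
            return True  # slash-free patterns match a name at any depth
    return False


def is_ignored(path: str, patterns=NEW_PATTERNS) -> bool:
    p = PurePosixPath(path)
    return any(_matches(p, pat) for pat in patterns)
```

Note that `/bert/*/*.bin` ignores weights one level below bert/ only; a file nested deeper, such as bert/a/b/c.bin, is still tracked.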
Empty file added .gitmodules
Empty file.
12 changes: 3 additions & 9 deletions README.md
@@ -10,16 +10,8 @@ VITS2 Backbone with bert
[//]: # ()
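[//]: # (This repository began when a friend shared a video by AI Feng-ge and I was amazed by its results. After trying MassTTS myself, I found that fs falls somewhat short of vits in audio quality and that its training pipeline is more complex than that of vits, so, following its approach, bert)

[//]: # (was combined with vits to achieve better prosody. We work on open-source projects purely out of interest and as a labor of love, with no intention of conflicting with anyone. However, [MaxMax2016]&#40;https://github.com/MaxMax2016&#41;)

[//]: # (and its organization [PlayVoice]&#40;https://github.com/PlayVoice&#41; have repeatedly come to pick fights, claiming that this project plagiarized their code and even threatening to go to court. We therefore state explicitly in this README that this project has)

[//]: # (no relation whatsoever to [PlayVoice/vits_chinese]&#40;https://github.com/PlayVoice/vits_chinese&#41;, and the idea of combining bert also comes entirely from MassTTS)


[//]: # (Appendix: the evidence the other party offers that this project plagiarized their code; you may review it yourselves and judge: [the actual MassTTS code cited by bert_vits2]&#40;https://github.com/PlayVoice/vits_chinese/tree/4781241520c6b9fdcf090fca289148719272e89f#bert_vits2%E5%BC%95%E7%94%A8%E7%9A%84masstts%E7%9A%84%E5%AE%9E%E9%99%85%E4%BB%A3%E7%A0%81&#41; )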

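## A mature Traveler/Trailblazer/Captain/Doctor/sensei/Witcher/Miaomiaolu/V should read the code and learn how to train on their own.

### Using this project for any purpose that violates the Constitution of the People's Republic of China, the Criminal Law of the People's Republic of China, the Public Security Administration Punishments Law of the People's Republic of China, or the Civil Code of the People's Republic of China is strictly prohibited.
### Use for any politically related purpose is strictly prohibited.
#### Video: https://www.bilibili.com/video/BV1hp4y1K78E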
@@ -30,6 +22,8 @@ VITS2 Backbone with bert
+ [p0p4k/vits2_pytorch](https://github.com/p0p4k/vits2_pytorch)
+ [svc-develop-team/so-vits-svc](https://github.com/svc-develop-team/so-vits-svc)
+ [PaddlePaddle/PaddleSpeech](https://github.com/PaddlePaddle/PaddleSpeech)
+ [emotional-vits](https://github.com/innnky/emotional-vits)
+ [Bert-VITS2-en](https://github.com/xwan07017/Bert-VITS2-en)
## Thanks to all contributors for their efforts
<a href="https://github.com/fishaudio/Bert-VITS2/graphs/contributors" target="_blank">
<img src="https://contrib.rocks/image?repo=fishaudio/Bert-VITS2"/>
34 changes: 34 additions & 0 deletions bert/bert-base-japanese-v3/.gitattributes
@@ -0,0 +1,34 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
34 changes: 34 additions & 0 deletions bert/bert-large-japanese-v2/.gitattributes
@@ -0,0 +1,34 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
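Both .gitattributes files route common binary formats through Git LFS. A pattern without a slash matches the file name at any depth, so one quick way to see which files in a model directory would be stored as LFS pointers is to test each name against these patterns. A sketch under that assumption; the path pattern `saved_model/**/*` is left out, and `git check-attr filter -- <path>` is the authoritative check.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Name-only patterns copied from the .gitattributes files in this commit.
LFS_PATTERNS = [
    "*.7z", "*.arrow", "*.bin", "*.bz2", "*.ckpt", "*.ftz", "*.gz", "*.h5",
    "*.joblib", "*.lfs.*", "*.mlmodel", "*.model", "*.msgpack", "*.npy",
    "*.npz", "*.onnx", "*.ot", "*.parquet", "*.pb", "*.pickle", "*.pkl",
    "*.pt", "*.pth", "*.rar", "*.safetensors", "*.tar.*", "*.tflite",
    "*.tgz", "*.wasm", "*.xz", "*.zip", "*.zst", "*tfevents*",
]


def tracked_by_lfs(path: str) -> bool:
    """True if the file name matches any name-only LFS pattern."""
    name = PurePosixPath(path).name
    return any(fnmatchcase(name, pattern) for pattern in LFS_PATTERNS)
```

Small text files such as config.json, tokenizer_config.json, and vocab.txt match none of the patterns and stay as regular Git blobs, while weight files like pytorch_model.bin go through LFS.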
53 changes: 53 additions & 0 deletions bert/bert-large-japanese-v2/README.md
@@ -0,0 +1,53 @@
---
license: apache-2.0
datasets:
- cc100
- wikipedia
language:
- ja
widget:
- text: 東北大学で[MASK]の研究をしています。
---

# BERT large Japanese (unidic-lite with whole word masking, CC-100 and jawiki-20230102)

This is a [BERT](https://github.com/google-research/bert) model pretrained on texts in the Japanese language.

This version of the model processes input texts with word-level tokenization based on the Unidic 2.1.2 dictionary (available in the [unidic-lite](https://pypi.org/project/unidic-lite/) package), followed by WordPiece subword tokenization.
Additionally, the model is trained with whole word masking enabled for the masked language modeling (MLM) objective.

The code for pretraining is available at [cl-tohoku/bert-japanese](https://github.com/cl-tohoku/bert-japanese/).

## Model architecture

The model architecture is the same as the original BERT large model: 24 layers, 1024 dimensions of hidden states, and 16 attention heads.

## Training Data

The model is trained on the Japanese portion of [CC-100 dataset](https://data.statmt.org/cc-100/) and the Japanese version of Wikipedia.
For Wikipedia, we generated a text corpus from the [Wikipedia Cirrussearch dump file](https://dumps.wikimedia.org/other/cirrussearch/) as of January 2, 2023.
The corpus files generated from CC-100 and Wikipedia are 74.3GB and 4.9GB in size and consist of approximately 392M and 34M sentences, respectively.

For the purpose of splitting texts into sentences, we used [fugashi](https://github.com/polm/fugashi) with [mecab-ipadic-NEologd](https://github.com/neologd/mecab-ipadic-neologd) dictionary (v0.0.7).

## Tokenization

The texts are first tokenized by MeCab with the Unidic 2.1.2 dictionary and then split into subwords by the WordPiece algorithm.
The vocabulary size is 32768.

We used [fugashi](https://github.com/polm/fugashi) and [unidic-lite](https://github.com/polm/unidic-lite) packages for the tokenization.
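The second tokenization stage can be illustrated with the greedy longest-match-first WordPiece procedure used by BERT: each word from the first stage is split into the longest vocabulary pieces available, with continuation pieces prefixed by `##`. A sketch with a made-up toy vocabulary; the word list in the usage stands in for MeCab output and is assumed, not computed.

```python
from typing import Iterable, List, Set


def wordpiece(word: str, vocab: Set[str], unk: str = "[UNK]",
              max_chars: int = 100) -> List[str]:
    """Greedy longest-match-first WordPiece split of a single word."""
    if len(word) > max_chars:
        return [unk]
    pieces: List[str] = []
    start = 0
    while start < len(word):
        end, match = len(word), None
        # Shrink the candidate from the right until it is in the vocabulary.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation pieces carry a ## prefix
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            return [unk]  # the whole word becomes [UNK] if any part fails
        pieces.append(match)
        start = end
    return pieces


def tokenize(words: Iterable[str], vocab: Set[str]) -> List[str]:
    """Second stage: apply WordPiece to every word produced by the word tokenizer."""
    return [piece for w in words for piece in wordpiece(w, vocab)]
```

With the toy vocabulary `{"自然", "##言語", "##処理", "の", "研究"}`, the hand-written words `["自然言語処理", "の", "研究"]` become `["自然", "##言語", "##処理", "の", "研究"]`.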

## Training

We trained the model first on the CC-100 corpus for 1M steps and then on the Wikipedia corpus for another 1M steps.
For training of the MLM (masked language modeling) objective, we introduced whole word masking in which all of the subword tokens corresponding to a single word (tokenized by MeCab) are masked at once.
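Because WordPiece runs on each MeCab word separately, the `##` prefix marks exactly where one word's subwords continue, which is what whole word masking needs. A simplified sketch: it always substitutes `[MASK]`, whereas the original BERT recipe replaces selected tokens with `[MASK]` 80% of the time, a random token 10%, and leaves them unchanged 10%. The masking ratio and helper names are assumptions, not the cl-tohoku implementation.

```python
import random
from typing import List, Optional


def word_groups(subwords: List[str]) -> List[List[int]]:
    """Group subword positions into words; '##' marks a continuation piece."""
    groups: List[List[int]] = []
    for i, tok in enumerate(subwords):
        if tok.startswith("##") and groups:
            groups[-1].append(i)
        else:
            groups.append([i])
    return groups


def whole_word_mask(subwords: List[str], mask_ratio: float = 0.15,
                    mask_token: str = "[MASK]",
                    rng: Optional[random.Random] = None) -> List[str]:
    """Mask every subword of a randomly chosen subset of whole words."""
    rng = rng or random.Random()
    groups = word_groups(subwords)
    if not groups:
        return list(subwords)
    n_words = min(len(groups), max(1, round(len(groups) * mask_ratio)))
    masked = list(subwords)
    for group in rng.sample(groups, n_words):
        for i in group:
            masked[i] = mask_token  # all pieces of a word are masked together
    return masked
```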

For training of each model, we used a v3-8 instance of Cloud TPUs provided by [TPU Research Cloud](https://sites.research.google/trc/about/).

## Licenses

The pretrained models are distributed under the Apache License 2.0.

## Acknowledgments

This model is trained with Cloud TPUs provided by the [TPU Research Cloud](https://sites.research.google/trc/about/) program.
19 changes: 19 additions & 0 deletions bert/bert-large-japanese-v2/config.json
@@ -0,0 +1,19 @@
{
"architectures": [
"BertForPreTraining"
],
"attention_probs_dropout_prob": 0.1,
"hidden_act": "gelu",
"hidden_dropout_prob": 0.1,
"hidden_size": 1024,
"initializer_range": 0.02,
"intermediate_size": 4096,
"layer_norm_eps": 1e-12,
"max_position_embeddings": 512,
"model_type": "bert",
"num_attention_heads": 16,
"num_hidden_layers": 24,
"pad_token_id": 0,
"type_vocab_size": 2,
"vocab_size": 32768
}
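The config.json values pin down the model size. As a worked check, the sketch below counts the parameters of a standard BERT encoder plus pooler from these fields. It assumes the usual layout (biased linear layers, LayerNorm with gain and bias) and excludes the BertForPreTraining heads.

```python
# Values copied from the config.json above.
CONFIG = {
    "hidden_size": 1024,
    "intermediate_size": 4096,
    "max_position_embeddings": 512,
    "num_attention_heads": 16,
    "num_hidden_layers": 24,
    "type_vocab_size": 2,
    "vocab_size": 32768,
}


def bert_param_count(cfg: dict) -> int:
    """Count parameters of a standard BERT encoder plus pooler (no pretraining heads)."""
    h, ff = cfg["hidden_size"], cfg["intermediate_size"]
    if h % cfg["num_attention_heads"]:
        raise ValueError("hidden_size must split evenly across attention heads")
    layer_norm = 2 * h  # gain + bias
    embeddings = (cfg["vocab_size"] + cfg["max_position_embeddings"]
                  + cfg["type_vocab_size"]) * h + layer_norm
    attention = 4 * (h * h + h) + layer_norm  # Q, K, V and output projections
    feed_forward = (h * ff + ff) + (ff * h + h) + layer_norm
    encoder = cfg["num_hidden_layers"] * (attention + feed_forward)
    pooler = h * h + h
    return embeddings + encoder + pooler
```

With these values the count is 337,441,792 parameters (roughly 337M), and each attention head works on 1024 / 16 = 64 dimensions.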
10 changes: 10 additions & 0 deletions bert/bert-large-japanese-v2/tokenizer_config.json
@@ -0,0 +1,10 @@
{
"tokenizer_class": "BertJapaneseTokenizer",
"model_max_length": 512,
"do_lower_case": false,
"word_tokenizer_type": "mecab",
"subword_tokenizer_type": "wordpiece",
"mecab_kwargs": {
"mecab_dic": "unidic_lite"
}
}