Train on other language #3

Open
yiwei0730 opened this issue Jun 5, 2024 · 6 comments

Comments

@yiwei0730

Hello, this is amazing.
I'd like to ask whether it can be trained on other languages, or even on several languages at the same time.

@zhangshaolei1998
Collaborator

Hi, thanks for your attention.
The StreamSpeech architecture can support multilingual speech-to-speech translation, which we have also explored. Since multilingual translation is not the focus of this work, we did not cover it in our paper.

If you want to train a multilingual StreamSpeech on CVSS-C, you only need to modify the data processing part. The training part is the same.
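
As one possible reading of "modify the data processing part", the sketch below merges per-language manifest rows into a single training set. It is only an illustration: the helper names, the tab-separated layout, and the `id` column are assumptions and are not taken from the StreamSpeech repository, so check them against the manifests your own preprocessing produces.

```python
import csv


def read_manifest(path):
    """Read a tab-separated manifest with a header row into a list of dicts."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f, delimiter="\t", quotechar=None,
                                quoting=csv.QUOTE_NONE)
        return list(reader)


def merge_manifests(per_lang_rows):
    """Merge rows from several source languages into one list.

    per_lang_rows maps a language code (e.g. "fr") to its manifest rows.
    Assumes every row has an "id" column (hypothetical); ids are prefixed
    with the language code so that they stay unique after merging.
    """
    merged, columns, seen = [], None, set()
    for lang, rows in per_lang_rows.items():
        for row in rows:
            if columns is None:
                columns = list(row)
            elif list(row) != columns:
                raise ValueError(f"column mismatch in manifest for {lang!r}")
            new_row = dict(row)
            new_row["id"] = f"{lang}_{row['id']}"
            if new_row["id"] in seen:
                raise ValueError(f"duplicate id {new_row['id']!r}")
            seen.add(new_row["id"])
            merged.append(new_row)
    return merged


def write_manifest(rows, path):
    """Write merged rows back out as a tab-separated manifest."""
    if not rows:
        raise ValueError("nothing to write")
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0]), delimiter="\t",
                                quotechar=None, quoting=csv.QUOTE_NONE,
                                lineterminator="\n")
        writer.writeheader()
        writer.writerows(rows)
```

A call such as `write_manifest(merge_manifests({"fr": read_manifest("fr/train.tsv"), "de": read_manifest("de/train.tsv")}), "train.tsv")` would then produce one combined file; the paths here are placeholders.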

Hope this helps.

@tiannanzhang

Hi, what changes are needed for speech translation into a language other than English? Which parts need to be modified apart from data processing?

Thanks!

@zhangshaolei1998
Collaborator

@arararz
Hi,
If you want to train a StreamSpeech model that translates speech into a language other than English, there are two points to note in addition to data preparation:

  1. To extract the units of the target speech, you need to use the Vocoder of the corresponding language, which can be found here.
  2. Appropriately adjust --ctc-upsample-rate. You can refer to Appendix D of our paper and adjust it to 2-3 times the unit/word sequence length ratio.
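
As a rough illustration of the second point, the sketch below estimates the unit/word length ratio from a few (unit sequence, target text) pairs and reports the 2-3x range mentioned above. The helper names are hypothetical and not part of StreamSpeech. Counting words by whitespace is also an assumption: Appendix D of the paper is the reference for how the ratio is defined, and languages written without spaces would need a proper tokenizer.

```python
def unit_word_ratio(pairs):
    """Average number of discrete units per word over (units, text) pairs.

    Each pair holds a space-separated unit sequence and the matching
    target-language text. Words are counted by whitespace (an assumption).
    """
    total_units = 0
    total_words = 0
    for units, text in pairs:
        total_units += len(units.split())
        total_words += len(text.split())
    if total_words == 0:
        raise ValueError("no words found in the target text")
    return total_units / total_words


def upsample_rate_range(ratio, low=2.0, high=3.0):
    """Return the (low, high) range of 2-3 times the unit/word ratio."""
    return low * ratio, high * ratio


# Toy data for illustration only; real unit sequences are much longer.
pairs = [
    ("1 2 3 4 5 6 7 8 9 10", "hola mundo"),
    ("11 12 13 14 15 16", "buenos dias"),
]
ratio = unit_word_ratio(pairs)
lo, hi = upsample_rate_range(ratio)
print(f"ratio={ratio:.1f}, try --ctc-upsample-rate between {lo:.0f} and {hi:.0f}")
```

In practice the ratio should be measured on the full training set rather than a handful of samples, and the final value is still something to tune.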

Hope this helps!

@thetushargoyal

@zhangshaolei1998 Hey, very interesting work. I was wondering about the training time and what system configuration you used. Thanks!

@zhangshaolei1998
Collaborator

@thetushargoyal
Hi, the training takes less than 1 day on 8 NVIDIA 3090 GPUs.

@nasirudeenraheem

> Hi, thanks for your attention. StreamSpeech architecture can support multilingual speech-to-speech translation, which we have also explored above. Since multilingual is not the core highlight of this work, we did not cover it in our paper.
>
> If you want to train a multilingual StreamSpeech on CVSS-C, you only need to modify the data processing part. The training part is the same.
>
> Hope these can help you.

Thanks for this information. I have the following questions:

  1. How can we train with our own data?
  2. Is it possible to use a Hugging Face pre-trained model for a pair such as En-ar? If so, how can I do that?

Thank you for answering.
