Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix languages covered by M4Tv2 #28019

Merged
merged 3 commits into from
Dec 14, 2023
Merged

Conversation

ylacombe
Copy link
Contributor

@ylacombe ylacombe commented Dec 13, 2023

What does this PR do?

Currently, M4Tv2 into-text tasks (ASR, S2TT, T2TT) do not work for languages outside of the 36 for which audio is supported. This is linked to a test at the beginning of the model generate. The model previously verified if the tgt_lang was in a bunch of dictionaries, independently from the output modality. This PR aims to fix that.

I've added a test to make sure it works.

cc @amyeroberts

Copy link
Collaborator

@amyeroberts amyeroberts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for fixing! Just some notes on the testing

@@ -784,7 +790,7 @@ def update_generation(self, model):

model.generation_config = generation_config

def prepare_text_input(self):
def prepare_text_input(self, is_rus=False):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just make this tgt_lang and then you can easily test with different target languages both in an outside of the supported languages with generate_speech as True and False?

Comment on lines +857 to +862
# make sure that generating speech, with a language that is only supported for text translation, raises error
with self.assertRaises(ValueError):
model.generate(**input_text_rus)

# make sure that generating text only works
model.generate(**input_text_rus, generate_speech=False)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should also make sure it works in both cases for a language supported for all tasks

@ylacombe
Copy link
Contributor Author

Thanks for the suggestion @amyeroberts ! I've integrated them and will merge when the CI is green!

@ylacombe ylacombe merged commit bb1d0d0 into huggingface:main Dec 14, 2023
18 checks passed
iantbutler01 pushed a commit to BismuthCloud/transformers that referenced this pull request Dec 16, 2023
* correct language assessment  + add tests

* Update src/transformers/models/seamless_m4t_v2/modeling_seamless_m4t_v2.py

Co-authored-by: amyeroberts <[email protected]>

* make style + simplify and enrich test

---------

Co-authored-by: amyeroberts <[email protected]>
staghado pushed a commit to staghado/transformers that referenced this pull request Jan 15, 2024
* correct language assessment  + add tests

* Update src/transformers/models/seamless_m4t_v2/modeling_seamless_m4t_v2.py

Co-authored-by: amyeroberts <[email protected]>

* make style + simplify and enrich test

---------

Co-authored-by: amyeroberts <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants