-
Notifications
You must be signed in to change notification settings - Fork 27.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix languages covered by M4Tv2 #28019
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for fixing! Just some notes on the testing
src/transformers/models/seamless_m4t_v2/modeling_seamless_m4t_v2.py
Outdated
Show resolved
Hide resolved
@@ -784,7 +790,7 @@ def update_generation(self, model): | |||
|
|||
model.generation_config = generation_config | |||
|
|||
def prepare_text_input(self): | |||
def prepare_text_input(self, is_rus=False): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not just make this tgt_lang
and then you can easily test with different target languages both in an outside of the supported languages with generate_speech
as True
and False
?
# make sure that generating speech, with a language that is only supported for text translation, raises error | ||
with self.assertRaises(ValueError): | ||
model.generate(**input_text_rus) | ||
|
||
# make sure that generating text only works | ||
model.generate(**input_text_rus, generate_speech=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should also make sure it works in both cases for a language supported for all tasks
…v2.py Co-authored-by: amyeroberts <[email protected]>
Thanks for the suggestion @amyeroberts ! I've integrated them and will merge when the CI is green! |
* correct language assessment + add tests * Update src/transformers/models/seamless_m4t_v2/modeling_seamless_m4t_v2.py Co-authored-by: amyeroberts <[email protected]> * make style + simplify and enrich test --------- Co-authored-by: amyeroberts <[email protected]>
* correct language assessment + add tests * Update src/transformers/models/seamless_m4t_v2/modeling_seamless_m4t_v2.py Co-authored-by: amyeroberts <[email protected]> * make style + simplify and enrich test --------- Co-authored-by: amyeroberts <[email protected]>
What does this PR do?
Currently, M4Tv2 into-text tasks (ASR, S2TT, T2TT) do not work for languages outside of the 36 for which audio is supported. This is linked to a test at the beginning of the model generate. The model previously verified if the
tgt_lang
was in a bunch of dictionaries, independently from the output modality. This PR aims to fix that.I've added a test to make sure it works.
cc @amyeroberts