Fix languages covered by M4Tv2 #28019

ylacombe · 2023-12-13T21:00:59Z

What does this PR do?

Currently, M4Tv2 into-text tasks (ASR, S2TT, T2TT) do not work for languages outside of the 36 for which audio is supported. The cause is a check at the beginning of the model's `generate` method: it verified that `tgt_lang` was present in every language dictionary, regardless of the requested output modality. This PR aims to fix that by making the check depend on the output modality.
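To illustrate the idea, here is a minimal sketch of a modality-aware language check. It is not the code from `modeling_seamless_m4t_v2.py`: the function name `check_tgt_lang` and the two toy language tables are hypothetical, and the real model reads its language mappings from the generation config.

```python
# Toy language tables (assumption for illustration only; these are not
# the model's real language lists).
TEXT_LANGS = {"eng", "fra", "rus", "spa"}
SPEECH_LANGS = {"eng", "fra", "spa"}  # "rus" is treated as text-only here


def check_tgt_lang(tgt_lang, generate_speech=True):
    """Validate tgt_lang only against the tables the requested output needs.

    Hypothetical helper: text output always needs the text table, and the
    speech table is consulted only when speech is actually generated.
    """
    required = {"text": TEXT_LANGS}
    if generate_speech:
        required["speech"] = SPEECH_LANGS

    for modality, supported in required.items():
        if tgt_lang not in supported:
            raise ValueError(
                f"`tgt_lang={tgt_lang}` is not supported for {modality} output. "
                f"Supported languages: {', '.join(sorted(supported))}"
            )
    return tgt_lang
```

With this shape, `check_tgt_lang("rus", generate_speech=False)` passes, while `check_tgt_lang("rus")` raises a `ValueError` because the speech table is also consulted.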

I've added a test to make sure it works.

cc @amyeroberts

amyeroberts

Thanks for fixing! Just some notes on the testing

src/transformers/models/seamless_m4t_v2/modeling_seamless_m4t_v2.py

amyeroberts · 2023-12-14T14:02:31Z

tests/models/seamless_m4t_v2/test_modeling_seamless_m4t_v2.py

@@ -784,7 +790,7 @@ def update_generation(self, model):

        model.generation_config = generation_config

-    def prepare_text_input(self):
+    def prepare_text_input(self, is_rus=False):


Why not just make this `tgt_lang`? Then you can easily test with different target languages, both inside and outside of the supported languages, with `generate_speech` set to True and False.

amyeroberts · 2023-12-14T14:03:29Z

tests/models/seamless_m4t_v2/test_modeling_seamless_m4t_v2.py

+        # make sure that generating speech, with a language that is only supported for text translation, raises error
+        with self.assertRaises(ValueError):
+            model.generate(**input_text_rus)
+
+        # make sure that generating text only works
+        model.generate(**input_text_rus, generate_speech=False)


We should also make sure it works in both cases for a language supported for all tasks
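A rough sketch of what covering both cases could look like, assuming a stand-in model rather than the real M4Tv2 test fixtures. `StubModel`, the toy language sets, and the test class name are all hypothetical; only `generate`, `tgt_lang`, and `generate_speech` come from the discussion above.

```python
import unittest

# Toy language sets (assumption): "eng" supports every task, "rus" is text-only.
TEXT_LANGS = {"eng", "rus"}
SPEECH_LANGS = {"eng"}


class StubModel:
    """Hypothetical stand-in that mimics only the language check of generate."""

    def generate(self, tgt_lang, generate_speech=True):
        supported = SPEECH_LANGS if generate_speech else TEXT_LANGS
        if tgt_lang not in supported:
            raise ValueError(f"`tgt_lang={tgt_lang}` is not supported")
        return "speech" if generate_speech else "text"


class TargetLanguageTest(unittest.TestCase):
    def test_language_and_modality_matrix(self):
        model = StubModel()
        # (tgt_lang, generate_speech, should_work)
        cases = [
            ("eng", True, True),    # supported for all tasks, speech output
            ("eng", False, True),   # supported for all tasks, text output
            ("rus", True, False),   # text-only language, speech must fail
            ("rus", False, True),   # text-only language, text output works
        ]
        for tgt_lang, generate_speech, should_work in cases:
            with self.subTest(tgt_lang=tgt_lang, generate_speech=generate_speech):
                if should_work:
                    model.generate(tgt_lang=tgt_lang, generate_speech=generate_speech)
                else:
                    with self.assertRaises(ValueError):
                        model.generate(tgt_lang=tgt_lang, generate_speech=generate_speech)
```

Listing the cases as a table keeps the four language/modality combinations visible in one place, and `subTest` reports each failing combination separately.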


ylacombe · 2023-12-14T14:21:21Z

Thanks for the suggestions @amyeroberts! I've integrated them and will merge when the CI is green!

* correct language assessment + add tests

* Update src/transformers/models/seamless_m4t_v2/modeling_seamless_m4t_v2.py

Co-authored-by: amyeroberts <[email protected]>

* make style + simplify and enrich test

---------

Co-authored-by: amyeroberts <[email protected]>

correct language assessment + add tests

df3d89f

amyeroberts approved these changes Dec 14, 2023


ylacombe and others added 2 commits December 14, 2023 14:14

Update src/transformers/models/seamless_m4t_v2/modeling_seamless_m4t_…

5005a2e

…v2.py Co-authored-by: amyeroberts <[email protected]>

make style + simplify and enrich test

3d2ea36

ylacombe merged commit bb1d0d0 into huggingface:main Dec 14, 2023
18 checks passed
