Allow for str versions of dicts based on typing #30227

muellerzr · 2024-04-12T18:29:36Z

What does this PR do?

This PR adds support for passing a string representation of a dict to arguments that accept a dict in the argparser. For example:

python test.py --output_dir testdir --accelerator_config='{"dispatch_batches":"False"}'

where test.py is a small script that reads the arguments from the parser:

from transformers import HfArgumentParser, TrainingArguments

parser = HfArgumentParser(TrainingArguments)
# parse_args_into_dataclasses() returns a tuple with one instance per dataclass
(training_args,) = parser.parse_args_into_dataclasses()
print(training_args)
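With the command above, the value of --accelerator_config reaches Python as one plain string. The sketch below assumes the dict-like string is decoded with the standard library's json.loads (an assumption for illustration, not necessarily the PR's exact code) and shows why a value such as "False" needs extra handling: after decoding it is still a string, not a bool.

```python
import json

# Raw value of --accelerator_config as the shell passes it to Python (quotes stripped)
raw = '{"dispatch_batches":"False"}'

# Assumption for this sketch: the dict-like string is decoded as JSON
decoded = json.loads(raw)

print(type(decoded).__name__)                      # dict
print(repr(decoded["dispatch_batches"]))           # 'False'
print(type(decoded["dispatch_batches"]).__name__)  # str, not bool
```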

This also fixes a typing issue where deepspeed and fsdp_config could not be annotated as accepting a dict. It turns out this is a (very) fun interaction with the argparser wrapper we have for these. For this to work, the types should be declared as:

Optional[Union[dict,str,...]]

Getting this annotation right is very important for the parsing to work properly!

Fixes #30204

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline, Pull Request section?
Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.

@amyeroberts @pacman100

Technically everything will go brr if the tests pass, since the CLI/parser is usually called through the examples etc. So no new tests were added, but I did manually verify the input/output from a basic script locally.

HuggingFaceDocBuilderDev · 2024-04-12T18:53:21Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

src/transformers/training_args.py

muellerzr · 2024-04-15T13:04:08Z

@amyeroberts figured out where to add a test, and verified that all 3 conditions raise their respective errors :)

amyeroberts

Thanks for enabling this!


amyeroberts · 2024-04-15T18:10:54Z

src/transformers/training_args.py

@@ -1380,6 +1390,14 @@ class TrainingArguments:
    )

    def __post_init__(self):
+        # Parse in args that could be `dict` sent in from the CLI as a string
+        for field in VALID_DICT_FIELDS:


I like just checking for a few possible args!
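The approach of only inspecting a short, explicit list of field names in __post_init__ can be sketched with a toy dataclass. ToyArgs, the contents of VALID_DICT_FIELDS, and the startswith("{") check are assumptions for illustration; the real TrainingArguments.__post_init__ may differ.

```python
import json
from dataclasses import dataclass
from typing import Optional, Union

# Hypothetical allow-list of fields that may arrive as a JSON string
VALID_DICT_FIELDS = ["accelerator_config", "fsdp_config"]


@dataclass
class ToyArgs:
    output_dir: str = "out"
    accelerator_config: Optional[Union[dict, str]] = None
    fsdp_config: Optional[Union[dict, str]] = None

    def __post_init__(self):
        # Only allow-listed fields are inspected; every other field is left alone
        for field_name in VALID_DICT_FIELDS:
            value = getattr(self, field_name)
            # A string that looks like a dict literal is decoded into a real dict
            if isinstance(value, str) and value.startswith("{"):
                setattr(self, field_name, json.loads(value))


args = ToyArgs(accelerator_config='{"split_batches": true}', fsdp_config="path/to/fsdp.json")
print(args.accelerator_config)  # {'split_batches': True}
print(args.fsdp_config)         # path/to/fsdp.json (a plain path string is untouched)
```

Restricting the loop to an allow-list keeps ordinary string arguments (such as output_dir or a config file path) from ever being fed to a decoder.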

amyeroberts · 2024-04-15T18:34:25Z

tests/utils/test_hf_argparser.py

@@ -405,3 +406,58 @@ def test_parse_yaml(self):
    def test_integration_training_args(self):
        parser = HfArgumentParser(TrainingArguments)
        self.assertIsNotNone(parser)
+
+    def test_valid_dict_annotation(self):


I know in our offline discussion I said we probably don't need a test to check the parsing. But seeing the implementation, i.e. that TrainingArguments can be created with a field set to a string representation of a dict without going through the CLI, I think we can add a simple test for at least one of the fields in VALID_DICT_FIELDS.

e.g. something along the lines of:

def test_valid_dict_input_parsing(self):
    args = TrainingArguments(
        field_name='{"key": value}'  # Or however it's assigned in the args
    )
    self.assertEqual(args.field_name, {"key": value})

Done. This also made me notice that ints and bools were still being left as str, so I added a helper to convert them (it does this without literal_eval, which can be exploited).
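The helper's full body is not shown in this thread; the diff excerpt further down only shows the bool branch. Below is a minimal sketch under stated assumptions: convert_str_values is a hypothetical name, nested dicts are handled recursively, and only "true"/"false" (any case) and non-negative decimal integer strings are converted. Using a few explicit string checks means user text is never evaluated as a Python literal.

```python
from typing import Any, Dict


def convert_str_values(passed: Dict[str, Any]) -> Dict[str, Any]:
    """Convert "true"/"false" and integer strings in a (possibly nested) dict, in place."""
    for key, value in passed.items():
        if isinstance(value, dict):
            # Recurse into nested dicts such as gradient_accumulation_kwargs
            passed[key] = convert_str_values(value)
        elif isinstance(value, str):
            if value.lower() in ("true", "false"):
                # Case-insensitive bool conversion
                passed[key] = value.lower() == "true"
            elif value.isdecimal():
                # isdecimal() rather than isdigit(): characters like "²" pass
                # isdigit() but are rejected by int(). "-1" and "1.5" stay strings.
                passed[key] = int(value)
    return passed


config = convert_str_values(
    {"split_batches": "True", "gradient_accumulation_kwargs": {"num_steps": "2"}, "name": "run"}
)
print(config)  # {'split_batches': True, 'gradient_accumulation_kwargs': {'num_steps': 2}, 'name': 'run'}
```

Reassigning existing keys while iterating over items() is safe because the dict's size never changes.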

pacman100

Thank you @muellerzr for adding this super useful feature to be able to pass string representation of dict as cmd arguments! 🚀

This should be very useful for the CLI support that @younesbelkada worked on for TRL.

amyeroberts

Great! Thanks for enabling this and adding these tests ❤️

amyeroberts · 2024-04-16T11:04:58Z

src/transformers/training_args.py

+        elif isinstance(value, str):
+            # First check for bool and convert
+            if value.lower() in ("true", "false"):
+                passed_value[key] = value.lower() == "true"


amyeroberts · 2024-04-16T11:06:29Z

tests/utils/test_hf_argparser.py

+                accelerator_config='{"split_batches": "True", "gradient_accumulation_kwargs": {"num_steps": 2}}',
+            )
+            self.assertEqual(args.accelerator_config.split_batches, True)
+            self.assertEqual(args.accelerator_config.gradient_accumulation_kwargs["num_steps"], 2)


* Bookmark, initial impelemtation. Need to test * Clean * Working fully, woop woop * I think working version now, testing * Fin! * rm cast, could keep None * Fix typing issue * rm typehint * Add test * Add tests and make more rigid

muellerzr added 5 commits April 12, 2024 10:37

Bookmark, initial impelemtation. Need to test

db72d1a

Clean

4798149

Working fully, woop woop

a8e132c

I think working version now, testing

5d9a39a

Fin!

16c3bf7

muellerzr requested review from pacman100 and amyeroberts April 12, 2024 18:29

rm cast, could keep None

a483b4b

muellerzr added 2 commits April 12, 2024 14:59

Fix typing issue

98ce2dc

rm typehint

9c69c16

muellerzr commented Apr 15, 2024

src/transformers/training_args.py

Add test

1c73ab1

amyeroberts approved these changes Apr 15, 2024

Add tests and make more rigid

f81dcf0

pacman100 approved these changes Apr 16, 2024

amyeroberts approved these changes Apr 16, 2024

muellerzr merged commit 487505f into main Apr 16, 2024
21 checks passed

muellerzr deleted the muellerzr-allow-dict-parsing branch April 16, 2024 12:15

lewtun mentioned this pull request Apr 22, 2024

PPO / Reinforce Trainers huggingface/trl#1540

Merged

Uminosachi mentioned this pull request May 31, 2024

Set scheduler_specific_kwargs to get_scheduler hiyouga/LLaMA-Factory#4006

Merged
