
Add transforms to logged config #1428

Merged: 3 commits merged into main from config, Aug 5, 2024
Conversation

@b-chu (Contributor) commented Aug 5, 2024

Uses the transformed version of the training config as the logged config. This is because we infer values like device_train_batch_size during transforms. The logged config is later used in callbacks like curriculum learning which should have access to all the data in TrainConfig.

Ran a manual test to verify that logged_cfg matches the train_cfg values after transforms and that the curriculum learning callback works.
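The change described above can be sketched as follows. This is an illustrative, minimal sketch only: the function names (`apply_transforms`, `build_train_config`) and the exact fields inferred are assumptions, not the actual llm-foundry API.

```python
# Hypothetical sketch -- names are illustrative, not the real llm-foundry API.

def apply_transforms(cfg: dict, world_size: int) -> dict:
    """Transforms infer derived values, e.g. the per-device batch size."""
    cfg = dict(cfg)
    cfg['n_gpus'] = world_size
    cfg['device_train_batch_size'] = cfg['global_train_batch_size'] // world_size
    return cfg

def build_train_config(raw_cfg: dict, world_size: int) -> tuple[dict, dict]:
    train_cfg = apply_transforms(raw_cfg, world_size)
    # Before this PR, the *raw* config was logged, so inferred keys were
    # missing. After this PR, the transformed config is logged, so callbacks
    # such as curriculum learning see all the data in TrainConfig.
    logged_cfg = dict(train_cfg)
    return train_cfg, logged_cfg

train_cfg, logged_cfg = build_train_config(
    {'global_train_batch_size': 512}, world_size=8,
)
print(logged_cfg['device_train_batch_size'])  # 64
```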

@b-chu b-chu requested a review from a team as a code owner August 5, 2024 15:08
@b-chu b-chu requested a review from dakinggg August 5, 2024 15:08
@snarayan21 (Contributor) left a comment

ooh nice. can you include the run name for the manual test (and highlight the differences in the logged configs, if possible)?

thanks!!

@b-chu (Contributor, Author) commented Aug 5, 2024

logged_cfg before
dict_keys(['variables', 'max_seq_len', 'run_name', 'model', 'tokenizer', 'train_loader', 'scheduler', 'optimizer', 'algorithms', 'max_duration', 'eval_interval', 'eval_first', 'eval_subset_num_batches', 'global_train_batch_size', 'seed', 'device_eval_batch_size', 'device_train_microbatch_size', 'precision', 'fsdp_config', 'progress_bar', 'log_to_console', 'console_log_interval', 'callbacks', 'save_interval', 'merge'])

logged_cfg after
dict_keys(['variables', 'max_seq_len', 'run_name', 'model', 'tokenizer', 'train_loader', 'scheduler', 'optimizer', 'algorithms', 'max_duration', 'eval_interval', 'eval_first', 'eval_subset_num_batches', 'global_train_batch_size', 'seed', 'device_eval_batch_size', 'device_train_microbatch_size', 'precision', 'fsdp_config', 'progress_bar', 'log_to_console', 'console_log_interval', 'callbacks', 'save_interval', 'n_gpus', 'device_train_batch_size', 'device_train_grad_accum', 'merge'])
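Diffing the two key listings above shows exactly what the transforms contributed to the logged config. A small sketch (key lists copied from the comment; the diff logic is illustrative):

```python
# Diff the logged_cfg keys before/after this PR to see what transforms added.
before = [
    'variables', 'max_seq_len', 'run_name', 'model', 'tokenizer',
    'train_loader', 'scheduler', 'optimizer', 'algorithms', 'max_duration',
    'eval_interval', 'eval_first', 'eval_subset_num_batches',
    'global_train_batch_size', 'seed', 'device_eval_batch_size',
    'device_train_microbatch_size', 'precision', 'fsdp_config',
    'progress_bar', 'log_to_console', 'console_log_interval', 'callbacks',
    'save_interval', 'merge',
]
after = before[:-1] + [
    'n_gpus', 'device_train_batch_size', 'device_train_grad_accum', 'merge',
]
added = [k for k in after if k not in before]
print(added)  # ['n_gpus', 'device_train_batch_size', 'device_train_grad_accum']
```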

@b-chu b-chu requested a review from snarayan21 August 5, 2024 15:26
@snarayan21 (Contributor) left a comment

lgtm

@b-chu
Copy link
Contributor Author

b-chu commented Aug 5, 2024

CI is failing from transient HF timeouts :( @snarayan21 the PR also needs its branch updated.

@dakinggg (Collaborator) left a comment

Ah sorry, forgot the callbacks use the logged config. Thanks!

@b-chu b-chu enabled auto-merge (squash) August 5, 2024 16:51
@b-chu b-chu merged commit 6dcc18a into main Aug 5, 2024
9 checks passed
@dakinggg dakinggg deleted the config branch August 6, 2024 18:40