
gradient accumulation tests, embeddings w pad_token fix, smaller models #2059

Merged
merged 12 commits into main on Nov 14, 2024
Conversation

@winglian (Collaborator) commented Nov 14, 2024

Description

- Add more tests around gradient accumulation on multi-GPU (see the sketch after this list)
- Don't require LoRA on the embeddings when a pad_token is set, since the pad token isn't trained anyway
- Add a warning about the edge case of qlora + zero3 + use_reentrant
- Use smaller models to make the tests faster
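For context, here is a minimal sketch of the invariant the gradient accumulation tests rely on. This is illustrative only, not the PR's test code (the PR runs full training configs across GPUs): accumulating gradients over N micro-batches with the loss scaled by 1/N should match a single backward pass over the full batch.

```python
# Illustrative sketch only, not the PR's test code: checks that gradient
# accumulation over N micro-batches reproduces full-batch gradients.
import torch

torch.manual_seed(0)
full_batch_model = torch.nn.Linear(8, 1)
accum_model = torch.nn.Linear(8, 1)
accum_model.load_state_dict(full_batch_model.state_dict())  # identical weights

x, y = torch.randn(16, 8), torch.randn(16, 1)
loss_fn = torch.nn.MSELoss()

# single backward over the full batch
loss_fn(full_batch_model(x), y).backward()

# gradient accumulation: 4 equal micro-batches, loss scaled by 1/4
accum_steps = 4
for xb, yb in zip(x.chunk(accum_steps), y.chunk(accum_steps)):
    (loss_fn(accum_model(xb), yb) / accum_steps).backward()

for p_full, p_accum in zip(full_batch_model.parameters(), accum_model.parameters()):
    assert torch.allclose(p_full.grad, p_accum.grad, atol=1e-6)
```

The multi-GPU tests in this PR exercise the same idea through real training runs (including the DeepSpeed zero3 and FSDP paths mentioned in the commits) rather than a toy linear model.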


@winglian merged commit 71d4030 into main on Nov 14, 2024
14 checks passed
bursteratom pushed a commit that referenced this pull request Nov 18, 2024
…ls (#2059)

* add more test cases for gradient accumulation and fix zero3

* swap out for smaller model

* fix missing return

* fix missing pad_token in config

* support concurrency for multigpu testing

* cast empty deepspeed to empty string for zero3 check

* fix temp_dir as fixture so parametrize works properly

* fix test file for multigpu evals

* don't use default

* don't use default for fsdp_state_dict_type

* don't use llama tokenizer w smollm

* also automatically cancel multigpu for concurrency
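As an aside on the "fix temp_dir as fixture so parametrize works properly" commit above, the general pattern looks like this. This is an illustrative sketch, not the repository's code: a function-scoped pytest fixture gives each parametrized case its own fresh directory, whereas a directory created once during setup is shared across cases and can leak state between them.

```python
# Illustrative sketch, not the repository's code: temp_dir as a function-scoped
# fixture so each parametrized case gets its own fresh directory.
import shutil
import tempfile

import pytest


@pytest.fixture
def temp_dir():
    path = tempfile.mkdtemp()
    yield path
    shutil.rmtree(path)  # clean up after each test case


@pytest.mark.parametrize("gradient_accumulation_steps", [1, 2, 4])
def test_writes_config(temp_dir, gradient_accumulation_steps):
    # every parametrized case writes into its own directory
    with open(f"{temp_dir}/config.yml", "w", encoding="utf-8") as fh:
        fh.write(f"gradient_accumulation_steps: {gradient_accumulation_steps}\n")
```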
djsaunde pushed a commit that referenced this pull request Dec 17, 2024