
gradient accumulation tests, embeddings w pad_token fix, smaller models #2059

Merged
merged 12 commits into main on Nov 14, 2024
Conversation

@winglian (Collaborator) commented Nov 14, 2024

Description

- Add more tests around gradient accumulation on multi-GPU (see the sketch after this list)
- Don't require LoRA on the embeddings when a pad_token is set, since the pad token isn't trained anyway
- Add a warning about the edge case of qlora + zero3 + use_reentrant
- Use smaller models to make the tests faster
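For context, here is a minimal sketch of the invariant the gradient accumulation tests rely on. This is illustrative only, not the PR's test code (the PR runs full training configs across GPUs): accumulating gradients over N micro-batches with the loss scaled by 1/N should match a single backward pass over the full batch.

```python
# Illustrative sketch only, not the PR's test code: checks that gradient
# accumulation over N micro-batches reproduces full-batch gradients.
import torch

torch.manual_seed(0)
full_batch_model = torch.nn.Linear(8, 1)
accum_model = torch.nn.Linear(8, 1)
accum_model.load_state_dict(full_batch_model.state_dict())  # identical weights

x, y = torch.randn(16, 8), torch.randn(16, 1)
loss_fn = torch.nn.MSELoss()

# single backward over the full batch
loss_fn(full_batch_model(x), y).backward()

# gradient accumulation: 4 equal micro-batches, loss scaled by 1/4
accum_steps = 4
for xb, yb in zip(x.chunk(accum_steps), y.chunk(accum_steps)):
    (loss_fn(accum_model(xb), yb) / accum_steps).backward()

for p_full, p_accum in zip(full_batch_model.parameters(), accum_model.parameters()):
    assert torch.allclose(p_full.grad, p_accum.grad, atol=1e-6)
```

The multi-GPU tests in this PR exercise the same idea through real training runs (including the DeepSpeed zero3 and FSDP paths mentioned in the commits) rather than a toy linear model.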


@winglian merged commit 71d4030 into main on Nov 14, 2024
14 checks passed
bursteratom pushed a commit that referenced this pull request Nov 18, 2024
…ls (#2059)

* add more test cases for gradient accumulation and fix zero3

* swap out for smaller model

* fix missing return

* fix missing pad_token in config

* support concurrency for multigpu testing

* cast empty deepspeed to empty string for zero3 check

* fix temp_dir as fixture so parametrize works properly

* fix test file for multigpu evals

* don't use default

* don't use default for fsdp_state_dict_type

* don't use llama tokenizer w smollm

* also automatically cancel multigpu for concurrency
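As an aside on the "fix temp_dir as fixture so parametrize works properly" commit above, the general pattern looks like this. This is an illustrative sketch, not the repository's code: a function-scoped pytest fixture gives each parametrized case its own fresh directory, whereas a directory created once during setup is shared across cases and can leak state between them.

```python
# Illustrative sketch, not the repository's code: temp_dir as a function-scoped
# fixture so each parametrized case gets its own fresh directory.
import shutil
import tempfile

import pytest


@pytest.fixture
def temp_dir():
    path = tempfile.mkdtemp()
    yield path
    shutil.rmtree(path)  # clean up after each test case


@pytest.mark.parametrize("gradient_accumulation_steps", [1, 2, 4])
def test_writes_config(temp_dir, gradient_accumulation_steps):
    # every parametrized case writes into its own directory
    with open(f"{temp_dir}/config.yml", "w", encoding="utf-8") as fh:
        fh.write(f"gradient_accumulation_steps: {gradient_accumulation_steps}\n")
```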
djsaunde pushed a commit that referenced this pull request Dec 17, 2024