Peft deepspeed resume #1227
Conversation
Looks like it does the same as huggingface/transformers#28746
Thanks for that. I'll wait for that to get merged 🤞. Hard to keep track of everything upstream.
Once this is fixed upstream, we can remove the monkeypatch from this PR, but I think we still need to handle the lora_model_dir part.
@manishiitg this was fixed upstream, can you confirm whether the upstream fix works for you?
Force-pushed from 5594554 to 839637c
Looks like we also need to handle some changes from huggingface/transformers#26610
* import deepspeed integration
* monkeypatch peft adapter with deepspeed for resume from checkpoint
* fix patch
* fix patches attempt 2
* make sure to set lora_model_dir
* skip pylint for deepspeed.utils
* pick up upstream fix in transformers
* remove monkeypatch for deepspeed/peft fix
* no need to set the lora_model_dir on resume
* unset load_in_*bit when using quant config
* guard before del
* better handling of load_in* kwargs
Fixes #1134
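The "unset load_in_*bit when using quant config", "guard before del", and "better handling of load_in* kwargs" items describe dropping the redundant `load_in_8bit`/`load_in_4bit` flags once an explicit quantization config is passed to the model loader. Below is a minimal, hypothetical sketch of that idea (the helper name `strip_load_in_kwargs` and the kwargs layout are assumptions for illustration, not the PR's actual code):

```python
# Hypothetical sketch of unsetting load_in_*bit when a quantization config is
# already supplied, so transformers does not receive both forms of the setting.
from transformers import BitsAndBytesConfig


def strip_load_in_kwargs(model_kwargs: dict) -> dict:
    """Remove load_in_*bit flags when an explicit quantization_config is set."""
    if isinstance(model_kwargs.get("quantization_config"), BitsAndBytesConfig):
        for key in ("load_in_8bit", "load_in_4bit"):
            # guard before del: only delete keys that are actually present
            if key in model_kwargs:
                del model_kwargs[key]
    return model_kwargs


# Usage sketch: the quantization_config wins; the bare flag is dropped.
model_kwargs = {
    "load_in_4bit": True,
    "quantization_config": BitsAndBytesConfig(load_in_4bit=True),
}
model_kwargs = strip_load_in_kwargs(model_kwargs)
```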