
Need to explicitly set use_reentrant when calling checkpoint #26969

Closed

FartyPants opened this issue Oct 20, 2023 · 20 comments

@FartyPants

System Info

Windows

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

According to recent PyTorch versions, you now need to explicitly set use_reentrant when calling checkpoint, as the default will change from use_reentrant=True to use_reentrant=False in the near future.

In transformers.models.llama.modeling_llama, the forward pass contains:

            layer_outputs = torch.utils.checkpoint.checkpoint(
                create_custom_forward(decoder_layer), hidden_states, attention_mask, position_ids
            )

Expected behavior

use_reentrant should be set explicitly when calling torch.utils.checkpoint.checkpoint.
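
For illustration, a minimal, self-contained sketch of what an explicit call looks like (the toy layer and tensors here are made up for the example; use_reentrant=False is shown, but True keeps the old behavior):

import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

# Toy module standing in for a decoder layer, just to demonstrate the call.
layer = nn.Linear(16, 16)
hidden_states = torch.randn(2, 16, requires_grad=True)

# Passing use_reentrant explicitly silences the PyTorch deprecation warning
# about the upcoming default change.
layer_outputs = checkpoint(layer, hidden_states, use_reentrant=False)
layer_outputs.sum().backward()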

@ArthurZucker
Collaborator

cc @fxmarty would you like to have a look at this? 😉

@ArthurZucker
Collaborator

Seems like @younesbelkada also needs this in #26917

@IbrahimAmin1
Contributor

IbrahimAmin1 commented Nov 13, 2023

You can set it explicitly in TrainingArguments via the gradient_checkpointing_kwargs argument:

from transformers import TrainingArguments

training_args = TrainingArguments(
        # ... your other arguments ...
        gradient_checkpointing=True,
        gradient_checkpointing_kwargs={"use_reentrant": False},  # or {"use_reentrant": True}
)
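
(For context: the Trainer forwards these kwargs to the model's gradient_checkpointing_enable method, shown further down this thread, which in turn passes use_reentrant on to torch.utils.checkpoint.checkpoint. With PEFT/LoRA setups, use_reentrant=False is usually the safer choice, since the reentrant variant generally needs at least one checkpointed input with requires_grad=True.)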

@GrahamEckel

FYI, this solution does not work when using SFTTrainer() from trl as the parameter is not exposed.

@younesbelkada
Contributor

@GrahamEckel can you elaborate on the issue you face with TRL SFTTrainer? Ideally with a small reproducer 🙏

@Vectorrent

Are we able to fix this when NOT using the Trainer? I tried passing gradient_checkpointing_kwargs={'use_reentrant': False} to model.gradient_checkpointing_enabled(), but it just bombs out with a "use_reentrant is an unrecognized argument" error.

I'm currently on Transformers 4.35.2.

@younesbelkada
Contributor

@LuciferianInk which model are you using?

model.gradient_checkpointing_enable(gradient_checkpointing_kwargs={"use_reentrant": False})

Should work for all standard transformers models. We also have CI tests for that: https://github.com/huggingface/transformers/blob/main/tests/test_modeling_common.py#L575 and:

self.check_training_gradient_checkpointing(gradient_checkpointing_kwargs={"use_reentrant": False})
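
For anyone doing this without the Trainer, a minimal sketch (the "gpt2" checkpoint is only a placeholder; any standard transformers model should work the same way):

from transformers import AutoModelForCausalLM

# Load a standard transformers model and enable non-reentrant gradient checkpointing.
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.gradient_checkpointing_enable(gradient_checkpointing_kwargs={"use_reentrant": False})
model.train()  # checkpointing is only applied in training mode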

@Vectorrent

Oops, syntax error. Sorry for the false alarm. With your example, I was able to fix that!

@younesbelkada
Contributor

Awesome, thanks!

@manmax31

manmax31 commented Dec 15, 2023

I am trying to fine-tune Mistral 7B using SFT and PEFT, but I get the following error when gradient_checkpointing=True:
ValueError: Attention mask should be of size (1, 1, 2700, 5400), but is torch.Size([1, 1, 2700, 2700])

I have tried gradient_checkpointing=True with gradient_checkpointing_kwargs={"use_reentrant": True} and I still get the above error.

These are the versions I have:
Transformers version: 4.36.1
PEFT version: 0.7.1
TRL version: 0.7.4

Here is my code:

import torch
from transformers import (
    AutoModelForCausalLM,
    BitsAndBytesConfig,
    EarlyStoppingCallback,
    TrainingArguments,
)
from peft import LoraConfig
from trl import SFTTrainer

# tokenizer, train_dataset and eval_dataset are defined elsewhere

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype="float16",
)

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",
    quantization_config=quantization_config,
    device_map="auto",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
)
if torch.cuda.device_count() > 1:  # If more than 1 GPU
    model.is_parallelizable = True
    model.model_parallel = True

training_args = TrainingArguments(
    output_dir="models",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,
    learning_rate=1.41e-5,
    logging_steps=1,
    num_train_epochs=1,
    # max_steps=100,
    report_to=None,
    save_steps=30,
    save_total_limit=2,
    evaluation_strategy="steps",
    eval_steps=10,
    do_eval=True,
    greater_is_better=False,
    load_best_model_at_end=True,
    auto_find_batch_size=True,
    optim="paged_adamw_8bit",
    warmup_ratio=0.03,
    lr_scheduler_type="cosine",
    gradient_checkpointing=True,  # Reduces memory usage at a slight decrease in speed
    gradient_checkpointing_kwargs={"use_reentrant": True},
)

# LoraConfig
peft_config = LoraConfig(
    r=32,
    lora_alpha=32, 
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "v_proj"],
)

early_stop = EarlyStoppingCallback(10)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    args=training_args,
    peft_config=peft_config,
    max_seq_length=2700, 
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    dataset_text_field="text",
    packing=True,
    neftune_noise_alpha=5,
    callbacks=[early_stop],
)

trainer.train()

@younesbelkada
Contributor

Hi @manmax31
The issue is fixed by #28031; please see my comment here: #28056 (comment)
Can you try it out with transformers main? pip install -U git+https://github.com/huggingface/transformers

@manmax31

manmax31 commented Dec 15, 2023

Thank you. Is this fix not on PyPI yet?
That is the only way our systems can access it.

@younesbelkada
Contributor

cc @ArthurZucker @amyeroberts would it make sense to do a patch release to include #28031? It fixes a regression: users were able to train as usual with PEFT and GC before the attention refactor was introduced, and #28031 fixes it.

@manmax31

That would be great. I am currently back on 4.35.2.

@amyeroberts
Collaborator

@younesbelkada If it's a regression, then yes, I think we should do a patch release (also including #28043 and #28061) cc @ArthurZucker WDYT?

@ArthurZucker
Collaborator

Yes 👍🏻


This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@ArthurZucker
Collaborator

ArthurZucker commented Jan 10, 2024

This was fixed and released, so closing.

hijkzzz pushed a commit to OpenRLHF/OpenRLHF that referenced this issue Feb 5, 2024
* fix bug: generate_args-do_sample

* fix gradient_checkpointing_kwargs bug

see: huggingface/trl#912 and huggingface/transformers#26969

@StephennFernandes

@ArthurZucker is this issue fixed? I am still facing the same issue even after a fresh install from source.

@ArthurZucker
Collaborator

Could you open a new issue, with a fresh reproducer, the output of transformers-cli env and the full traceback? 🤗
