
🕊️ DPO padding free #2520

Merged 29 commits from padding_free into main on Jan 8, 2025
Conversation

@qgallouedec (Member) commented on Dec 26, 2024

What does this PR do?

Demo below; further experiments are in the next comment.

from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer
import torch

model_id = "Qwen/Qwen2-0.5B-Instruct"
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, attn_implementation="flash_attention_2")
tokenizer = AutoTokenizer.from_pretrained(model_id)
dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train[:10%]")
training_args = DPOConfig(output_dir="Gemma2-2B-DPO-pf", max_prompt_length=128, max_completion_length=128, logging_steps=10, padding_free=True)
trainer = DPOTrainer(model=model, args=training_args, train_dataset=dataset, processing_class=tokenizer)
trainer.train()

Training curves with and without padding-free (not sure why they don't match exactly, though the logits do match precisely):

[Screenshot 2024-12-26 at 22:00:06: training curves with vs. without padding-free]

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline, Pull Request section?
  • Was this discussed/approved via a GitHub issue? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the documentation guidelines.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Resolved (outdated) review threads: trl/trainer/dpo_config.py, trl/trainer/dpo_trainer.py, trl/trainer/dpo_trainer.py
@qgallouedec (Member Author) commented:

Regression test:

from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer
import torch

model_id = "Qwen/Qwen2-0.5B-Instruct"
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, attn_implementation="flash_attention_2")
tokenizer = AutoTokenizer.from_pretrained(model_id)
dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train[:10%]")
# dataset = load_dataset("trl-internal-testing/zen", "standard_preference", split="train")
training_args = DPOConfig(output_dir="Qwen2-0.5B-DPO-no_pf", max_prompt_length=128, max_completion_length=128, logging_steps=10, padding_free=False)
trainer = DPOTrainer(model=model, args=training_args, train_dataset=dataset, processing_class=tokenizer)
trainer.train()

Is the new padding_free=False (no_pf in the screenshot) equivalent to DPO on the current main branch (main in the screenshot)? -> Yes

[Screenshot 2025-01-07 at 20:15:31: no_pf vs. main training curves]

Do the padding_free=True (pf in the screenshot) results match the padding_free=False (no_pf in the screenshot) results? -> Yes

[Screenshot 2025-01-07 at 20:19:41: pf vs. no_pf training curves]

(Note: the run names in the screenshots say "Gemma", but the model actually trained is Qwen.)

@qgallouedec requested review from lewtun and kashif on January 7, 2025 at 19:21
Comment on lines -1158 to -1160
logits = outputs.logits[:, :-1, :]
labels = input_ids[:, 1:].clone()
loss_mask = loss_mask[:, 1:].bool()
@qgallouedec (Member Author) commented on Jan 7, 2025:

Rolling works in both cases (batched and flattened tensors), whereas slicing does not:

# Padding case
# input_ids = [[1, 2, 3, 4],
#              [5, 6, 7, 8]]
labels = input_ids[:, 1:].clone()
# labels =    [[2, 3, 4],
#              [6, 7, 8]]

# But

# Padding-free case
# input_ids = [[1, 2, 3, 4, 5, 6, 7, 8]]
labels = input_ids[:, 1:].clone()
# labels =    [[2, 3, 4, 5, 6, 7, 8]]

With slicing, the first token of the first sequence (1) is removed, but the first token of the second sequence (5) is not. To align the labels while keeping consistent behaviour across all sequences in the batch, we use roll instead. The only difference is that the first token, instead of being discarded, is appended to the end (a runnable check of both strategies is sketched after the examples below).

# Padding case
# input_ids = [[1, 2, 3, 4],
#              [5, 6, 7, 8]]
labels = torch.roll(input_ids, shifts=-1, dims=1)
# labels =    [[2, 3, 4, 1],
#              [6, 7, 8, 5]]

# And

# Padding-free case
# input_ids = [[1, 2, 3, 4, 5, 6, 7, 8]]
labels = torch.roll(input_ids, shifts=-1, dims=1)
# labels =    [[2, 3, 4, 5, 6, 7, 8, 1]]
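
A quick runnable check of the two shifting strategies above (a toy sketch with made-up token ids, not the trainer code; it assumes the wrapped-around token in the last column is excluded from the loss via the loss mask):

import torch

input_ids = torch.tensor([[1, 2, 3, 4],
                          [5, 6, 7, 8]])

sliced = input_ids[:, 1:].clone()                  # [[2, 3, 4], [6, 7, 8]]
rolled = torch.roll(input_ids, shifts=-1, dims=1)  # [[2, 3, 4, 1], [6, 7, 8, 5]]

# Both strategies agree on every position that can contribute to the loss; the rolled
# version just parks the wrapped-around first token in the last column, which is
# presumably masked out.
assert torch.equal(sliced, rolled[:, :-1])

# Padding-free: the same roll shifts the flattened row uniformly, so every packed
# sequence gets its labels shifted the same way, which a single slice cannot do.
flat = torch.tensor([[1, 2, 3, 4, 5, 6, 7, 8]])
print(torch.roll(flat, shifts=-1, dims=1))  # tensor([[2, 3, 4, 5, 6, 7, 8, 1]])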

Comment on lines +1164 to +1165
num_logits_to_keep = (loss_mask.shape[1] - first_compute_index).item() + 1 # +1 for the first label
model_kwargs["num_logits_to_keep"] = num_logits_to_keep
@qgallouedec (Member Author) commented:

Read https://github.com/huggingface/trl/pull/2520/files#r1905992872 first.

Since we have an additional token at the end, we need to keep one additional logit. This update makes things nicer at this point, imo.
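
A toy illustration of the arithmetic in the quoted lines (not the trainer code; first_compute_index is not shown in the excerpt, so it is assumed here to be the column index of the first position whose label contributes to the loss):

import torch

loss_mask = torch.tensor([[False, False, False, True, True, True, False, False]])
first_compute_index = loss_mask.nonzero()[:, 1].min()  # -> 3 (assumed definition, see lead-in)
num_logits_to_keep = (loss_mask.shape[1] - first_compute_index).item() + 1  # +1 for the first label
print(num_logits_to_keep)  # 8 - 3 + 1 = 6
# Passing num_logits_to_keep to the model makes it materialize only the last 6 logits,
# so the prompt-only prefix before first_compute_index never goes through the lm_head.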

@August-murr (Collaborator) left a comment.

@qgallouedec merged commit 4516772 into main on Jan 8, 2025
14 checks passed
@qgallouedec deleted the padding_free branch on January 8, 2025 at 08:22