πŸ•ΉοΈ CLI refactor #2380

Merged: 53 commits, merged on Dec 13, 2024
Commits
- c77503a: Refactor main function in dpo.py (qgallouedec, Nov 21, 2024)
- 66254d8: Update setup.py and add cli.py (qgallouedec, Nov 21, 2024)
- 49f34d1: Add examples to package data (qgallouedec, Nov 21, 2024)
- 35ee4e6: style (qgallouedec, Nov 21, 2024)
- 09e1257: Refactor setup.py file (qgallouedec, Nov 21, 2024)
- 4ccf137: Add new file t.py (qgallouedec, Nov 21, 2024)
- d397ab8: Move dpo to package (qgallouedec, Nov 21, 2024)
- dad9ecc: Update MANIFEST.in and setup.py, refactor trl/cli.py (qgallouedec, Nov 21, 2024)
- 3024d5b: Add __init__.py to trl/scripts directory (qgallouedec, Nov 21, 2024)
- 60583c2: Add license header to __init__.py (qgallouedec, Nov 21, 2024)
- e5ec9e7: Merge branch 'main' into cli-refactor (qgallouedec, Nov 22, 2024)
- 5eb1adf: File moved instruction (qgallouedec, Nov 22, 2024)
- 5f37d46: Merge branch 'cli-refactor' of https://github.com/huggingface/trl int… (qgallouedec, Nov 22, 2024)
- 793ce44: Add Apache License and update file path (qgallouedec, Nov 22, 2024)
- 3713056: Merge branch 'main' into cli-refactor (qgallouedec, Nov 24, 2024)
- 1c6261b: Merge branch 'main' into cli-refactor (qgallouedec, Nov 25, 2024)
- bf27b36: Move dpo.py to new location (qgallouedec, Nov 25, 2024)
- 30e0e50: Merge branch 'cli-refactor' of https://github.com/huggingface/trl int… (qgallouedec, Nov 25, 2024)
- adac644: Refactor CLI and DPO script (qgallouedec, Nov 25, 2024)
- 5c5a254: Merge branch 'main' into cli-refactor (qgallouedec, Nov 26, 2024)
- 923ba0c: Merge branch 'main' into cli-refactor (qgallouedec, Nov 28, 2024)
- a15a41c: Refactor import structure in scripts package (qgallouedec, Nov 28, 2024)
- f0a20b2: Merge branch 'main' into cli-refactor (qgallouedec, Dec 4, 2024)
- 7a0a4f0: env (qgallouedec, Dec 4, 2024)
- 167f23f: rm config from chat arg (qgallouedec, Dec 4, 2024)
- 084e33a: rm old cli (qgallouedec, Dec 4, 2024)
- 70dd253: chat init (qgallouedec, Dec 4, 2024)
- 972f7c6: test cli [skip ci] (qgallouedec, Dec 5, 2024)
- 1386d41: Add `datast_config_name` to `ScriptArguments` (#2440) (qgallouedec, Dec 5, 2024)
- bf289d8: add missing arg (qgallouedec, Dec 5, 2024)
- d811b1b: Add test cases for 'trl sft' and 'trl dpo' commands (qgallouedec, Dec 5, 2024)
- 61706af: Add sft.py script and update cli.py to include sft command (qgallouedec, Dec 5, 2024)
- d9094e2: Move sft script (qgallouedec, Dec 5, 2024)
- 7d2e62c: chat (qgallouedec, Dec 5, 2024)
- d468545: style [ci skip] (qgallouedec, Dec 5, 2024)
- 93d423c: kto (qgallouedec, Dec 5, 2024)
- 9ee485a: rm example config (qgallouedec, Dec 5, 2024)
- 5f86e61: first step on doc (qgallouedec, Dec 5, 2024)
- 779062b: see #2442 (qgallouedec, Dec 5, 2024)
- 0892264: see #2443 (qgallouedec, Dec 5, 2024)
- 746baec: fix chat windows (qgallouedec, Dec 5, 2024)
- 2fc0b6f: ©️ Copyrights update (#2454) (qgallouedec, Dec 10, 2024)
- 6941e0f: πŸ’¬ Fix chat for windows (#2443) (qgallouedec, Dec 10, 2024)
- b202b15: πŸ†” Add `datast_config` to `ScriptArguments` (#2440) (qgallouedec, Dec 10, 2024)
- 2401463: 🏎 Fix deepspeed preparation of `ref_model` in `OnlineDPOTrainer` (#2417) (qgallouedec, Dec 10, 2024)
- be0ca9b: Merge branch 'main' into cli-refactor (qgallouedec, Dec 10, 2024)
- c0209f9: Fix config name (qgallouedec, Dec 10, 2024)
- cbff826: Merge branch 'main' into cli-refactor (qgallouedec, Dec 13, 2024)
- 0263435: Remove `make dev` in favor of `pip install -e .[dev]` (qgallouedec, Dec 13, 2024)
- 590afa0: Merge branch 'cli-refactor' of https://github.com/huggingface/trl int… (qgallouedec, Dec 13, 2024)
- 98458a0: Update script paths and remove old symlink related things (qgallouedec, Dec 13, 2024)
- 65f31f6: Fix chat script path [ci skip] (qgallouedec, Dec 13, 2024)
- 3a3be53: style (qgallouedec, Dec 13, 2024)
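Taken together, the commit series moves the maintained training scripts into the `trl/scripts` package and wires them into the `trl` console command, while dropping the old `make dev` symlink setup. A quick sanity check after installing from source might look like the sketch below; the `--help` calls are assumed standard argparse behavior and are not commands exercised in this PR:

```bash
# Install TRL from source with the dev extras (this replaces the removed `make dev`).
pip install -e ".[dev]"

# The refactored CLI exposes subcommands for the relocated scripts.
trl env          # print system information
trl sft --help   # assumed: argparse help for the SFT training script
trl dpo --help   # assumed: argparse help for the DPO training script
```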
Files changed
3 changes: 0 additions & 3 deletions .gitignore
@@ -143,6 +143,3 @@ checklink/cookies.txt
 nbs/wandb/
 examples/notebooks/wandb/
 wandb/
-
-# cli scripts that are symlinked from `examples/scripts`
-trl/commands/scripts/
4 changes: 2 additions & 2 deletions CONTRIBUTING.md
@@ -33,7 +33,7 @@ For something slightly more challenging, you can also take a look at the [Good S
 Before you start contributing make sure you have installed all the dev tools:

 ```bash
-make dev
+pip install -e .[dev]
 ```

 ## Fixing outstanding issues
@@ -152,7 +152,7 @@ Follow these steps to start contributing:
 4. Set up a development environment by running the following command in a conda or a virtual environment you've created for working on this library:

 ```bash
-$ make dev
+$ pip install -e .[dev]
 ```

 (If TRL was already installed in the virtual environment, remove
7 changes: 0 additions & 7 deletions Makefile
@@ -5,13 +5,6 @@ check_dirs := examples tests trl
 ACCELERATE_CONFIG_PATH = `pwd`/examples/accelerate_configs
 COMMAND_FILES_PATH = `pwd`/commands

-
-dev:
-@if [ -L "$(pwd)/trl/commands/scripts" ]; then unlink "$(pwd)/trl/commands/scripts"; fi
-@if [ -e "$(pwd)/trl/commands/scripts" ] && [ ! -L "$(pwd)/trl/commands/scripts" ]; then rm -rf "$(pwd)/trl/commands/scripts"; fi
-pip install -e ".[dev]"
-ln -s `pwd`/examples/scripts/ `pwd`/trl/commands
-
 test:
 python -m pytest -n auto --dist=loadfile -s -v --reruns 5 --reruns-delay 1 --only-rerun '(OSError|Timeout|HTTPError.*502|HTTPError.*504||not less than or equal to 0.01)' ./tests/
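For existing development checkouts, the removed `dev` target is no longer available. The sketch below mirrors what it used to do and is an assumption about a reasonable manual migration, not a command from this PR:

```bash
# Remove the old symlink that `make dev` used to create, if it is still around.
if [ -L trl/commands/scripts ]; then unlink trl/commands/scripts; fi
# Install in editable mode with the dev extras (the new recommended setup).
pip install -e ".[dev]"
```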
2 changes: 1 addition & 1 deletion README.md
@@ -198,7 +198,7 @@ If you want to contribute to `trl` or customize it to your needs make sure to re
 ```bash
 git clone https://github.com/huggingface/trl.git
 cd trl/
-make dev
+pip install -e .[dev]
 ```

 ## Citation
2 changes: 1 addition & 1 deletion commands/run_dpo.sh
@@ -35,7 +35,7 @@ CMD="""
 accelerate launch $EXTRA_ACCELERATE_ARGS \
 --num_processes $NUM_GPUS \
 --mixed_precision 'fp16' \
-`pwd`/examples/scripts/dpo.py \
+`pwd`/trl/scripts/dpo.py \
 --model_name_or_path $MODEL_NAME \
 --dataset_name $DATASET_NAME \
 --output_dir $OUTPUT_DIR \
2 changes: 1 addition & 1 deletion commands/run_sft.sh
@@ -36,7 +36,7 @@ CMD="""
 accelerate launch $EXTRA_ACCELERATE_ARGS \
 --num_processes $NUM_GPUS \
 --mixed_precision 'fp16' \
-`pwd`/examples/scripts/sft.py \
+`pwd`/trl/scripts/sft.py \
 --model_name $MODEL_NAME \
 --dataset_name $DATASET_NAME \
 --output_dir $OUTPUT_DIR \
16 changes: 10 additions & 6 deletions docs/source/clis.mdx
@@ -4,8 +4,14 @@ You can use TRL to fine-tune your Language Model with Supervised Fine-Tuning (SF

 Currently supported CLIs are:

-- `trl sft`: fine-tune a LLM on a text/instruction dataset
-- `trl dpo`: fine-tune a LLM with DPO on a preference dataset
+#### Training commands
+
+- `trl dpo`: fine-tune a LLM with DPO
+- `trl kto`: fine-tune a LLM with KTO
+- `trl sft`: fine-tune a LLM with SFT
+
+#### Other commands
+
 - `trl chat`: quickly spin up a LLM fine-tuned for chatting
 - `trl env`: get the system information

@@ -58,7 +64,7 @@ Follow the basic instructions above and run `trl sft --output_dir <output_dir> <
 trl sft --model_name_or_path facebook/opt-125m --dataset_name stanfordnlp/imdb --output_dir opt-sft-imdb
 ```

-The SFT CLI is based on the `examples/scripts/sft.py` script.
+The SFT CLI is based on the `trl/scripts/sft.py` script.

 ### Direct Policy Optimization (DPO)

@@ -81,7 +87,7 @@ trl dpo --model_name_or_path facebook/opt-125m --output_dir trl-hh-rlhf --datase
 ```


-The DPO CLI is based on the `examples/scripts/dpo.py` script.
+The DPO CLI is based on the `trl/scripts/dpo.py` script.


 #### Custom preference dataset
@@ -117,8 +123,6 @@ Besides talking to the model there are a few commands you can use:
 - `save` or `save {SAVE_NAME}`: save the current chat and settings to file by default to `./chat_history/{MODEL_NAME}/chat_{DATETIME}.yaml` or `{SAVE_NAME}` if provided
 - `exit`: closes the interface

-The default examples are defined in `examples/scripts/config/default_chat_config.yaml` but you can pass your own with `--config CONFIG_FILE` where you can also specify the default generation parameters.
-
 ## Getting the system information

 You can get the system information by running the following command:
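The clis.mdx changes above add `trl kto` to the documented training commands. By analogy with the `trl sft` and `trl dpo` examples in the same file, an invocation could look like the sketch below; the model and dataset names are placeholders taken from the KTO docs and the output directory is illustrative, not a command shown in this PR:

```bash
# Hypothetical `trl kto` call, mirroring the documented `trl sft`/`trl dpo` examples.
# Model, dataset, and output directory are illustrative placeholders.
trl kto \
 --model_name_or_path Qwen/Qwen2-0.5B-Instruct \
 --dataset_name trl-lib/kto-mix-14k \
 --output_dir Qwen2-0.5B-KTO
```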
4 changes: 2 additions & 2 deletions docs/source/dpo_trainer.mdx
@@ -112,12 +112,12 @@ For a complete example of fine-tuning a vision-language model, refer to the scri

 ## Example script

-We provide an example script to train a model using the DPO method. The script is available in [`examples/scripts/dpo.py`](https://github.com/huggingface/trl/blob/main/examples/scripts/dpo.py)
+We provide an example script to train a model using the DPO method. The script is available in [`trl/scripts/dpo.py`](https://github.com/huggingface/trl/blob/main/trl/scripts/dpo.py)

 To test the DPO script with the [Qwen2 0.5B model](https://huggingface.co/Qwen/Qwen2-0.5B-Instruct) on the [UltraFeedback dataset](https://huggingface.co/datasets/trl-lib/ultrafeedback_binarized), run the following command:

 ```bash
-accelerate launch examples/scripts/dpo.py \
+accelerate launch trl/scripts/dpo.py \
 --model_name_or_path Qwen/Qwen2-0.5B-Instruct \
 --dataset_name trl-lib/ultrafeedback_binarized \
 --num_train_epochs 1 \
6 changes: 1 addition & 5 deletions docs/source/example_overview.md
@@ -31,23 +31,19 @@ Then, it is encouraged to launch jobs with `accelerate launch`!

 # Maintained Examples

-
+Scripts can be used as examples of how to use TRL trainers. They are located in the [`trl/scripts`](https://github.com/huggingface/trl/blob/main/trl/scripts) directory. Additionally, we provide examples in the [`examples/scripts`](https://github.com/huggingface/trl/blob/main/examples/scripts) directory. These examples are maintained and tested regularly.

 | File | Description |
 | ---- | ----------- |
 | [`examples/scripts/alignprop.py`](https://github.com/huggingface/trl/blob/main/examples/scripts/alignprop.py) | This script shows how to use the [`AlignPropTrainer`] to fine-tune a diffusion model. |
 | [`examples/scripts/bco.py`](https://github.com/huggingface/trl/blob/main/examples/scripts/bco.py) | This script shows how to use the [`KTOTrainer`] with the BCO loss to fine-tune a model to increase instruction-following, truthfulness, honesty and helpfulness using the [openbmb/UltraFeedback](https://huggingface.co/datasets/openbmb/UltraFeedback) dataset. |
-| [`examples/scripts/chat.py`](https://github.com/huggingface/trl/blob/main/examples/scripts/chat.py) | This script allows you to load and use a model as a chatbot. |
 | [`examples/scripts/cpo.py`](https://github.com/huggingface/trl/blob/main/examples/scripts/cpo.py) | This script shows how to use the [`CPOTrainer`] to fine-tune a model to increase helpfulness and harmlessness using the [Anthropic/hh-rlhf](https://huggingface.co/datasets/Anthropic/hh-rlhf) dataset. |
 | [`examples/scripts/ddpo.py`](https://github.com/huggingface/trl/blob/main/examples/scripts/ddpo.py) | This script shows how to use the [`DDPOTrainer`] to fine-tune a stable diffusion model using reinforcement learning. |
 | [`examples/scripts/dpo_vlm.py`](https://github.com/huggingface/trl/blob/main/examples/scripts/dpo_vlm.py) | This script shows how to use the [`DPOTrainer`] to fine-tune a Vision Language Model to reduce hallucinations using the [openbmb/RLAIF-V-Dataset](https://huggingface.co/datasets/openbmb/RLAIF-V-Dataset) dataset. |
-| [`examples/scripts/dpo.py`](https://github.com/huggingface/trl/blob/main/examples/scripts/dpo.py) | This script shows how to use the [`DPOTrainer`] to fine-tune a stable to increase helpfulness and harmlessness using the [Anthropic/hh-rlhf](https://huggingface.co/datasets/Anthropic/hh-rlhf) dataset. |
-| [`examples/scripts/kto.py`](https://github.com/huggingface/trl/blob/main/examples/scripts/kto.py) | This script shows how to use the [`KTOTrainer`] to fine-tune a model. |
 | [`examples/scripts/orpo.py`](https://github.com/huggingface/trl/blob/main/examples/scripts/orpo.py) | This script shows how to use the [`ORPOTrainer`] to fine-tune a model to increase helpfulness and harmlessness using the [Anthropic/hh-rlhf](https://huggingface.co/datasets/Anthropic/hh-rlhf) dataset. |
 | [`examples/scripts/ppo/ppo.py`](https://github.com/huggingface/trl/blob/main/examples/scripts/ppo/ppo.py) | This script shows how to use the [`PPOTrainer`] to fine-tune a model to improve its ability to continue text with positive sentiment or physically descriptive language |
 | [`examples/scripts/ppo/ppo_tldr.py`](https://github.com/huggingface/trl/blob/main/examples/scripts/ppo/ppo_tldr.py) | This script shows how to use the [`PPOTrainer`] to fine-tune a model to improve its ability to generate TL;DR summaries. |
 | [`examples/scripts/reward_modeling.py`](https://github.com/huggingface/trl/blob/main/examples/scripts/reward_modeling.py) | This script shows how to use the [`RewardTrainer`] to train a reward model on your own dataset. |
-| [`examples/scripts/sft.py`](https://github.com/huggingface/trl/blob/main/examples/scripts/sft.py) | This script shows how to use the [`SFTTrainer`] to fine-tune a model or adapters into a target dataset. |
 | [`examples/scripts/sft_vlm.py`](https://github.com/huggingface/trl/blob/main/examples/scripts/sft_vlm.py) | This script shows how to use the [`SFTTrainer`] to fine-tune a Vision Language Model in a chat setting. The script has only been tested with [LLaVA 1.5](https://huggingface.co/llava-hf/llava-1.5-7b-hf), [LLaVA 1.6](https://huggingface.co/llava-hf/llava-v1.6-mistral-7b-hf), and [Llama-3.2-11B-Vision-Instruct](https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct) models so users may see unexpected behaviour in other model architectures. |

 Here are also some easier-to-run colab notebooks that you can use to get started with TRL:
4 changes: 2 additions & 2 deletions docs/source/kto_trainer.mdx
@@ -80,12 +80,12 @@ In theory, the dataset should contain at least one chosen and one rejected compl

 ## Example script

-We provide an example script to train a model using the KTO method. The script is available in [`examples/scripts/kto.py`](https://github.com/huggingface/trl/blob/main/examples/scripts/kto.py)
+We provide an example script to train a model using the KTO method. The script is available in [`trl/scripts/kto.py`](https://github.com/huggingface/trl/blob/main/trl/scripts/kto.py)

 To test the KTO script with the [Qwen2 0.5B model](https://huggingface.co/Qwen/Qwen2-0.5B-Instruct) on the [UltraFeedback dataset](https://huggingface.co/datasets/trl-lib/kto-mix-14k), run the following command:

 ```bash
-accelerate launch examples/scripts/kto.py \
+accelerate launch trl/scripts/kto.py \
 --model_name_or_path Qwen/Qwen2-0.5B-Instruct \
 --dataset_name trl-lib/kto-mix-14k \
 --num_train_epochs 1 \
2 changes: 1 addition & 1 deletion docs/source/lora_tuning_peft.mdx
@@ -140,5 +140,5 @@ python PATH_TO_SCRIPT
 You can easily fine-tune Llama2 model using `SFTTrainer` and the official script! For example to fine-tune llama2-7b on the Guanaco dataset, run (tested on a single NVIDIA T4-16GB):

 ```bash
-python examples/scripts/sft.py --output_dir sft_openassistant-guanaco --model_name meta-llama/Llama-2-7b-hf --dataset_name timdettmers/openassistant-guanaco --load_in_4bit --use_peft --per_device_train_batch_size 4 --gradient_accumulation_steps 2
+python trl/scripts/sft.py --output_dir sft_openassistant-guanaco --model_name meta-llama/Llama-2-7b-hf --dataset_name timdettmers/openassistant-guanaco --load_in_4bit --use_peft --per_device_train_batch_size 4 --gradient_accumulation_steps 2
 ```
2 changes: 1 addition & 1 deletion docs/source/sft_trainer.mdx
@@ -4,7 +4,7 @@

 Supervised fine-tuning (or SFT for short) is a crucial step in RLHF. In TRL we provide an easy-to-use API to create your SFT models and train them with few lines of code on your dataset.

-Check out a complete flexible example at [`examples/scripts/sft.py`](https://github.com/huggingface/trl/tree/main/examples/scripts/sft.py).
+Check out a complete flexible example at [`trl/scripts/sft.py`](https://github.com/huggingface/trl/tree/main/trl/scripts/sft.py).
 Experimental support for Vision Language Models is also included in the example [`examples/scripts/sft_vlm.py`](https://github.com/huggingface/trl/tree/main/examples/scripts/sft_vlm.py).

 ## Quickstart