Release v0.5.0: Initial NeMo integration, HH example, and improved Hugging Face integration · CarperAI/trlx

Highlights

Initial NeMo ILQL integration, paving the way for large-scale RLHF efforts. See https://github.com/CarperAI/trlx/blob/main/trlx/models/README.md to get started.
In-depth example showcasing trlx usage on Anthropic's Helpful & Harmless dataset: https://github.com/CarperAI/trlx/tree/main/examples/hh
Improved ILQL modeling integration with Hugging Face transformers. Users can now work with AutoModelForCausalLMWithILQLHeads objects to generate samples and save/load fine-tuned ILQL models that can be quickly pushed to the Hub.

What's Changed

Add wandb group naming by @jon-tow in #188
Update reward_fn signatures in examples by @jon-tow in #190
Add tokenizer config by @reciprocated in #189
Fix extraction of mixed_precision option for deepspeed by @reciprocated in #197
Fix summarize_rlhf inference checkpoint paths by @jon-tow in #194
Make the config loading consistent across all example scripts. by @shermansiu in #192
Make Trainer.save_pretrained sub-directory optional by @jon-tow in #201
Update Readme to include T5 models by @aaronrmm in #198
Make make_head accept dtype parameter by @reciprocated in #213
Enable training with Tensorboard tracking by @marcobellagente93 in #209
Support nested updates in merge by @cat-state in #219
Fix typo reward normalize summarize by @PhungVanDuy in #221
Update stale comment from results table by @jon-tow in #222
Fix undefined trackers property by @alan-cooney in #224
Fix tokenizer missing from config.to_dict() by @alan-cooney in #228
Make experiment tracking optional by @jon-tow in #226
read tokenizer path from config correctly by @JustinAWei in #230
Add devcontainer support by @alan-cooney in #196
fix: change lora_a:float to lora_r:int by @aaronrmm in #235
Bump isort to hotfix CI code quality workflow by @jon-tow in #237
Fix optional tracking in accelerator.log by @jon-tow in #233
Improve documentation/comments on the random walk example by @alan-cooney in #208
Update link to "Learning to Summarize from Human Feedback" by @jon-tow in #241
Fix deepspeed state saving under save_best condition by @reciprocated in #242
added colab notebook by @smellslikeml in #244
[style] Increase black's line length by @reciprocated in #250
Add help string to get_advantages_and_returns by @pesvut in #225
Filter out empty responses by @reciprocated in #265
NeMo Integrate by @cat-state in #125
Add multi-process logger utility for status monitoring by @jon-tow in #254
Add NeMo support info to README by @jon-tow in #275
Fix distributed dataloaders & deduplicate eval by @reciprocated in #276
Improve PPO readability by @alan-cooney in #210
Add T5 to delta modifier map by @aaronrmm in #234
[fix] Set deepspeed's fp16 auto_cast to false by @reciprocated in #279
Rename remaining logprobs_from_logits call by @jon-tow in #281
[feat] Add Accelerate SFT Trainer by @reciprocated in #280
Add Colab Notebook for Sentiment by @zswitten in #285
Remove pylance installs from devcontainer by @jon-tow in #296
Move notebooks to examples dir by @jon-tow in #294
[fix] Summarize config discrepancy by @reciprocated in #293
Make Git check optional by @cat-state in #299
refactor: remove orchestrator abstraction from API by @jon-tow in #289
Set add_special_tokens=False to not add EOS unexpectedly by @cat-state in #287
[feat] Gather experience samples by @reciprocated in #305
[fix] Make gather_for_metrics usage more strict by @reciprocated in #315
Add helpful and harmless example by @reciprocated in #128
Adopt PreTrainedModelWrapper for Hugging Face models by @jon-tow in #215
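
As a rough illustration of what "nested updates in merge" (#219) means in practice, here is a minimal, self-contained sketch of a recursive dictionary merge. This is not trlx's actual implementation: the function name merge_nested and the config keys below are hypothetical, chosen only to show how an override can touch one nested field while leaving its siblings intact.

```python
from copy import deepcopy
from typing import Any, Dict


def merge_nested(base: Dict[str, Any], update: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `base` with `update` applied recursively.

    Hypothetical helper for illustration only; not the trlx API.
    """
    merged = deepcopy(base)
    for key, value in update.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are dicts: descend instead of overwriting the whole section.
            merged[key] = merge_nested(merged[key], value)
        else:
            # Leaf value (or type mismatch): the update wins.
            merged[key] = deepcopy(value)
    return merged


# Hypothetical config sections and keys, assumed for the example.
base_config = {
    "train": {"batch_size": 32, "seq_length": 512},
    "model": {"model_path": "gpt2"},
}
overrides = {"train": {"batch_size": 8}}

merged_config = merge_nested(base_config, overrides)
```

With a flat (non-nested) merge, the override above would replace the entire "train" section and drop seq_length; the recursive version changes only batch_size and leaves base_config itself unmodified.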

New Contributors

@shermansiu made their first contribution in #192
@aaronrmm made their first contribution in #198
@marcobellagente93 made their first contribution in #209
@alan-cooney made their first contribution in #224
@JustinAWei made their first contribution in #230
@smellslikeml made their first contribution in #244
@pesvut made their first contribution in #225
@zswitten made their first contribution in #285

Full Changelog: v0.4...v0.5.0
