v0.5.0: Initial NeMo integration, HH example, and improved Hugging Face integration
Highlights
- Initial NeMo ILQL integration, paving the way for large-scale RLHF efforts. See https://github.com/CarperAI/trlx/blob/main/trlx/models/README.md to get started.
- In-depth example showcasing `trlx` usage on AnthropicAI's Helpful & Harmless dataset: https://github.com/CarperAI/trlx/tree/main/examples/hh
- Improved ILQL modeling integration with Hugging Face `transformers`. Users can now work with `AutoModelForCausalLMWithILQLHeads` objects to generate samples and save/load fine-tuned ILQL models that can be quickly pushed to the Hub.
What's Changed
- Add `wandb` group naming by @jon-tow in #188
- Update `reward_fn` signatures in examples by @jon-tow in #190
- Add tokenizer config by @reciprocated in #189
- Fix extraction of `mixed_precision` option for deepspeed by @reciprocated in #197
- Fix `summarize_rlhf` inference checkpoint paths by @jon-tow in #194
- Make the config loading consistent across all example scripts. by @shermansiu in #192
- Make `Trainer.save_pretrained` sub-directory optional by @jon-tow in #201
- Update Readme to include T5 models by @aaronrmm in #198
- Make `make_head` accept dtype parameter by @reciprocated in #213
- Enable training with Tensorboard tracking by @marcobellagente93 in #209
- Support nested updates in `merge` by @cat-state in #219
- Fix typo reward normalize summarize by @PhungVanDuy in #221
- Update stale comment from results table by @jon-tow in #222
- Fix undefined trackers property by @alan-cooney in #224
- Fix tokenizer missing from config.to_dict() by @alan-cooney in #228
- Make experiment tracking optional by @jon-tow in #226
- read tokenizer path from config correctly by @JustinAWei in #230
- Add devcontainer support by @alan-cooney in #196
- fix: change lora_a:float to lora_r:int by @aaronrmm in #235
- Bump `isort` to hotfix CI code quality workflow by @jon-tow in #237
- Fix optional tracking in `accelerator.log` by @jon-tow in #233
- Improve documentation/comments on the random walk example by @alan-cooney in #208
- Update link to "Learning to Summarize from Human Feedback" by @jon-tow in #241
- Fix deepspeed state saving under `save_best` condition by @reciprocated in #242
- added colab notebook by @smellslikeml in #244
- [style] Increase black's line length by @reciprocated in #250
- Add help string to get_advantages_and_returns by @pesvut in #225
- Filter out empty responses by @reciprocated in #265
- NeMo Integrate by @cat-state in #125
- Add multi-process logger utility for status monitoring by @jon-tow in #254
- Add `NeMo` support info to `README` by @jon-tow in #275
- Fix distributed dataloaders & deduplicate eval by @reciprocated in #276
- Improve PPO readability by @alan-cooney in #210
- Add T5 to delta modifier map by @aaronrmm in #234
- [fix] Set deepspeed's fp16 `auto_cast` to false by @reciprocated in #279
- Rename remaining `logprobs_from_logits` call by @jon-tow in #281
- [feat] Add Accelerate SFT Trainer by @reciprocated in #280
- Add Colab Notebook for Sentiment by @zswitten in #285
- Remove `pylance` installs from devcontainer by @jon-tow in #296
- Move notebooks to examples dir by @jon-tow in #294
- [fix] Summarize config discrepancy by @reciprocated in #293
- Make Git check optional by @cat-state in #299
- refactor: remove orchestrator abstraction from API by @jon-tow in #289
- Set `add_special_tokens=False` to not add EOS unexpectedly by @cat-state in #287
- [feat] Gather experience samples by @reciprocated in #305
- [fix] Make `gather_for_metrics` usage more strict by @reciprocated in #315
- Add helpful and harmless example by @reciprocated in #128
- Adopt `PreTrainedModelWrapper` for Hugging Face models by @jon-tow in #215
New Contributors
- @shermansiu made their first contribution in #192
- @aaronrmm made their first contribution in #198
- @marcobellagente93 made their first contribution in #209
- @alan-cooney made their first contribution in #224
- @JustinAWei made their first contribution in #230
- @smellslikeml made their first contribution in #244
- @pesvut made their first contribution in #225
- @zswitten made their first contribution in #285
Full Changelog: v0.4...v0.5.0