Commit
Fix misleading variable "epoch" from the training loop from PPOTrainer Doc. (#1171)

* Fix misleading variable "epoch" from PPOTrainer Doc.

The use of the variable "epoch" in the original doc is misleading: the dataloader does not contain the data for all epochs, only for one. Thus "for epoch, batch in tqdm(enumerate(ppo_trainer.dataloader))" is misleading, because the loop variable does not actually store the epoch number; it stores the index of the batch within a single pass. The correct version comes from the TRL PPO notebook tutorial (https://github.com/huggingface/trl/blob/main/examples/notebooks/gpt2-sentiment-control.ipynb), which uses an outer loop to track the epochs. I also posted the question on the forum: https://discuss.huggingface.co/t/confusing-and-possibly-misleading-ppo-trainer-code-from-trl-api-doc-tutorial/67531

* Remove batch_id
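A minimal sketch of the difference, for illustration only. It uses a plain list as a stand-in for ppo_trainer.dataloader so that it runs without TRL installed; the batch contents, the value of num_epochs, and the two helper functions are hypothetical and do not come from the doc or the notebook.

```python
# Stand-in for ppo_trainer.dataloader (assumption): one full pass over it
# is one epoch. The batch contents are made up for illustration.
dataloader = [["q1", "q2"], ["q3", "q4"], ["q5", "q6"]]
num_epochs = 2  # hypothetical value


def misleading_loop(loader):
    """Pattern from the original doc: 'epoch' is really a batch index."""
    seen = []
    for epoch, batch in enumerate(loader):
        seen.append(epoch)  # counts batches within a single pass
    return seen


def corrected_loop(loader, epochs):
    """Outer loop over epochs, inner loop over the batches of one pass."""
    seen = []
    for epoch in range(epochs):
        for batch in loader:
            # The per-batch training step would go here.
            seen.append((epoch, tuple(batch)))
    return seen


misleading = misleading_loop(dataloader)
corrected = corrected_loop(dataloader, num_epochs)
```

With three batches, the first loop ends with "epoch" equal to 2 after a single pass over the data, while the second visits every batch once per epoch.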