generated from fastai/nbdev_template
-
Notifications
You must be signed in to change notification settings - Fork 1.4k
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Update incorrect data processing in DataCollatorForChatML (#2172)
* Update incorrect data processing in DataCollatorForChatML Fix the extra BOS token and the absence of an EOS token in the returned input_ids, and potentially the absence of a target string in the returned labels. * Update trl/trainer/utils.py Co-authored-by: lewtun <[email protected]> * style * move comment * add test for DataCollatorForChatML * update comment with more details * update assert reports and comments, and adds verification that the last token of input_ids should be EOS token * new line at the end of file for code quality * Update tests/test_utils.py * Update tests/test_utils.py * Update tests/test_utils.py * update tests * fix test * Update tests/test_utils.py Co-authored-by: Quentin Gallouédec <[email protected]> * Update tests/test_utils.py Co-authored-by: Quentin Gallouédec <[email protected]> * formatting * fix typo * simplify * Revert "simplify" This reverts commit 7e4006c. * tokenize full messages * dont add eos * eos is in the last token * simplify DataCollatorForChatML * Update tests/test_utils.py Co-authored-by: Quentin Gallouédec <[email protected]> --------- Co-authored-by: Kashif Rasul <[email protected]> Co-authored-by: Quentin Gallouédec <[email protected]> Co-authored-by: lewtun <[email protected]> Co-authored-by: Quentin Gallouédec <[email protected]>
- Loading branch information
1 parent
4197916
commit 3107a40
Showing
2 changed files
with
133 additions
and
49 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters