Add weight support for LigerCrossEntropy #420
base: main
Conversation
Thanks for taking care of this! Had a few minor suggestions.
Another TODO, based on the original paper linked in the original issue for this feature: we also need to support a sample-level weight, i.e. a weight that can be applied to each element of the batch. If we have logits of shape (B, S, V), we'd have sample-level weights of shape (B,). This is what's proposed in the C-RLFT paper: https://arxiv.org/abs/2309.11235
Feel free to push to this branch, or even take it over and open a new PR; I won't be able to update it often in the next few months. Just trying to take the first step while I have time.
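To make the sample-level weighting concrete, here is a minimal eager-PyTorch sketch of the idea described above (not the fused kernel itself): per-token losses over logits of shape (B, S, V) are scaled by a per-sample weight of shape (B,). The function name, the mean-over-valid-tokens reduction, and the `ignore_index` handling are illustrative assumptions, not Liger-Kernel's API.

```python
import torch
import torch.nn.functional as F

def sample_weighted_ce(logits, targets, sample_weights, ignore_index=-100):
    """Cross entropy with a per-sample weight of shape (B,).

    logits: (B, S, V), targets: (B, S), sample_weights: (B,).
    Each token's loss is scaled by its sample's weight, then averaged
    over non-ignored tokens (one possible reduction choice).
    """
    B, S, V = logits.shape
    # Per-token losses with no reduction: shape (B, S)
    per_token = F.cross_entropy(
        logits.reshape(B * S, V),
        targets.reshape(B * S),
        reduction="none",
        ignore_index=ignore_index,
    ).reshape(B, S)
    # Broadcast the (B,) sample weights across the sequence dimension
    weighted = per_token * sample_weights.unsqueeze(1)
    mask = (targets != ignore_index).float()
    return weighted.sum() / mask.sum().clamp(min=1)

# Example usage with hypothetical shapes
torch.manual_seed(0)
logits = torch.randn(2, 4, 10)
targets = torch.randint(0, 10, (2, 4))
w = torch.tensor([1.0, 0.5])  # sample-level weights, shape (B,)
loss = sample_weighted_ce(logits, targets, w)
```

With all-ones sample weights and no ignored tokens, this reduces to plain mean cross entropy, which is a useful sanity check for any fused implementation.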
```python
        (1.0, torch.float32, 1e-8, 1e-6),
    ],
)
def test_correctness_with_weight_with_other_params_once(
```
This test doesn't pass for some reason; I might be missing something.
Gotcha! I'll try wrapping it up; you've done most of the heavy lifting already.
Summary
Resolve #404.
TODO:
- Add a `weight` parameter to LigerFusedLinearCrossEntropyLoss, but we need to consider renaming some variables to distinguish the weight of the linear layer from the weight for CE.
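For reference, eager PyTorch's per-class `weight` semantics, which a fused CE `weight` would presumably need to match, are sketched below. The variable names are illustrative; note that `class_weight` here is the CE weight of shape (V,), distinct from a linear layer's weight matrix, which is exactly the naming collision the TODO mentions.

```python
import torch
import torch.nn.functional as F

# PyTorch's per-class `weight` of shape (V,): each target's NLL is
# scaled by weight[target], and the "mean" reduction divides by the
# sum of the selected weights (not by the element count).
torch.manual_seed(0)
N, V = 8, 5
logits = torch.randn(N, V)
targets = torch.randint(0, V, (N,))
class_weight = torch.rand(V)  # CE class weight, NOT the linear layer's weight

out = F.cross_entropy(logits, targets, weight=class_weight)

# Manual computation a fused kernel would need to reproduce
log_probs = torch.log_softmax(logits, dim=-1)
nll = -log_probs[torch.arange(N), targets]
manual = (nll * class_weight[targets]).sum() / class_weight[targets].sum()
```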
Testing Done
It hasn't been fully tested with other params.
- make test to ensure correctness
- make checkstyle to ensure code style
- make test-convergence to ensure convergence