
Add weight support for LigerCrossEntropy #420

Open · wants to merge 5 commits into base: main

Conversation
@Tcc0403 (Collaborator) commented Dec 2, 2024

Summary

Resolves #404.

TODO:

  • (RFC) Expose a weight parameter on LigerFusedLinearCrossEntropyLoss, but we need to consider renaming some variables to distinguish the weight of the linear layer from the weight for CE (see the sketch after this list).
  • Add a unit test for FLCE after exposing weight
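
As a concrete starting point for the naming discussion, here is a minimal unfused reference sketch. The names ce_weight and lin_weight are illustrative assumptions for this comment, not the final API:

```python
import torch
import torch.nn.functional as F

def fused_linear_cross_entropy_ref(
    hidden: torch.Tensor,                   # (N, H) input activations
    lin_weight: torch.Tensor,               # (V, H) projection matrix of the linear layer
    target: torch.Tensor,                   # (N,) class indices
    ce_weight: torch.Tensor | None = None,  # (V,) per-class weight for the CE term
) -> torch.Tensor:
    # Unfused reference: materialize the logits, then apply weighted cross entropy.
    logits = hidden @ lin_weight.t()        # (N, V)
    return F.cross_entropy(logits, target, weight=ce_weight)
```

Keeping the two names distinct up front should make the fused kernel's signature unambiguous.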

Testing Done

It hasn't been fully tested with other params.

  • Hardware Type:
  • run make test to ensure correctness
  • run make checkstyle to ensure code style
  • run make test-convergence to ensure convergence

@pramodith (Collaborator) left a comment


Thanks for taking care of this! Had a few minor suggestions.

Another TODO, based on the paper linked in the original issue for this feature: we also need to support a sample-level weight, i.e. a weight applied to each element of the batch. If we have logits of shape (B, S, V), we'd have sample-level weights of shape (B,). This is what's proposed in the C-RLFT paper: https://arxiv.org/abs/2309.11235 (see the sketch below).
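
For illustration, a sketch of what sample-level weighting could look like in plain PyTorch. This is an assumption about the intended semantics, not code from this PR, and sample_weighted_ce is a hypothetical helper:

```python
import torch
import torch.nn.functional as F

def sample_weighted_ce(
    logits: torch.Tensor,         # (B, S, V)
    target: torch.Tensor,         # (B, S)
    sample_weight: torch.Tensor,  # (B,) one weight per sequence in the batch
    ignore_index: int = -100,
) -> torch.Tensor:
    B, S, V = logits.shape
    # Per-token losses; ignored positions come back as 0.
    per_token = F.cross_entropy(
        logits.reshape(-1, V), target.reshape(-1),
        ignore_index=ignore_index, reduction="none",
    ).view(B, S)
    per_token = per_token * sample_weight[:, None]  # broadcast (B,) over S
    mask = target != ignore_index
    # Normalizing by token count vs. sum of weights is a design decision.
    return per_token[mask].sum() / mask.sum()
```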

Review threads (resolved; some outdated):
  • src/liger_kernel/ops/cross_entropy.py (5 threads)
  • test/transformers/test_cross_entropy.py (1 thread)
@Tcc0403 (Collaborator, Author) commented Dec 2, 2024

Feel free to push to this branch, or even take it over and open a new PR; I won't be able to update it often in the next few months. I'm just trying to take the first step while I have time.

```python
    (1.0, torch.float32, 1e-8, 1e-6),
],
)
def test_correctness_with_weight_with_other_params_once(
```
@Tcc0403 (Collaborator, Author) commented:

This test couldn't pass somehow. I might be missing something.
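
One thing worth ruling out (an assumption, not a confirmed diagnosis): with weight and the default reduction="mean", PyTorch's cross entropy normalizes by the sum of the target-class weights rather than by the number of elements, which is an easy mismatch to hit when comparing against a hand-rolled reference. A minimal repro of the PyTorch behavior:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(8, 5)
target = torch.randint(0, 5, (8,))
weight = torch.rand(5)

mean_red = F.cross_entropy(logits, target, weight=weight)  # default reduction="mean"
per_elem = F.cross_entropy(logits, target, weight=weight, reduction="none")

# "mean" divides by the sum of the target-class weights, not by the element count:
assert torch.allclose(mean_red, per_elem.sum() / weight[target].sum())
```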

@pramodith (Collaborator) commented:

> Feel free to push to this branch, or even take it over and open a new PR; I won't be able to update it often in the next few months. I'm just trying to take the first step while I have time.

Gotcha! I'll try wrapping it up, you've done most of the heavy lifting already.

Successfully merging this pull request may close these issues: Weighted Cross Entropy Loss (#404).

2 participants