v0.4.1: Gemma 2 Support, CrossEntropy Patching Fix, and GroupNorm
## Highlights
- **Gemma 2 Support**: The long-pending Gemma 2 is finally supported thanks to @Tcc0403! He implemented the nasty softcapping in fused linear cross entropy (#320) and discovered the convergence issue, which was later fixed by @ByronHsu and @Tcc0403 together (#376).
- **CrossEntropy Patching Fix**: If you monkey patch `CrossEntropy` (not FLCE), it was actually not patched after transformers 4.46.1. This is because `CrossEntropy` was replaced with `F.cross_entropy` in the model code. We fixed the issue in #375.
- **GroupNorm Kernel**: Our new contributor @pramodith implemented a GroupNorm kernel (#353) with a 2x speedup.
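For context on the Gemma 2 work, softcapping smoothly bounds logits with a tanh before the loss. A minimal sketch of the operation (the function name is ours; 30.0 matches Gemma 2's `final_logit_softcapping` config value):

```python
import math

def softcap(logits, cap=30.0):
    # Gemma 2-style logit softcapping: smoothly bounds each logit to (-cap, cap).
    # Gemma 2 uses cap=30.0 for final logits and cap=50.0 inside attention.
    return [cap * math.tanh(x / cap) for x in logits]
```

Fusing this tanh into the linear + cross entropy kernel is what makes it tricky to support without materializing the full logit tensor.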
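The patching gap behind the CrossEntropy fix can be illustrated without transformers at all. Below is a self-contained stand-in (all names here are hypothetical placeholders, not the real torch/transformers objects) showing why replacing the `CrossEntropyLoss` class does not intercept a direct functional call like `F.cross_entropy`:

```python
import types

# Stand-ins for torch.nn.CrossEntropyLoss and torch.nn.functional.cross_entropy.
class CrossEntropyLoss:
    def __call__(self, logits, target):
        return "original"

def cross_entropy(logits, target):
    return "original"

nn = types.SimpleNamespace(CrossEntropyLoss=CrossEntropyLoss)

class LigerCrossEntropyLoss:  # hypothetical patched implementation
    def __call__(self, logits, target):
        return "patched"

# Monkey patch only the class attribute, as the pre-fix patch did:
nn.CrossEntropyLoss = LigerCrossEntropyLoss

# Model code that instantiates the class picks up the patch,
# but model code that calls the functional form directly does not.
```

Since transformers 4.46.1 calls the functional form in the model code, the patch had to target that call path instead.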
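As a reference for what the new kernel computes (this is a plain NumPy sketch of GroupNorm's math, not Liger's API or its Triton implementation):

```python
import numpy as np

def group_norm(x, num_groups, eps=1e-5):
    # GroupNorm: split the C channels of an (N, C, ...) tensor into groups and
    # normalize each (sample, group) slice to zero mean / unit variance.
    # The learnable affine weight/bias are omitted for brevity.
    n = x.shape[0]
    g = x.reshape(n, num_groups, -1)          # (N, G, C//G * spatial)
    mean = g.mean(axis=-1, keepdims=True)
    var = g.var(axis=-1, keepdims=True)
    return ((g - mean) / np.sqrt(var + eps)).reshape(x.shape)
```

The fused kernel avoids the separate mean/variance passes and intermediate tensors this naive version materializes.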
## What's Changed
- BUG: Fix bug in layer norm tests. by @pramodith in #359
- Support Z Loss in CE by @Tcc0403 in #239
- Improve compatibility to access the base models by @why-in-Shanghaitech in #340
- poke test again by @ByronHsu in #360
- Kernels for GroupNorm by @pramodith in #353
- Remove trailing newline. by @ckckjw in #364
- Fix typo in the description of FusedLinearJSD by @Tcc0403 in #366
- Updates Readme to add GroupNorm by @pramodith in #365
- Support FusedLinearCrossEntropy for Gemma2 by @Tcc0403 in #320
- Rotate modal and pypi tokens by @ByronHsu in #372
- Fix release password by @ByronHsu in #373
- Support CE after grad acc fix by @ByronHsu in #375
- Support out-of-place RMSNorm to fix gemma2 by @ByronHsu in #376
## New Contributors
- @pramodith made their first contribution in #359
- @why-in-Shanghaitech made their first contribution in #340
- @ckckjw made their first contribution in #364
**Full Changelog**: v0.4.0...v0.4.1