What's Changed
⚡ Olmo2 model support.
⚡ Intel XPU acceleration via IPEX.
Fixed sharding compatibility after an API deprecation in HF Transformers (`shard_checkpoint` replaced by `split_torch_state_dict_into_shards`).
Removed the hard Triton dependency. The Triton kernel is now used only when the optional `triton` package is installed.
Fixed the Hymba test (Hymba requires desc_act=False).
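
As a rough illustration of what an optional Triton dependency can look like from the caller's side, here is a minimal sketch using only the Python standard library. The helper names `triton_available` and `pick_kernel`, and the kernel labels `"triton"` and `"torch"`, are hypothetical and are not taken from this project's code; the actual backend selection in this release may work differently.

```python
import importlib.util


def triton_available() -> bool:
    """Return True if the optional `triton` package is importable."""
    # find_spec returns None when a top-level package is not installed.
    return importlib.util.find_spec("triton") is not None


def pick_kernel(preferred: str = "triton", fallback: str = "torch") -> str:
    """Hypothetical helper: pick a kernel label based on what is installed.

    The labels are illustrative assumptions, not real backend identifiers.
    """
    if preferred == "triton" and not triton_available():
        # Triton is optional, so degrade gracefully instead of raising.
        return fallback
    return preferred


if __name__ == "__main__":
    print(f"triton installed: {triton_available()}")
    print(f"selected kernel: {pick_kernel()}")
```

Checking availability with `importlib.util.find_spec` avoids importing the package just to test for it, which keeps startup cheap when Triton is absent.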
- [FIX] use split_torch_state_dict_into_shards to replace shard_checkpoint by @LRL-ModelCloud in #682
- [Model] add olmo2 support by @LRL-ModelCloud in #678
- [FIX] Hymba currently only supports a batch size of 1 by @ZX-ModelCloud in #683
- [CI] fix extensions is not defined by @CSY-ModelCloud in #684
- Ipex XPU support by @jiqing-feng in #608
- [FIX] add require_pkgs_version and checks by @ZX-ModelCloud in #693
- fix ipex test by @Qubitium in #691
- [FIX] remove require_transformers_version and require_tokenizers_version by @ZX-ModelCloud in #695
- Remove use_safetensors argument by @ZX-ModelCloud in #696
- Revert exllamav1 by @CSY-ModelCloud in #692
- Make Triton optional by @CSY-ModelCloud in #697
- Unify backend use by @LRL-ModelCloud in #700
- [FIX] fix test_hymba by @ZX-ModelCloud in #704
- FIX IPEX XPU selection by @Qubitium in #705
- fix cpu/xpu backend selection by @jiqing-feng in #706
- Upgrade device-smi depend by @Qubitium in #708
- [FIX] hymba quant needs desc_act=False by @ZX-ModelCloud in #710
Full Changelog: v1.3.0...v1.3.1