Release GPTQModel v1.3.1 · ModelCloud/GPTQModel

What's Changed

⚡ Olmo2 model support.
⚡ Intel XPU acceleration via IPEX.
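Sharding compatibility fix for an API deprecation in HF Transformers (shard_checkpoint replaced by split_torch_state_dict_into_shards).
Removed the hard Triton dependency. The Triton kernel is now available only when the optional triton package is installed.
Fixed the Hymba test (Hymba quantization requires desc_act=False and currently supports only a batch size of 1).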
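
For readers unfamiliar with the optional-dependency pattern behind the Triton change, here is a minimal sketch. It is not GPTQModel's actual code: the helper names `has_package` and `pick_backend` and the backend strings are hypothetical, and only the standard-library call `importlib.util.find_spec` is relied on.

```python
import importlib.util


def has_package(name: str) -> bool:
    """Return True if the top-level package `name` can be found for import.

    find_spec locates the package without importing it, so the check is
    cheap and has no import side effects.
    """
    return importlib.util.find_spec(name) is not None


def pick_backend(preferred: str = "triton", fallback: str = "torch") -> str:
    """Hypothetical selector: use `preferred` only if its package is installed."""
    return preferred if has_package(preferred) else fallback


if __name__ == "__main__":
    # Prints "triton" when the triton package is installed, otherwise "torch".
    print(pick_backend())
```

A check like this lets a library keep a kernel in its code base while treating the package behind it as an install-time extra rather than a hard requirement.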

[FIX] use split_torch_state_dict_into_shards to replace shard_checkpoint by @LRL-ModelCloud in #682
[Model] add olmo2 support by @LRL-ModelCloud in #678
[FIX] Hymba currently only supports a batch size of 1 by @ZX-ModelCloud in #683
[CI] fix extensions is not defined by @CSY-ModelCloud in #684
Ipex XPU support by @jiqing-feng in #608
[FIX] add require_pkgs_version and checks by @ZX-ModelCloud in #693
fix ipex test by @Qubitium in #691
[FIX] remove require_transformers_version and require_tokenizers_version by @ZX-ModelCloud in #695
Remove use_safetensors argument by @ZX-ModelCloud in #696
Revert exllamav1 by @CSY-ModelCloud in #692
Make Triton optional by @CSY-ModelCloud in #697
Unify backend use by @LRL-ModelCloud in #700
[FIX] fix test_hymba by @ZX-ModelCloud in #704
FIX IPEX XPU selection by @Qubitium in #705
fix cpu/xpu backend selection by @jiqing-feng in #706
Upgrade device-smi depend by @Qubitium in #708
[FIX] hymba quant needs desc_act=False by @ZX-ModelCloud in #710

Full Changelog: v1.3.0...v1.3.1
