HQQ v0.1.3
New features
- Added CUDA kernels for dequantization (up to 2-3x faster inference compared with PyTorch)
- Added support for the `compute_dtype` parameter (useful for float32/bfloat16 LoRA training)
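
To illustrate what a compute dtype controls, here is a minimal sketch in plain PyTorch. It is not HQQ's API or its quantization method: the helpers `quantize_affine`, `dequantize`, and `quantized_linear` are hypothetical names, and a simple per-row min/max affine quantizer stands in for the real one. The sketch runs on CPU and does not use the CUDA kernels. The idea it shows is that the weights are stored as low-bit integers, dequantized into the chosen dtype, and the matrix multiply then runs in that dtype.

```python
import torch


def quantize_affine(weight, n_bits=4):
    """Per-row min/max affine quantization (illustrative stand-in, not HQQ's method)."""
    w = weight.float()
    w_min = w.min(dim=1, keepdim=True).values
    w_max = w.max(dim=1, keepdim=True).values
    levels = 2 ** n_bits - 1
    # Guard against constant rows, where max == min would give a zero scale.
    scale = (w_max - w_min).clamp(min=1e-8) / levels
    q = torch.round((w - w_min) / scale).clamp(0, levels).to(torch.uint8)
    return q, scale, w_min


def dequantize(q, scale, zero, compute_dtype):
    """Reconstruct approximate weights directly in the requested dtype."""
    return q.to(compute_dtype) * scale.to(compute_dtype) + zero.to(compute_dtype)


def quantized_linear(x, q, scale, zero, compute_dtype=torch.float32):
    """Linear layer forward pass: dequantize, then matmul in compute_dtype."""
    w = dequantize(q, scale, zero, compute_dtype)
    return x.to(compute_dtype) @ w.t()


# Hypothetical usage: same quantized weights, two different compute dtypes.
weight = torch.randn(8, 16)
x = torch.randn(2, 16)
q, scale, zero = quantize_affine(weight)
y_fp32 = quantized_linear(x, q, scale, zero, compute_dtype=torch.float32)
y_bf16 = quantized_linear(x, q, scale, zero, compute_dtype=torch.bfloat16)
```

Under these assumptions, the output dtype follows `compute_dtype`, so float32 or bfloat16 adapter weights (as in LoRA training) can consume the layer's output without an extra cast.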