@mobicham mobicham released this 05 Dec 15:11
· 284 commits to master since this release

HQQ v0.1.0

Improvements

  • Added `torch.compile` backend support
  • Added ATen C++ backend (experimental)
  • Faster bit unpacking via a pre-allocated empty tensor
  • Added vLLM support
  • Refactored quantize_model() to be called on model instances
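The faster bit unpacking can be illustrated with a minimal sketch: instead of building intermediate arrays, both nibbles of each packed byte are written directly into a buffer allocated once with `empty`. NumPy is used here for illustration only; the actual HQQ backend operates on torch tensors, and the function names below are hypothetical, not part of the library's API.

```python
import numpy as np

def pack_4bit(vals: np.ndarray) -> np.ndarray:
    """Pack pairs of 4-bit values (0..15) into single uint8 bytes."""
    vals = vals.astype(np.uint8)
    return (vals[0::2] << 4) | vals[1::2]

def unpack_4bit(packed: np.ndarray) -> np.ndarray:
    """Unpack nibbles into a pre-allocated (uninitialized) output buffer."""
    out = np.empty(packed.size * 2, dtype=np.uint8)  # allocated once, no concatenation
    out[0::2] = packed >> 4    # high nibbles go to even positions
    out[1::2] = packed & 0x0F  # low nibbles go to odd positions
    return out
```

Writing into strided views of a single pre-allocated buffer avoids the temporary allocations and copies that a concatenation-based unpack would incur.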

Supported models

  • Llama (Hugging Face + vLLM)
  • ViT-CLIP (timm)

Limitations

  • The Hugging Face backend only supports single-GPU runtime.
  • vLLM only supports a single GPU with a single worker.
  • The `torch.compile` backend can sometimes cause issues with the async runtime.
  • PEFT (LoRA, etc.) is not supported.