bitblas Readme
mobicham committed Jul 11, 2024
1 parent b16c018 commit 2a98fda
Showing 1 changed file with 3 additions and 0 deletions.
3 changes: 3 additions & 0 deletions Readme.md
@@ -106,6 +106,9 @@ prepare_for_inference(model, backend="torchao_int4")

 #Marlin backend: nbits=4, axis=1, compute_dtype=float16, group_size=None
 #prepare_for_inference(model, backend="marlin", allow_merge=True)
+
+#Bitblas backend: nbits=4/2/1, axis=1, compute_dtype=float16, group_size=None
+#prepare_for_inference(model, backend="bitblas")
 ```
 These backends only work with 4-bit quantization and `axis=1`. Additionally, for <a href="https://github.com/IST-DASLab/marlin.git">Marlin</a>, we only support `group_size=None`. Below you can find a comparison between the different backends. The torchao kernel reaches 195 tokens/sec (generation speed) on a 4090.
