Use GPTQ-Marlin for supported GPTQ configurations (#2111)
GPTQ-Marlin is currently the best-performing kernel for GPTQ models. So
let's use it by default if the kernels are installed, the GPU supports
it, and the kernels support the configuration.

For models generated by `text-generation-server quantize`, use
`sym=False`. This subcommand has used asymmetric quantization since the
beginning, and incorrectly reporting the model as symmetric would select
GPTQ-Marlin (which does not support asymmetric quantization).
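
The selection rule described above can be sketched roughly as follows. This is an illustration only, not the code from this commit: the names `GPTQParams` and `should_use_gptq_marlin` are hypothetical, and the supported bit widths, group sizes, and minimum compute capability are assumed values chosen for the example. The commit message itself only states the three conditions (kernels installed, GPU supported, configuration supported) and that asymmetric quantization is not supported.

```python
from dataclasses import dataclass
from typing import Tuple

# Assumed values for illustration; the commit message does not list them.
ASSUMED_SUPPORTED_BITS = {4, 8}
ASSUMED_SUPPORTED_GROUP_SIZES = {-1, 32, 64, 128}
ASSUMED_MIN_COMPUTE_CAPABILITY = (8, 0)


@dataclass(frozen=True)
class GPTQParams:
    """Hypothetical container for the quantization settings of a checkpoint."""

    bits: int
    groupsize: int
    sym: bool  # True for symmetric quantization, False for asymmetric


def should_use_gptq_marlin(
    params: GPTQParams,
    kernels_installed: bool,
    compute_capability: Tuple[int, int],
) -> bool:
    """Return True only if all three conditions from the commit message hold."""
    gpu_supported = compute_capability >= ASSUMED_MIN_COMPUTE_CAPABILITY
    config_supported = (
        params.bits in ASSUMED_SUPPORTED_BITS
        and params.groupsize in ASSUMED_SUPPORTED_GROUP_SIZES
        # Asymmetric quantization is not supported by GPTQ-Marlin.
        and params.sym
    )
    return bool(kernels_installed and gpu_supported and config_supported)


# A symmetric 4-bit model is eligible; an asymmetric one falls back to
# another GPTQ kernel, which is why reporting `sym` correctly matters.
symmetric = GPTQParams(bits=4, groupsize=128, sym=True)
asymmetric = GPTQParams(bits=4, groupsize=128, sym=False)
print(should_use_gptq_marlin(symmetric, True, (8, 0)))
print(should_use_gptq_marlin(asymmetric, True, (8, 0)))
```

Under these assumptions, a model that is wrongly reported as `sym=True` would pass the check and be routed to a kernel that cannot handle it, which is the failure the second paragraph of the message guards against.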
danieldk authored Jul 1, 2024
1 parent 0d97a93 commit 2ce8019
Showing 8 changed files with 141 additions and 719 deletions.

This file was deleted.
