Use GPTQ-Marlin for supported GPTQ configurations #2111

danieldk · 2024-06-24T13:22:51Z

What does this PR do?

GPTQ-Marlin is currently the best-performing kernel for GPTQ models. So let's use it by default if the kernels are installed, the GPU supports it, and the kernels support the configuration.
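The three conditions above amount to a single predicate that is checked before falling back to the regular GPTQ kernels. The following is only a sketch of that decision, not the code in this PR. `GPTQConfig`, `can_use_gptq_marlin` and `select_gptq_kernel` are hypothetical names. The limits (4/8-bit weights, group sizes -1/32/64/128, compute capability 8.0 or newer, symmetric only) are assumptions based on commonly documented Marlin constraints and are not taken from this PR.

```python
from dataclasses import dataclass

# Assumed GPTQ-Marlin limits, for illustration only (not read from this PR).
MARLIN_SUPPORTED_BITS = {4, 8}
MARLIN_SUPPORTED_GROUP_SIZES = {-1, 32, 64, 128}  # -1 = one group per output channel
MARLIN_MIN_CAPABILITY = (8, 0)  # assumed: Ampere or newer


@dataclass
class GPTQConfig:
    """Hypothetical subset of a GPTQ quantization config."""
    bits: int
    group_size: int
    sym: bool


def can_use_gptq_marlin(config, kernels_installed, capability):
    """Return True if the checkpoint can be served with GPTQ-Marlin.

    `capability` is the GPU compute capability as a (major, minor) tuple.
    """
    if not kernels_installed:  # condition 1: kernels are installed
        return False
    if capability < MARLIN_MIN_CAPABILITY:  # condition 2: GPU supports them
        return False
    # Condition 3: the kernels support this quantization configuration.
    return (
        config.bits in MARLIN_SUPPORTED_BITS
        and config.group_size in MARLIN_SUPPORTED_GROUP_SIZES
        and config.sym  # asymmetric (zero-point) quantization is not supported
    )


def select_gptq_kernel(config, kernels_installed, capability):
    """Prefer GPTQ-Marlin by default, otherwise fall back to regular GPTQ."""
    if can_use_gptq_marlin(config, kernels_installed, capability):
        return "gptq-marlin"
    return "gptq"


if __name__ == "__main__":
    cfg = GPTQConfig(bits=4, group_size=128, sym=True)
    print(select_gptq_kernel(cfg, kernels_installed=True, capability=(8, 6)))
    print(select_gptq_kernel(cfg, kernels_installed=True, capability=(7, 5)))
```

Keeping the check as one pure function means the fallback is decided entirely from the checkpoint config and the environment, so the same checkpoint may load with GPTQ-Marlin on one GPU and with the regular GPTQ kernels on another.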

For models generated by `text-generation-server quantize`, set `sym=False`. This subcommand has used asymmetric quantization from the beginning, so incorrectly reporting such a model as symmetric would cause GPTQ-Marlin to be selected, even though GPTQ-Marlin does not support asymmetric quantization.
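The following sketch shows why the `sym` flag matters. Asymmetric quantization stores a zero-point for each group, while a symmetric kernel assumes a fixed one. It uses min-max asymmetric quantization in the style of GPTQ and assumes the symmetric convention fixes the zero-point at `2 ** (bits - 1)`. `quantize_asym` and `dequantize` are hypothetical helpers, not TGI functions.

```python
def quantize_asym(weights, bits=4):
    """Min-max asymmetric quantization of one group of weights.

    Returns the integer codes, the scale and the (per-group) zero-point.
    """
    maxq = 2 ** bits - 1
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / maxq if hi != lo else 1.0  # avoid division by zero
    zero = round(-lo / scale)
    q = [min(max(round(w / scale) + zero, 0), maxq) for w in weights]
    return q, scale, zero


def dequantize(q, scale, zero):
    """Reconstruct weights from integer codes: w ~= (q - zero) * scale."""
    return [(v - zero) * scale for v in q]


if __name__ == "__main__":
    bits = 4
    w = [0.0, 0.1, 0.2, 0.3]  # a skewed group, where asymmetry helps
    q, scale, zero = quantize_asym(w, bits)

    # Correct: use the zero-point stored alongside the codes.
    print(dequantize(q, scale, zero))

    # Wrong: a kernel that assumes symmetric quantization ignores the stored
    # zero-point and uses a fixed midpoint instead, shifting every weight.
    print(dequantize(q, scale, 2 ** (bits - 1)))
```

With the stored zero-point (0 for this group), the reconstruction error stays within half a quantization step. With the assumed fixed midpoint, every weight is shifted by `(zero - 8) * scale`. This is why an asymmetric checkpoint must not be reported as `sym=True`.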

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline, Pull Request section?
Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.


Narsil

LGTM !


danieldk force-pushed the feature/use-gptq-marlin-for-gptq branch 3 times, most recently from 7cbe33b to de4e0c4 on June 25, 2024 10:14

danieldk force-pushed the feature/use-gptq-marlin-for-gptq branch from de4e0c4 to 2e763d1 on June 27, 2024 07:53

danieldk marked this pull request as ready for review June 27, 2024 08:56

Narsil approved these changes Jul 1, 2024


Narsil merged commit 2ce8019 into main Jul 1, 2024
9 checks passed

Narsil deleted the feature/use-gptq-marlin-for-gptq branch July 1, 2024 10:59

flozi00 mentioned this pull request Jul 2, 2024

[RFC]Add Auto-Round Support #2130

Closed
