Update docs/source/llm_quantization/usage_guides/quantization.mdx
Co-authored-by: fxmarty <[email protected]>
SunMarc and fxmarty authored Oct 18, 2023
1 parent 7f8962d commit 714fece
Showing 1 changed file with 1 addition and 1 deletion.
docs/source/llm_quantization/usage_guides/quantization.mdx (1 addition, 1 deletion)
@@ -76,7 +76,7 @@ quantized_model = load_quantized_model(empty_model, save_folder=save_folder, dev

### Exllama kernels for faster inference

- For 4-bit model, you can use the exllama kernels in order to a faster inference speed. If you want to change its value, you just need to pass `disable_exllama` in [`~optimum.gptq.load_quantized_model`]. In order to use these kernels, you need to have the entire model on gpus.
+ For 4-bit model, you can use the exllama kernels in order to have a faster inference speed. If you want to change its value, you just need to pass `disable_exllama` in [`~optimum.gptq.load_quantized_model`]. In order to use these kernels, you need to have the entire model on gpus.

```py
from optimum.gptq import GPTQQuantizer, load_quantized_model
# ... (remainder of the snippet collapsed in this diff view)
```
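
For context, here is a minimal sketch of the call the changed sentence describes. It assumes the usual Optimum GPTQ loading flow (an empty model built with Accelerate's `init_empty_weights`, then `load_quantized_model` over a saved GPTQ checkpoint); the checkpoint name, `save_folder` path, and `device_map="auto"` are placeholder choices, and `disable_exllama` is the argument the updated paragraph refers to.

```py
import torch
from accelerate import init_empty_weights
from transformers import AutoModelForCausalLM
from optimum.gptq import load_quantized_model

# Placeholder values -- substitute your own checkpoint and GPTQ output folder.
model_name = "facebook/opt-125m"
save_folder = "/path/to/gptq_quantized_model"

# Build the model skeleton without allocating real weights.
with init_empty_weights():
    empty_model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)
empty_model.tie_weights()

# Exllama kernels are used by default for 4-bit models; pass disable_exllama=True
# to turn them off. The whole model must fit on GPU(s) for the kernels to be used.
quantized_model = load_quantized_model(
    empty_model,
    save_folder=save_folder,
    device_map="auto",
    disable_exllama=False,
)
```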
