Update docs/source/llm_quantization/usage_guides/quantization.mdx
Co-authored-by: fxmarty <[email protected]>
SunMarc and fxmarty authored Oct 18, 2023
1 parent 7f8962d commit 714fece
Showing 1 changed file with 1 addition and 1 deletion.
docs/source/llm_quantization/usage_guides/quantization.mdx (1 addition, 1 deletion)
@@ -76,7 +76,7 @@ quantized_model = load_quantized_model(empty_model, save_folder=save_folder, dev

### Exllama kernels for faster inference

- For 4-bit model, you can use the exllama kernels in order to a faster inference speed. If you want to change its value, you just need to pass `disable_exllama` in [`~optimum.gptq.load_quantized_model`]. In order to use these kernels, you need to have the entire model on gpus.
+ For 4-bit model, you can use the exllama kernels in order to have a faster inference speed. If you want to change its value, you just need to pass `disable_exllama` in [`~optimum.gptq.load_quantized_model`]. In order to use these kernels, you need to have the entire model on gpus.

```py
from optimum.gptq import GPTQQuantizer, load_quantized_model
# ... (remainder of the snippet collapsed in this diff view)
```
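
For context, here is a minimal sketch of the call the changed sentence describes. It assumes the usual Optimum GPTQ loading flow (an empty model built with Accelerate's `init_empty_weights`, then `load_quantized_model` over a saved GPTQ checkpoint); the checkpoint name, `save_folder` path, and `device_map="auto"` are placeholder choices, and `disable_exllama` is the argument the updated paragraph refers to.

```py
import torch
from accelerate import init_empty_weights
from transformers import AutoModelForCausalLM
from optimum.gptq import load_quantized_model

# Placeholder values -- substitute your own checkpoint and GPTQ output folder.
model_name = "facebook/opt-125m"
save_folder = "/path/to/gptq_quantized_model"

# Build the model skeleton without allocating real weights.
with init_empty_weights():
    empty_model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)
empty_model.tie_weights()

# Exllama kernels are used by default for 4-bit models; pass disable_exllama=True
# to turn them off. The whole model must fit on GPU(s) for the kernels to be used.
quantized_model = load_quantized_model(
    empty_model,
    save_folder=save_folder,
    device_map="auto",
    disable_exllama=False,
)
```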
