From 5e319aae58f154af06bf9d9d125b138ed79d2450 Mon Sep 17 00:00:00 2001
From: Ella Charlaix <80481427+echarlaix@users.noreply.github.com>
Date: Mon, 4 Mar 2024 12:02:27 +0100
Subject: [PATCH] Fix documentation (#583)
* Fix documentation
* fix
---
docs/source/inference.mdx | 7 +++++--
docs/source/optimization_ov.mdx | 6 +++++-
2 files changed, 10 insertions(+), 3 deletions(-)
diff --git a/docs/source/inference.mdx b/docs/source/inference.mdx
index a9ee5529da..905e0aa4dd 100644
--- a/docs/source/inference.mdx
+++ b/docs/source/inference.mdx
@@ -110,7 +110,7 @@ By default the quantization scheme will be [asymmetric](https://github.com/open
For INT4 quantization you can also specify the following arguments:
* The `--group-size` parameter defines the group size to use for quantization, `-1` will result in per-column quantization.
-* The `--ratio` CLI parameter controls the ratio between 4-bit and 8-bit quantization. If set to 0.9, it means that 90% of the layers will be quantized to `int4` while 10% will be quantized to `int8`.
+* The `--ratio` parameter controls the ratio between 4-bit and 8-bit quantization. If set to 0.9, it means that 90% of the layers will be quantized to `int4` while 10% will be quantized to `int8`.
Smaller `group_size` and `ratio` values usually improve accuracy at the cost of model size and inference latency.
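For reference, these flags belong to the `optimum-cli export openvino` command covered earlier in this page. A rough invocation might look like the sketch below; the model id, output directory, and flag values are illustrative, and the exact flags should be checked against `optimum-cli export openvino --help` for the installed version:

```bash
# Export a causal LM to OpenVINO with 4-bit weight quantization.
# --group-size and --ratio are the knobs described above; values here are illustrative.
optimum-cli export openvino \
  --model meta-llama/Llama-2-7b-chat-hf \
  --weight-format int4 \
  --group-size 128 \
  --ratio 0.8 \
  llama2_ov_int4
```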
@@ -122,8 +122,11 @@ from optimum.intel import OVModelForCausalLM
model = OVModelForCausalLM.from_pretrained(model_id, load_in_8bit=True)
```
-> **NOTE:** `load_in_8bit` is enabled by default for the models larger than 1 billion parameters.
+<Tip warning={true}>
+
+`load_in_8bit` is enabled by default for models larger than 1 billion parameters.
+
+</Tip>
To apply quantization on both weights and activations, you can use the `OVQuantizer`, more information in the [documentation](https://huggingface.co/docs/optimum/main/en/intel/optimization_ov#optimization).
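As a pointer, the weight-and-activation (static) quantization flow with `OVQuantizer` has roughly the following shape; the model, dataset, and preprocessing function here are illustrative placeholders, and the linked documentation remains the reference:

```python
from functools import partial

from transformers import AutoModelForSequenceClassification, AutoTokenizer
from optimum.intel import OVQuantizer

model_id = "distilbert-base-uncased-finetuned-sst-2-english"  # illustrative model
model = AutoModelForSequenceClassification.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

def preprocess_fn(examples, tokenizer):
    return tokenizer(examples["sentence"], padding=True, truncation=True, max_length=128)

quantizer = OVQuantizer.from_pretrained(model)
# Build a small calibration set used to compute activation quantization parameters.
calibration_dataset = quantizer.get_calibration_dataset(
    "glue",
    dataset_config_name="sst2",
    preprocess_function=partial(preprocess_fn, tokenizer=tokenizer),
    num_samples=300,
    dataset_split="train",
)
# Quantize weights and activations, then save the resulting OpenVINO model.
quantizer.quantize(calibration_dataset=calibration_dataset, save_directory="ov_model_int8")
```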
diff --git a/docs/source/optimization_ov.mdx b/docs/source/optimization_ov.mdx
index 5686af4bf3..51067b0b64 100644
--- a/docs/source/optimization_ov.mdx
+++ b/docs/source/optimization_ov.mdx
@@ -69,7 +69,11 @@ from optimum.intel import OVModelForCausalLM
model = OVModelForCausalLM.from_pretrained(model_id, load_in_8bit=True)
```
-> **NOTE:** `load_in_8bit` is enabled by default for models larger than 1 billion parameters.
+<Tip warning={true}>
+
+`load_in_8bit` is enabled by default for models larger than 1 billion parameters.
+
+</Tip>
For 4-bit weight quantization you can use a `quantization_config` to specify the optimization parameters, for example:
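The example this sentence introduces falls outside the diff context; a minimal sketch of such a configuration, assuming the `OVWeightQuantizationConfig` class exported by `optimum.intel` in this release, could be:

```python
from optimum.intel import OVModelForCausalLM, OVWeightQuantizationConfig

model_id = "HuggingFaceH4/zephyr-7b-beta"  # illustrative model id

# bits/sym/group_size/ratio mirror the CLI knobs described in inference.mdx;
# the values below are illustrative, not recommendations.
quantization_config = OVWeightQuantizationConfig(bits=4, sym=False, group_size=128, ratio=0.8)
model = OVModelForCausalLM.from_pretrained(model_id, quantization_config=quantization_config)
```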