
Commit

Update docs/source/inference.mdx
Co-authored-by: Helena Kloosterman <[email protected]>
echarlaix and helena-intel authored Mar 13, 2024
1 parent 027c370 commit afc23d0
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion docs/source/inference.mdx
@@ -99,7 +99,7 @@ tokenizer.save_pretrained(save_directory)

### Weight-only quantization

- You can also apply fp16, 8-bit or 4-bit weight quantization on the linear and embedding layers when exporting your model with the CLI by setting `--weight-format` to respectively `fp16`, `int8` or `int4`:
+ You can also apply fp16, 8-bit or 4-bit weight compression on the linear and embedding layers when exporting your model with the CLI by setting `--weight-format` to respectively `fp16`, `int8` or `int4`:

```bash
optimum-cli export openvino --model gpt2 --weight-format int8 ov_model
```
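
As a usage note for the command shown in the diff: the exported directory can be loaded back for inference with optimum-intel's `OVModelForCausalLM`. A minimal sketch, assuming the `ov_model` output directory produced by the command above, and loading the tokenizer separately from the Hub in case the export did not save it alongside the model:

```python
# Minimal sketch: run the int8 weight-compressed model exported by the
# `optimum-cli export openvino` command above. The `ov_model` directory
# name comes from that command; the tokenizer is fetched from the Hub.
from transformers import AutoTokenizer
from optimum.intel import OVModelForCausalLM

model = OVModelForCausalLM.from_pretrained("ov_model")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

inputs = tokenizer("The weather today is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```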
