Remove accelerate and onnxruntime from required dependencies #590

echarlaix · 2024-03-07T09:49:12Z

No description provided.

HuggingFaceDocBuilderDev · 2024-03-07T09:54:21Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

helena-intel

Thanks @echarlaix ! Somewhat related: what do you think about moving onnx to INSTALL_REQUIRE? It is required for INC, IPEX and OpenVINO, so it might as well be in the main dependencies? It would help user experience a bit, because pip will show ONNX as a dependency of optimum-intel, and will check compatibility if there are ever version limits, and it makes it easier to install optimum-intel in an existing OpenVINO environment for advanced users (we should still recommend optimum[openvino] to users, but it would make it possible to skip the extra).

echarlaix · 2024-03-07T14:40:36Z

> Thanks @echarlaix ! Somewhat related: what do you think about moving onnx to INSTALL_REQUIRE? It is required for INC, IPEX and OpenVINO, so it might as well be in the main dependencies? It would help user experience a bit, because pip will show ONNX as a dependency of optimum-intel, and will check compatibility if there are ever version limits, and it makes it easier to install optimum-intel in an existing OpenVINO environment for advanced users (we should still recommend optimum[openvino] to users, but it would make it possible to skip the extra).

Makes sense! Added it in 0b50ed5
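The reasoning in this exchange (a package needed by every extra may as well be a core requirement) can be sketched as follows. This is a minimal illustration, not the project's actual `setup.py`: the helper names `common_to_all_extras` and `promote`, and the package lists in the usage note, are hypothetical.

```python
def common_to_all_extras(extras):
    """Return the packages that appear in every extra, sorted by name."""
    package_sets = [set(packages) for packages in extras.values()]
    if not package_sets:
        return []
    return sorted(set.intersection(*package_sets))


def promote(install_require, extras, packages):
    """Move `packages` from the extras into the core requirement list.

    Returns new objects and leaves the inputs untouched.
    """
    core = list(install_require)
    for package in packages:
        if package not in core:
            core.append(package)
    trimmed = {
        name: [p for p in pkgs if p not in packages]
        for name, pkgs in extras.items()
    }
    return core, trimmed


# Illustrative data only; these are not the real dependency lists.
INSTALL_REQUIRE = ["optimum", "transformers"]
EXTRAS_REQUIRE = {
    "openvino": ["openvino", "onnx"],
    "ipex": ["intel-extension-for-pytorch", "onnx"],
    "neural-compressor": ["neural-compressor", "onnx", "accelerate"],
}

shared = common_to_all_extras(EXTRAS_REQUIRE)
new_core, new_extras = promote(INSTALL_REQUIRE, EXTRAS_REQUIRE, shared)
```

With the illustrative lists above, `shared` contains only `onnx`, which ends up in the core list and is dropped from each extra. Declaring it in the core list is what lets pip report it as a direct dependency and check any version limits, as the comment above points out.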

AlexKoff88 · 2024-03-08T06:55:46Z

We use datasets in the data-aware weight quantization of LLMs. Shall we throw a meaningful exception in this case?

I also looked at the CI errors, we should update the references caused by the changes in OpenVINO and NNCF.

echarlaix · 2024-03-08T10:43:19Z

> We use datasets in the data-aware weight quantization of LLMs. Shall we throw a meaningful exception in this case?

Currently, if `weights_only=True`, `_get_calibration_dataloader` is not called, so the changes in this PR shouldn't have any impact. (It would definitely make sense to also allow providing the `calibration_dataset` to the `OVQuantizer` for data-aware weight-only quantization.)

> I also looked at the CI errors, we should update the references caused by the changes in OpenVINO and NNCF.

Added a fix in 94a990f
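For the "meaningful exception" question raised above, one common pattern for an optional dependency is to check for the package at the point of use and raise an `ImportError` that names the feature and the missing package. The sketch below uses only the standard library; `is_available` and `require_package` are hypothetical helper names, not functions from optimum-intel.

```python
import importlib.util


def is_available(package_name):
    """Return True if a top-level package can be found, without importing it."""
    return importlib.util.find_spec(package_name) is not None


def require_package(package_name, feature):
    """Raise an ImportError with an actionable message if the package is missing."""
    if not is_available(package_name):
        raise ImportError(
            f"{feature} requires the `{package_name}` package, which was not found. "
            f"Install it with `pip install {package_name}`."
        )


def load_calibration_data():
    # Hypothetical call site: guard only the code path that needs the package,
    # so users who never reach it are not forced to install it.
    require_package("datasets", "Data-aware weight quantization")
```

Placing the check inside the function that needs the package, rather than at module import time, keeps the package optional for every other code path.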

…face#590) * Remove accelerate dependency * Add accelerate to import backend mapping * Add eval method to OVModels * add onnxruntime install for OV test * fix test expected int8

…tension-for-transformers. (#455) * Support weight-only quantization with quantized operators in intel-extension-for-transformers * Update code style * Update readme for weight-only quantization example * Update code * Adapt intel-extension-for-transformers 1.3 API change Signed-off-by: Cheng, Penghui <[email protected]> * Support weight-only quantization with quantized operators in intel-extension-for-transformers * Update code * rebase code on main branch Signed-off-by: Cheng, Penghui <[email protected]> * Update example Signed-off-by: Cheng, Penghui <[email protected]> * Update optimum/intel/neural_compressor/quantization.py Co-authored-by: Ella Charlaix <[email protected]> * [OV]: Fixed inference after 4 bit weight compression (#569) * [OV]: Fixed inferece after 4 bit weight compression * Fixed issue * Update optimum/intel/openvino/modeling_decoder.py Co-authored-by: Ella Charlaix <[email protected]> * Applied comments * Fixed issue when request is None --------- Co-authored-by: Ella Charlaix <[email protected]> * Updated docs with load_in_4bit (#558) * Updated docs with load_in_4bit * Update documentation * Update documentation * typo --------- Co-authored-by: Ella Charlaix <[email protected]> * Update Transformers dependency requirements (#571) * Fix compatibility for latest transformers release (#570) * fix compatibility for latest transformers release * update setup * update setup * fix test input size * fix prepare generation for llama models * Deprecate compression options (#565) * deprecate compression options * style * fix configuration * Update CLI argument * update documentation * deprecate torch nn modules for ov quantizer * fix ov config for fp32 models * fix format * update documentation * Add check for configuration * fix ratio default value for SD models * add quantization_config argument for OVModel * remove commented line * Update docs/source/inference.mdx Co-authored-by: Alexander Kozlov <[email protected]> * add default config for causal LM * 
fix warning message --------- Co-authored-by: Alexander Kozlov <[email protected]> * Add default quantization int4 config for Mixtral-8x7B (#576) * Update stable diffusion example requirements (#579) * Fix collecting duplicate tensors in quantization calibration dataset (#577) * Added deepcopying of inputs collected by InferRequestWrapper. Added a test covering the fixed issue. * Phrasing tweaks * Add soundfile to test requirements * Added librosa to test requirements * Added copying to other data cache appends * Remove the need for real test data * Process __call__ call properly * Addressed suggested changes * Save an openvino config summarizing all information related to quantization when saving model (#578) * fix doc * remove default compression value * set default compression config when not provided * save openvino config to include quantization configuration * fix style * add test * update setup * style * remove from quantization_config key from ov_config * add test * update setup * modify method name * Fix warning (#582) * Fix warning * fix message warning * Add reference to the temporary directory for windows fix (#581) * Fix documentation (#583) * Fix documentation * fix * Add llama test model to cover MQA (#585) * change llama test model to cover MQA * keep llama and llama2 in tests * fix code style * Include nncf in openvino extra (#586) * Fix title documentation (#588) * Update OpenVINO documentation links in README.md (#587) * Update OpenVINO documentation links in README.md The links are now aligned with OpenVINO 2024.0 documentation, and include permalinks instead of direct links, when possible. 
* Update inference.mdx * Update index.mdx * Update installation.mdx * Update README.md * Fix default int8 quantization for CLI (#592) * Change model output parameter to last_hidden_states for IPEXModel (#589) * change model output parameter to last_hidden_states * update ipex model testiong * update testing * add output name to ipex model * Add IPEX model patcher (#567) * llama model patcher * fix jit model * fix jit model * rm autocast in model * add llama model patcher * support assisted decoding and add reorder cache function * add comment for _prepare_past_key_values * rebase main * fix model_dtype * rm useless comments * fix llama * add comments for ipex_rope and ipex_scale_dot_product * fix comments * add enable_tpp comments * fix import * fix review aroun2 * add torch.no_grad to avoid auto_kernel_selection issue * use torch.no_grad in jit trace * fix ipex model testing * add tests for ipex model generation with multi inputs * fix code style * remove __get__(self) as _reorder_cache is static method for the class * fix reorder_cache * use model_type * check if reorder_cache is a static method * fix _reorder_cache * fix raise import error * test ipex patching * fix comments * update API name and testing * disable untill ipex version 2.5.0 * update testing name * Update optimum/intel/ipex/modeling_base.py Co-authored-by: Ella Charlaix <[email protected]> * Update tests/ipex/test_modeling.py Co-authored-by: Ella Charlaix <[email protected]> * fix tests --------- Co-authored-by: Ella Charlaix <[email protected]> * Updates weight quantization section in the docs (#593) * Remove accelerate and onnxruntime from required dependencies (#590) * Remove accelerate dependency * Add accelerate to import backend mapping * Add eval method to OVModels * add onnxruntime install for OV test * fix test expected int8 * Fix OpenVINO image classification examples (#598) * Fix weights compression for OPenVINO models (#596) * hot fix for weights compression * rewrite mcok tests * Fix 
default ov config (#600) * Add warning for transformers>=4.38 and OpenVINO 2024.0 (#599) * Add warning for transformers>=4.38 and OpenVINO 2024.0 * Use is_openvino_version to compare versions * Show version warning only for llama and gpt-bigcode * Fix style, show OpenVINO version * Include affected model types in warning message * Add hybrid quantization for StableDiffusion pipelines (#584) * Add hybrid quantization for StableDiffusion pipelines * apply black * fix tests * fix ruff * fix lcm bug * apply review comments * rework dataset processing * Add doc * remove SDXL test * Apply comments * reformat * Show device name in _print_compiled_model_properties (#541) * Show device name in _print_compiled_model_properties Enable CACHE_DIR also for devices like "GPU:0" * Update optimum/intel/openvino/modeling_seq2seq.py Co-authored-by: Ella Charlaix <[email protected]> * Change check for gpu device --------- Co-authored-by: Ella Charlaix <[email protected]> * Update code with comments Signed-off-by: Cheng, Penghui <[email protected]> * Fixed pylint error Signed-off-by: Cheng, Penghui <[email protected]> * Update optimum/intel/neural_compressor/configuration.py Co-authored-by: Ella Charlaix <[email protected]> * Fixed example and UT for weight-only quantization Signed-off-by: Cheng, Penghui <[email protected]> * Fixed pre-ci test error Signed-off-by: Cheng, Penghui <[email protected]> * Fixed pre-ci test error Signed-off-by: Cheng, Penghui <[email protected]> * Fixed UT and examples error Signed-off-by: Cheng, Penghui <[email protected]> * Fixed pre-CI error Signed-off-by: Cheng, Penghui <[email protected]> * Fixed UT error Signed-off-by: Cheng, Penghui <[email protected]> * Update tests/openvino/test_modeling_basic.py Co-authored-by: Ella Charlaix <[email protected]> * Update examples/neural_compressor/language-modeling/README.md Co-authored-by: Ella Charlaix <[email protected]> * Update examples/neural_compressor/language-modeling/run_clm.py Co-authored-by: Ella 
Charlaix <[email protected]> * Update examples/neural_compressor/language-modeling/run_clm.py Co-authored-by: Ella Charlaix <[email protected]> * Update examples/neural_compressor/language-modeling/run_clm.py Co-authored-by: Ella Charlaix <[email protected]> * Update examples/neural_compressor/language-modeling/run_clm.py Co-authored-by: Ella Charlaix <[email protected]> * Update examples/neural_compressor/language-modeling/run_clm.py Co-authored-by: Ella Charlaix <[email protected]> * Load weight-only quantized model with INCModelForCausalLM Signed-off-by: Cheng, Penghui <[email protected]> * Changed parameters name for GPTQ in example Signed-off-by: Cheng, Penghui <[email protected]> * Changed parameters order in INCQuantizer.quantize Signed-off-by: Cheng, Penghui <[email protected]> * Fixed UT error Signed-off-by: Cheng, Penghui <[email protected]> * Update examples/neural_compressor/text-generation/run_generation.py Co-authored-by: Ella Charlaix <[email protected]> * Update optimum/intel/neural_compressor/quantization.py Co-authored-by: Ella Charlaix <[email protected]> * Update optimum/intel/neural_compressor/quantization.py Co-authored-by: Ella Charlaix <[email protected]> * Update import message Signed-off-by: Cheng, Penghui <[email protected]> * Limit intel-extension-for-transformers version Signed-off-by: Cheng, Penghui <[email protected]> * Limit torch version for weight-only quantization Signed-off-by: Cheng, Penghui <[email protected]> * Fixed doc building error Signed-off-by: Cheng, Penghui <[email protected]> --------- Signed-off-by: Cheng, Penghui <[email protected]> Co-authored-by: Ella Charlaix <[email protected]> Co-authored-by: Alexander Kozlov <[email protected]> Co-authored-by: Ella Charlaix <[email protected]> Co-authored-by: Lyalyushkin Nikolay <[email protected]> Co-authored-by: Helena Kloosterman <[email protected]> Co-authored-by: Nikita Savelyev <[email protected]> Co-authored-by: jiqing-feng <[email protected]> Co-authored-by: Ekaterina 
Aidova <[email protected]> Co-authored-by: Karol Blaszczak <[email protected]> Co-authored-by: Liubov Talamanova <[email protected]>

echarlaix added 4 commits March 7, 2024 10:38

Remove accelerate dependency

ef90dc0

Add accelerate to import backend mapping

103620a

Add eval method to OVModels

ebc108b

add accelerate dependency for inc extra

429e34c

echarlaix marked this pull request as draft March 7, 2024 09:49

echarlaix added 2 commits March 7, 2024 11:11

fix test

47fa545

add onnxruntime install for OV test

d837268

echarlaix marked this pull request as ready for review March 7, 2024 14:02

echarlaix changed the title ~~Remove accelerate from required dependencies~~ Remove accelerate and onnxruntime from required dependencies Mar 7, 2024

echarlaix requested review from AlexKoff88, eaidova and helena-intel March 7, 2024 14:03

unrelated

8c7a0fe

helena-intel approved these changes Mar 7, 2024


update setup

0b50ed5

fix

450a63b

fix test expected int8

94a990f

AlexKoff88 approved these changes Mar 8, 2024


fix

b12a536

echarlaix merged commit 72b0630 into main Mar 8, 2024
11 of 12 checks passed

echarlaix deleted the rm-dependecies branch March 8, 2024 13:20
