Describe the bug
Following the README here, I cannot get FP8 weight, activation, and KV cache quantization working, using either a build from source or the 0.2.0 release.
Expected behavior
No error.
Environment
Include all relevant environment information:
OS [e.g. Ubuntu 20.04]: Ubuntu 22.04
Python version [e.g. 3.7]: 3.10.13
LLM Compressor version or commit hash [e.g. 0.1.0, f7245c8]: 0.2.0
ML framework version(s) [e.g. torch 2.3.1]: 2.3.1
Other Python package versions [e.g. vLLM, compressed-tensors, numpy, ONNX]:
Other relevant environment information [e.g. hardware, CUDA version]:
To Reproduce
python3 llama3_fp8_kv_example.py
Errors
2024-09-25T17:53:34.502076+0800 | one_shot | INFO - *** One Shot ***
cannot import name 'ActivationOrdering' from 'compressed_tensors.quantization' (/home/asus/miniconda3/lib/python3.10/site-packages/compressed_tensors/quantization/__init__.py)
Traceback (most recent call last):
File "/tdx/tools/llmcompressor/fp8_static.py", line 76, in <module>
oneshot(
File "/home/asus/miniconda3/lib/python3.10/site-packages/llmcompressor/transformers/finetune/text_generation.py", line 76, in oneshot
main(model_args, data_args, training_args)
File "/home/asus/miniconda3/lib/python3.10/site-packages/llmcompressor/transformers/finetune/text_generation.py", line 364, in main
stage_runner.one_shot()
File "/home/asus/miniconda3/lib/python3.10/site-packages/llmcompressor/transformers/finetune/runner.py", line 171, in one_shot
self.trainer.one_shot(calibration_data=calib_data, stage=stage)
File "/home/asus/miniconda3/lib/python3.10/site-packages/llmcompressor/transformers/finetune/session_mixin.py", line 401, in one_shot
apply(
File "/home/asus/miniconda3/lib/python3.10/site-packages/llmcompressor/core/session_functions.py", line 184, in apply
return active_session().apply(
File "/home/asus/miniconda3/lib/python3.10/site-packages/llmcompressor/core/session.py", line 210, in apply
self.initialize(**kwargs)
File "/home/asus/miniconda3/lib/python3.10/site-packages/llmcompressor/core/session.py", line 156, in initialize
mod_data = self._lifecycle.initialize(
File "/home/asus/miniconda3/lib/python3.10/site-packages/llmcompressor/core/lifecycle.py", line 122, in initialize
self._check_compile_recipe()
File "/home/asus/miniconda3/lib/python3.10/site-packages/llmcompressor/core/lifecycle.py", line 247, in _check_compile_recipe
self.modifiers = self.recipe_container.compiled_recipe.create_modifier()
File "/home/asus/miniconda3/lib/python3.10/site-packages/llmcompressor/recipe/recipe.py", line 358, in create_modifier
stage_modifiers = stage.create_modifier()
File "/home/asus/miniconda3/lib/python3.10/site-packages/llmcompressor/recipe/stage.py", line 118, in create_modifier
modifier = modifier.create_modifier()
File "/home/asus/miniconda3/lib/python3.10/site-packages/llmcompressor/recipe/modifier.py", line 76, in create_modifier
return ModifierFactory.create(
File "/home/asus/miniconda3/lib/python3.10/site-packages/llmcompressor/modifiers/factory.py", line 122, in create
raise ValueError(f"No modifier of type '{type}' found.")
ValueError: No modifier of type 'QuantizationModifier' found.
Additional context
Installing from source, or running pip install git+https://github.com/vllm-project/llm-compressor.git@cb98f34d4ec9dd175e6995d12fb02dec39c6f27a, results in:
INFO: pip is looking at multiple versions of llmcompressor-dev[dev] to determine which version is compatible with other requirements. This could take a while.
ERROR: Could not find a version that satisfies the requirement compressed-tensors-nightly (from llmcompressor-dev[dev]) (from versions: none)
ERROR: No matching distribution found for compressed-tensors-nightly
Hi @weicheng59, I'm going to look into this early next week and will get back to you. This looks like it was an issue with version compatibility with compressed tensors, but I can't guarantee that until I dive in a bit more.
I was able to get the example to run completely using the following steps:
Use compressed-tensors==0.5.0
Download the hf model to my local dir before executing
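Those two steps, as shell commands (the version pins are what worked for me; the model id and local directory are examples, substitute your own):

```shell
# Pin compressed-tensors to a release compatible with llmcompressor 0.2.0
pip install llmcompressor==0.2.0 compressed-tensors==0.5.0

# Download the HF model to a local directory before running the example
huggingface-cli download meta-llama/Meta-Llama-3-8B-Instruct \
    --local-dir ./Meta-Llama-3-8B-Instruct
```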
However, the output model doesn't appear to be quantized. The only change in config.json is "_name_or_path", and the .safetensors files are the same size as before.
This was working for me previously in early September, any recommendations on how to resolve the issues in this example @markurtz?