KV Cache Quantization example causes problems #660

Open
weicheng59 opened this issue Sep 25, 2024 · 4 comments

@weicheng59

Describe the bug
Following the README here, I cannot get FP8 weight, activation, and KV cache quantization to work, using either a build from source or the 0.2.0 release.

Expected behavior
No error.

Environment
Include all relevant environment information:

  1. OS [e.g. Ubuntu 20.04]: Ubuntu 22.04
  2. Python version [e.g. 3.7]: 3.10.13
  3. LLM Compressor version or commit hash [e.g. 0.1.0, f7245c8]: 0.2.0
  4. ML framework version(s) [e.g. torch 2.3.1]: 2.3.1
  5. Other Python package versions [e.g. vLLM, compressed-tensors, numpy, ONNX]:
  6. Other relevant environment information [e.g. hardware, CUDA version]:

To Reproduce
python3 llama3_fp8_kv_example.py
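
For context, the script is essentially the following (a condensed sketch: the recipe mirrors the README's FP8 weight/activation + KV cache scheme, while the model and dataset identifiers are placeholders rather than a verbatim copy of the example):

```python
# Condensed sketch of llama3_fp8_kv_example.py; the recipe follows the
# FP8 weight/activation + KV cache scheme from the README, but the model
# and dataset identifiers are assumptions, and the real script also
# preprocesses the calibration set.
from transformers import AutoModelForCausalLM, AutoTokenizer

from llmcompressor.transformers import oneshot

MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"

model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, device_map="auto", torch_dtype="auto"
)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# FP8 tensor-wise quantization of Linear weights and input activations,
# plus the kv_cache_scheme block that exercises the failing code path.
recipe = """
quant_stage:
    quant_modifiers:
        QuantizationModifier:
            ignore: ["lm_head"]
            config_groups:
                group_0:
                    weights: {num_bits: 8, type: float, strategy: tensor, dynamic: false, symmetric: true}
                    input_activations: {num_bits: 8, type: float, strategy: tensor, dynamic: false, symmetric: true}
                    targets: ["Linear"]
            kv_cache_scheme: {num_bits: 8, type: float, strategy: tensor, dynamic: false, symmetric: true}
"""

oneshot(
    model=model,
    dataset="ultrachat_200k",  # assumption: a registered calibration dataset name
    recipe=recipe,
    max_seq_length=2048,
    num_calibration_samples=512,
    output_dir="Meta-Llama-3-8B-Instruct-FP8-KV",
)
```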

Errors

2024-09-25T17:53:34.502076+0800 | one_shot | INFO - *** One Shot ***
cannot import name 'ActivationOrdering' from 'compressed_tensors.quantization' (/home/asus/miniconda3/lib/python3.10/site-packages/compressed_tensors/quantization/__init__.py)
Traceback (most recent call last):
  File "/tdx/tools/llmcompressor/fp8_static.py", line 76, in <module>
    oneshot(
  File "/home/asus/miniconda3/lib/python3.10/site-packages/llmcompressor/transformers/finetune/text_generation.py", line 76, in oneshot
    main(model_args, data_args, training_args)
  File "/home/asus/miniconda3/lib/python3.10/site-packages/llmcompressor/transformers/finetune/text_generation.py", line 364, in main
    stage_runner.one_shot()
  File "/home/asus/miniconda3/lib/python3.10/site-packages/llmcompressor/transformers/finetune/runner.py", line 171, in one_shot
    self.trainer.one_shot(calibration_data=calib_data, stage=stage)
  File "/home/asus/miniconda3/lib/python3.10/site-packages/llmcompressor/transformers/finetune/session_mixin.py", line 401, in one_shot
    apply(
  File "/home/asus/miniconda3/lib/python3.10/site-packages/llmcompressor/core/session_functions.py", line 184, in apply
    return active_session().apply(
  File "/home/asus/miniconda3/lib/python3.10/site-packages/llmcompressor/core/session.py", line 210, in apply
    self.initialize(**kwargs)
  File "/home/asus/miniconda3/lib/python3.10/site-packages/llmcompressor/core/session.py", line 156, in initialize
    mod_data = self._lifecycle.initialize(
  File "/home/asus/miniconda3/lib/python3.10/site-packages/llmcompressor/core/lifecycle.py", line 122, in initialize
    self._check_compile_recipe()
  File "/home/asus/miniconda3/lib/python3.10/site-packages/llmcompressor/core/lifecycle.py", line 247, in _check_compile_recipe
    self.modifiers = self.recipe_container.compiled_recipe.create_modifier()
  File "/home/asus/miniconda3/lib/python3.10/site-packages/llmcompressor/recipe/recipe.py", line 358, in create_modifier
    stage_modifiers = stage.create_modifier()
  File "/home/asus/miniconda3/lib/python3.10/site-packages/llmcompressor/recipe/stage.py", line 118, in create_modifier
    modifier = modifier.create_modifier()
  File "/home/asus/miniconda3/lib/python3.10/site-packages/llmcompressor/recipe/modifier.py", line 76, in create_modifier
    return ModifierFactory.create(
  File "/home/asus/miniconda3/lib/python3.10/site-packages/llmcompressor/modifiers/factory.py", line 122, in create
    raise ValueError(f"No modifier of type '{type_}' found.")
ValueError: No modifier of type 'QuantizationModifier' found.

Additional context
Installing from source, or with pip install git+https://github.com/vllm-project/llm-compressor.git@cb98f34d4ec9dd175e6995d12fb02dec39c6f27a, results in:

INFO: pip is looking at multiple versions of llmcompressor-dev[dev] to determine which version is compatible with other requirements. This could take a while.
ERROR: Could not find a version that satisfies the requirement compressed-tensors-nightly (from llmcompressor-dev[dev]) (from versions: none)
ERROR: No matching distribution found for compressed-tensors-nightly

@weicheng59 weicheng59 added the bug Something isn't working label Sep 25, 2024
@markurtz markurtz self-assigned this Oct 18, 2024
@markurtz
Collaborator

Hi @weicheng59, I'm going to look into this early next week and will get back to you. This looks like a version-compatibility issue with compressed-tensors, but I can't confirm that until I dive in a bit more.

@paulliwog

Running into this as well. This appears to be the PR that introduced it: a None return there leads to the error above: https://github.com/neuralmagic/compressed-tensors/pull/157/files

I'm going to try the compressed-tensors version from before this PR went in.
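
For anyone else tracing it: the order of messages in the log (an ImportError printed first, then the ValueError) is consistent with a modifier registry that silently fails to populate. A minimal sketch of that failure mode, using hypothetical names rather than the actual llmcompressor factory code:

```python
# Simplified sketch of the suspected failure mode (hypothetical helper
# names; not the actual llmcompressor factory code).
_registry: dict[str, type] = {}

def _load_modifiers() -> None:
    try:
        # On incompatible compressed-tensors versions this raises, matching
        # the "cannot import name 'ActivationOrdering'" line in the log.
        from compressed_tensors.quantization import ActivationOrdering  # noqa: F401
    except ImportError as err:
        print(err)  # logged rather than raised, so the registry stays empty
        return
    _registry["QuantizationModifier"] = object  # stand-in for the real class

def create(type_: str) -> type:
    _load_modifiers()
    if type_ not in _registry:
        raise ValueError(f"No modifier of type '{type_}' found.")
    return _registry[type_]
```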

@paulliwog

I was able to get the example to run completely using the following steps:

  • Use compressed-tensors==0.5.0 (i.e. pip install compressed-tensors==0.5.0)
  • Download the HF model to a local directory before executing

However, the output model doesn't appear to be quantized: the only change in config.json is the "_name_or_path" field, and the .safetensors files are the same size as before.
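
This is roughly how I checked whether the saved checkpoint was actually compressed (a sketch: "output_dir" is a placeholder, and which config key appears can depend on the compressed-tensors version):

```python
# Quick check on the saved model directory ("output_dir" is a placeholder).
# A successfully quantized checkpoint should carry a quantization/compression
# config block in config.json; a plain copy of the source model will not.
import json
from pathlib import Path

cfg = json.loads((Path("output_dir") / "config.json").read_text())
for key in ("quantization_config", "compression_config"):
    print(key, "->", cfg.get(key))

# Total .safetensors size is another quick signal: FP8 weights should be
# roughly half the BF16 footprint of the source model.
total = sum(f.stat().st_size for f in Path("output_dir").glob("*.safetensors"))
print(f"safetensors total: {total / 1e9:.2f} GB")
```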

This was working for me in early September. Any recommendations on how to resolve the issues in this example, @markurtz?

@robertgshaw2-neuralmagic
Collaborator

@horheynm - would you mind taking a run through the current KV cache example and seeing if there are any issues?
