KV Cache Quantization example causes problems #660

Open
weicheng59 opened this issue Sep 25, 2024 · 4 comments

@weicheng59

Describe the bug
Following the README here, I cannot get FP8 weight, activation, and KV cache quantization to work, using either a build from source or the 0.2.0 release.

Expected behavior
No error.

Environment
Include all relevant environment information:

  1. OS [e.g. Ubuntu 20.04]: Ubuntu 22.04
  2. Python version [e.g. 3.7]: 3.10.13
  3. LLM Compressor version or commit hash [e.g. 0.1.0, f7245c8]: 0.2.0
  4. ML framework version(s) [e.g. torch 2.3.1]: 2.3.1
  5. Other Python package versions [e.g. vLLM, compressed-tensors, numpy, ONNX]:
  6. Other relevant environment information [e.g. hardware, CUDA version]:

To Reproduce
python3 llama3_fp8_kv_example.py
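
For context, the script is essentially the following (a condensed sketch: the recipe mirrors the README's FP8 weight/activation + KV cache scheme, while the model and dataset identifiers are placeholders rather than a verbatim copy of the example):

```python
# Condensed sketch of llama3_fp8_kv_example.py; the recipe follows the
# FP8 weight/activation + KV cache scheme from the README, but the model
# and dataset identifiers are assumptions, and the real script also
# preprocesses the calibration set.
from transformers import AutoModelForCausalLM, AutoTokenizer

from llmcompressor.transformers import oneshot

MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"

model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, device_map="auto", torch_dtype="auto"
)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# FP8 tensor-wise quantization of Linear weights and input activations,
# plus the kv_cache_scheme block that exercises the failing code path.
recipe = """
quant_stage:
    quant_modifiers:
        QuantizationModifier:
            ignore: ["lm_head"]
            config_groups:
                group_0:
                    weights: {num_bits: 8, type: float, strategy: tensor, dynamic: false, symmetric: true}
                    input_activations: {num_bits: 8, type: float, strategy: tensor, dynamic: false, symmetric: true}
                    targets: ["Linear"]
            kv_cache_scheme: {num_bits: 8, type: float, strategy: tensor, dynamic: false, symmetric: true}
"""

oneshot(
    model=model,
    dataset="ultrachat_200k",  # assumption: a registered calibration dataset name
    recipe=recipe,
    max_seq_length=2048,
    num_calibration_samples=512,
    output_dir="Meta-Llama-3-8B-Instruct-FP8-KV",
)
```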

Errors

2024-09-25T17:53:34.502076+0800 | one_shot | INFO - *** One Shot ***
cannot import name 'ActivationOrdering' from 'compressed_tensors.quantization' (/home/asus/miniconda3/lib/python3.10/site-packages/compressed_tensors/quantization/__init__.py)
Traceback (most recent call last):
  File "/tdx/tools/llmcompressor/fp8_static.py", line 76, in <module>
    oneshot(
  File "/home/asus/miniconda3/lib/python3.10/site-packages/llmcompressor/transformers/finetune/text_generation.py", line 76, in oneshot
    main(model_args, data_args, training_args)
  File "/home/asus/miniconda3/lib/python3.10/site-packages/llmcompressor/transformers/finetune/text_generation.py", line 364, in main
    stage_runner.one_shot()
  File "/home/asus/miniconda3/lib/python3.10/site-packages/llmcompressor/transformers/finetune/runner.py", line 171, in one_shot
    self.trainer.one_shot(calibration_data=calib_data, stage=stage)
  File "/home/asus/miniconda3/lib/python3.10/site-packages/llmcompressor/transformers/finetune/session_mixin.py", line 401, in one_shot
    apply(
  File "/home/asus/miniconda3/lib/python3.10/site-packages/llmcompressor/core/session_functions.py", line 184, in apply
    return active_session().apply(
  File "/home/asus/miniconda3/lib/python3.10/site-packages/llmcompressor/core/session.py", line 210, in apply
    self.initialize(**kwargs)
  File "/home/asus/miniconda3/lib/python3.10/site-packages/llmcompressor/core/session.py", line 156, in initialize
    mod_data = self._lifecycle.initialize(
  File "/home/asus/miniconda3/lib/python3.10/site-packages/llmcompressor/core/lifecycle.py", line 122, in initialize
    self._check_compile_recipe()
  File "/home/asus/miniconda3/lib/python3.10/site-packages/llmcompressor/core/lifecycle.py", line 247, in _check_compile_recipe
    self.modifiers = self.recipe_container.compiled_recipe.create_modifier()
  File "/home/asus/miniconda3/lib/python3.10/site-packages/llmcompressor/recipe/recipe.py", line 358, in create_modifier
    stage_modifiers = stage.create_modifier()
  File "/home/asus/miniconda3/lib/python3.10/site-packages/llmcompressor/recipe/stage.py", line 118, in create_modifier
    modifier = modifier.create_modifier()
  File "/home/asus/miniconda3/lib/python3.10/site-packages/llmcompressor/recipe/modifier.py", line 76, in create_modifier
    return ModifierFactory.create(
  File "/home/asus/miniconda3/lib/python3.10/site-packages/llmcompressor/modifiers/factory.py", line 122, in create
    raise ValueError(f"No modifier of type '{type_}' found.")
ValueError: No modifier of type 'QuantizationModifier' found.

Additional context
Installing from source, or with pip install git+https://github.com/vllm-project/llm-compressor.git@cb98f34d4ec9dd175e6995d12fb02dec39c6f27a, results in:

INFO: pip is looking at multiple versions of llmcompressor-dev[dev] to determine which version is compatible with other requirements. This could take a while.
ERROR: Could not find a version that satisfies the requirement compressed-tensors-nightly (from llmcompressor-dev[dev]) (from versions: none)
ERROR: No matching distribution found for compressed-tensors-nightly

@weicheng59 weicheng59 added the bug Something isn't working label Sep 25, 2024
@markurtz markurtz self-assigned this Oct 18, 2024
@markurtz
Collaborator

Hi @weicheng59, I'm going to look into this early next week and will get back to you. This looks like a version-compatibility issue with compressed-tensors, but I can't confirm that until I dive in a bit more.

@paulliwog

Running into this as well. This appears to be the PR that introduced it: a None return there leads to the error above: https://github.com/neuralmagic/compressed-tensors/pull/157/files

I'm going to try the compressed-tensors version from before this PR went in.
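
For anyone else tracing it: the order of messages in the log (an ImportError printed first, then the ValueError) is consistent with a modifier registry that silently fails to populate. A minimal sketch of that failure mode, using hypothetical names rather than the actual llmcompressor factory code:

```python
# Simplified sketch of the suspected failure mode (hypothetical helper
# names; not the actual llmcompressor factory code).
_registry: dict[str, type] = {}

def _load_modifiers() -> None:
    try:
        # On incompatible compressed-tensors versions this raises, matching
        # the "cannot import name 'ActivationOrdering'" line in the log.
        from compressed_tensors.quantization import ActivationOrdering  # noqa: F401
    except ImportError as err:
        print(err)  # logged rather than raised, so the registry stays empty
        return
    _registry["QuantizationModifier"] = object  # stand-in for the real class

def create(type_: str) -> type:
    _load_modifiers()
    if type_ not in _registry:
        raise ValueError(f"No modifier of type '{type_}' found.")
    return _registry[type_]
```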

@paulliwog

I was able to get the example to run completely using the following steps:

  • Use compressed-tensors==0.5.0 (i.e. pip install compressed-tensors==0.5.0)
  • Download the HF model to a local directory before executing

However, the output model doesn't appear to be quantized: the only change in config.json is the "_name_or_path" field, and the .safetensors files are the same size as before.
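
This is roughly how I checked whether the saved checkpoint was actually compressed (a sketch: "output_dir" is a placeholder, and which config key appears can depend on the compressed-tensors version):

```python
# Quick check on the saved model directory ("output_dir" is a placeholder).
# A successfully quantized checkpoint should carry a quantization/compression
# config block in config.json; a plain copy of the source model will not.
import json
from pathlib import Path

cfg = json.loads((Path("output_dir") / "config.json").read_text())
for key in ("quantization_config", "compression_config"):
    print(key, "->", cfg.get(key))

# Total .safetensors size is another quick signal: FP8 weights should be
# roughly half the BF16 footprint of the source model.
total = sum(f.stat().st_size for f in Path("output_dir").glob("*.safetensors"))
print(f"safetensors total: {total / 1e9:.2f} GB")
```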

This was working for me in early September. Any recommendations on how to resolve the issues in this example, @markurtz?

@robertgshaw2-neuralmagic
Collaborator

@horheynm - would you mind taking a run through the current KV cache example and seeing if there are any issues?
