W8A8 quant for GPT-J failed #909

Open
zhouyuan opened this issue Nov 12, 2024 · 3 comments
Labels
bug Something isn't working

Comments

@zhouyuan

Describe the bug

I followed the Llama 3 example and ran into the issue below with GPT-J:

2024-11-05T15:58:25.267390+0000 | _check_compile_recipe | INFO - Recipe compiled and 1 modifiers created
Traceback (most recent call last):
  File "/tmp/t.py", line 66, in <module>
    oneshot(
  File "/usr/local/lib/python3.10/dist-packages/llmcompressor/transformers/finetune/text_generation.py", line 76, in oneshot
    main(model_args, data_args, training_args)
  File "/usr/local/lib/python3.10/dist-packages/llmcompressor/transformers/finetune/text_generation.py", line 364, in main
    stage_runner.one_shot()
  File "/usr/local/lib/python3.10/dist-packages/llmcompressor/transformers/finetune/runner.py", line 171, in one_shot
    self.trainer.one_shot(calibration_data=calib_data, stage=stage)
  File "/usr/local/lib/python3.10/dist-packages/llmcompressor/transformers/finetune/session_mixin.py", line 401, in one_shot
    apply(
  File "/usr/local/lib/python3.10/dist-packages/llmcompressor/core/session_functions.py", line 184, in apply
    return active_session().apply(
  File "/usr/local/lib/python3.10/dist-packages/llmcompressor/core/session.py", line 210, in apply
    self.initialize(**kwargs)
  File "/usr/local/lib/python3.10/dist-packages/llmcompressor/core/session.py", line 156, in initialize
    mod_data = self._lifecycle.initialize(
  File "/usr/local/lib/python3.10/dist-packages/llmcompressor/core/lifecycle.py", line 126, in initialize
    data = mod.initialize(state=self.state, **extras)
  File "/usr/local/lib/python3.10/dist-packages/llmcompressor/modifiers/stage.py", line 124, in initialize
    modifier.initialize(state, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/llmcompressor/modifiers/modifier.py", line 118, in initialize
    initialized = self.on_initialize(state=state, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/llmcompressor/modifiers/smoothquant/base.py", line 127, in on_initialize
    self.resolved_mappings_ = self._resolve_mappings(state.model)
  File "/usr/local/lib/python3.10/dist-packages/llmcompressor/modifiers/smoothquant/base.py", line 178, in _resolve_mappings
    to_smooth_layers = get_layers(to_smooth, model)
  File "/usr/local/lib/python3.10/dist-packages/llmcompressor/utils/pytorch/module.py", line 166, in get_layers
    return match_layers_params(targets, module)
  File "/usr/local/lib/python3.10/dist-packages/llmcompressor/utils/pytorch/module.py", line 160, in match_layers_params
    raise ValueError(f"Could not find targets {missed} in module {module}")
ValueError: Could not find targets ['re:.*input_layernorm'] in module GPTJForCausalLM(
  (transformer): GPTJModel(
    (wte): Embedding(50400, 4096)
    (drop): Dropout(p=0.0, inplace=False)
    (h): ModuleList(
      (0-27): 28 x GPTJBlock(
        (ln_1): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)
        (attn): GPTJAttention(
          (attn_dropout): Dropout(p=0.0, inplace=False)
          (resid_dropout): Dropout(p=0.0, inplace=False)
          (k_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (v_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (q_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (out_proj): Linear(in_features=4096, out_features=4096, bias=False)
        )
        (mlp): GPTJMLP(
          (fc_in): Linear(in_features=4096, out_features=16384, bias=True)
          (fc_out): Linear(in_features=16384, out_features=4096, bias=True)
          (act): NewGELUActivation()
          (dropout): Dropout(p=0.0, inplace=False)
        )
      )
    )
    (ln_f): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)
  )
  (lm_head): Linear(in_features=4096, out_features=50400, bias=True)
)

Expected behavior
W8A8 quant works on GPT-J

Environment
Include all relevant environment information:

  1. OS: Ubuntu 22.04
  2. Python version: 3.10
  3. LLM Compressor version or commit hash: a173a0c
  4. ML framework version(s): torch 2.5.1
  5. Other Python package versions (vLLM, compressed-tensors, numpy, ONNX):
  6. Other relevant environment information: CUDA 12.1

To Reproduce
Follow the example at
https://github.com/vllm-project/llm-compressor/blob/main/examples/quantization_w8a8_int8/llama3_example.py
with the model switched to GPT-J.
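A minimal sketch of the adapted script (the checkpoint name, calibration dataset, and settings below are illustrative rather than my exact values; the recipe mirrors the linked Llama 3 example):

```python
from llmcompressor.modifiers.quantization import GPTQModifier
from llmcompressor.modifiers.smoothquant import SmoothQuantModifier
from llmcompressor.transformers import oneshot

# Same recipe shape as the Llama 3 example: SmoothQuant, then int8 W8A8 GPTQ.
recipe = [
    SmoothQuantModifier(smoothing_strength=0.8),
    GPTQModifier(targets="Linear", scheme="W8A8", ignore=["lm_head"]),
]

# Fails while the SmoothQuant modifier is being initialized (traceback above).
oneshot(
    model="EleutherAI/gpt-j-6b",  # illustrative GPT-J checkpoint
    dataset="open_platypus",      # illustrative calibration dataset
    recipe=recipe,
    max_seq_length=2048,
    num_calibration_samples=512,
    output_dir="gpt-j-6b-W8A8",   # illustrative output path
)
```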


@zhouyuan
Author

Hi @dsikka, could you please share some insights on this issue?

Thanks,
-yuan

@robertgshaw2-neuralmagic
Collaborator


Hey Yuan - this is likely due to the default mappings for SmoothQuant.

@rahul-tuli - can you help Yuan with a recipe for GPT-J? Thanks!
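In the meantime, an untested sketch of what a GPT-J-specific mapping could look like, going off the module names in the traceback (GPT-J runs attention and the MLP in parallel off a single ln_1, unlike Llama's input_layernorm / post_attention_layernorm pair):

```python
from llmcompressor.modifiers.smoothquant import SmoothQuantModifier

# Untested sketch: override the default Llama-style mappings
# (re:.*input_layernorm / re:.*post_attention_layernorm), which don't exist
# in GPTJBlock. GPT-J feeds both the attention projections and the MLP's
# fc_in from the same ln_1 output, so that layernorm is used as the
# smoothing anchor for all of them.
gptj_mappings = [
    [["re:.*q_proj", "re:.*k_proj", "re:.*v_proj", "re:.*fc_in"], "re:.*ln_1"],
]

recipe = [
    SmoothQuantModifier(smoothing_strength=0.8, mappings=gptj_mappings),
    # ... GPTQModifier etc. as in the original recipe
]
```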

@yemyhdtrc6088

Hey, did you resolve the GPT-J quant bug?
