Describe the bug
I followed the Llama 3 example and ran into the issue below on GPT-J:
```
2024-11-05T15:58:25.267390+0000 | _check_compile_recipe | INFO - Recipe compiled and 1 modifiers created
Traceback (most recent call last):
  File "/tmp/t.py", line 66, in <module>
    oneshot(
  File "/usr/local/lib/python3.10/dist-packages/llmcompressor/transformers/finetune/text_generation.py", line 76, in oneshot
    main(model_args, data_args, training_args)
  File "/usr/local/lib/python3.10/dist-packages/llmcompressor/transformers/finetune/text_generation.py", line 364, in main
    stage_runner.one_shot()
  File "/usr/local/lib/python3.10/dist-packages/llmcompressor/transformers/finetune/runner.py", line 171, in one_shot
    self.trainer.one_shot(calibration_data=calib_data, stage=stage)
  File "/usr/local/lib/python3.10/dist-packages/llmcompressor/transformers/finetune/session_mixin.py", line 401, in one_shot
    apply(
  File "/usr/local/lib/python3.10/dist-packages/llmcompressor/core/session_functions.py", line 184, in apply
    return active_session().apply(
  File "/usr/local/lib/python3.10/dist-packages/llmcompressor/core/session.py", line 210, in apply
    self.initialize(**kwargs)
  File "/usr/local/lib/python3.10/dist-packages/llmcompressor/core/session.py", line 156, in initialize
    mod_data = self._lifecycle.initialize(
  File "/usr/local/lib/python3.10/dist-packages/llmcompressor/core/lifecycle.py", line 126, in initialize
    data = mod.initialize(state=self.state, **extras)
  File "/usr/local/lib/python3.10/dist-packages/llmcompressor/modifiers/stage.py", line 124, in initialize
    modifier.initialize(state, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/llmcompressor/modifiers/modifier.py", line 118, in initialize
    initialized = self.on_initialize(state=state, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/llmcompressor/modifiers/smoothquant/base.py", line 127, in on_initialize
    self.resolved_mappings_ = self._resolve_mappings(state.model)
  File "/usr/local/lib/python3.10/dist-packages/llmcompressor/modifiers/smoothquant/base.py", line 178, in _resolve_mappings
    to_smooth_layers = get_layers(to_smooth, model)
  File "/usr/local/lib/python3.10/dist-packages/llmcompressor/utils/pytorch/module.py", line 166, in get_layers
    return match_layers_params(targets, module)
  File "/usr/local/lib/python3.10/dist-packages/llmcompressor/utils/pytorch/module.py", line 160, in match_layers_params
    raise ValueError(f"Could not find targets {missed} in module {module}")
ValueError: Could not find targets ['re:.*input_layernorm'] in module GPTJForCausalLM(
  (transformer): GPTJModel(
    (wte): Embedding(50400, 4096)
    (drop): Dropout(p=0.0, inplace=False)
    (h): ModuleList(
      (0-27): 28 x GPTJBlock(
        (ln_1): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)
        (attn): GPTJAttention(
          (attn_dropout): Dropout(p=0.0, inplace=False)
          (resid_dropout): Dropout(p=0.0, inplace=False)
          (k_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (v_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (q_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (out_proj): Linear(in_features=4096, out_features=4096, bias=False)
        )
        (mlp): GPTJMLP(
          (fc_in): Linear(in_features=4096, out_features=16384, bias=True)
          (fc_out): Linear(in_features=16384, out_features=4096, bias=True)
          (act): NewGELUActivation()
          (dropout): Dropout(p=0.0, inplace=False)
        )
      )
    )
    (ln_f): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)
  )
  (lm_head): Linear(in_features=4096, out_features=50400, bias=True)
)
```
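The target regex simply never matches: as the module dump shows, GPT-J names its pre-attention LayerNorm `ln_1`, not `input_layernorm`. A quick diagnostic sketch that lists the LayerNorm names (the `EleutherAI/gpt-j-6b` checkpoint is an assumption):

```python
# Diagnostic sketch (model ID is an assumption): list GPT-J's LayerNorm
# module names to see why "re:.*input_layernorm" finds no match.
import torch.nn as nn
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6b")
ln_names = sorted({name.split(".")[-1]
                   for name, module in model.named_modules()
                   if isinstance(module, nn.LayerNorm)})
print(ln_names)  # ['ln_1', 'ln_f'] -- no 'input_layernorm' as in LLaMA
```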
Expected behavior
W8A8 quant works on GPT-J.
Environment
Commit: f7245c8
To Reproduce
Follow the example at https://github.com/vllm-project/llm-compressor/blob/main/examples/quantization_w8a8_int8/llama3_example.py with GPT-J as the model.
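A minimal sketch of that script with GPT-J swapped in (the model ID, dataset, and calibration settings are assumptions, not the exact script that produced the traceback):

```python
# Sketch adapted from the linked llama3_example.py; model ID, dataset, and
# calibration settings are assumptions, not the reporter's exact script.
from llmcompressor.modifiers.quantization import GPTQModifier
from llmcompressor.modifiers.smoothquant import SmoothQuantModifier
from llmcompressor.transformers import oneshot

recipe = [
    SmoothQuantModifier(smoothing_strength=0.8),  # default (LLaMA-style) mappings
    GPTQModifier(targets="Linear", scheme="W8A8", ignore=["lm_head"]),
]

oneshot(
    model="EleutherAI/gpt-j-6b",
    dataset="open_platypus",
    recipe=recipe,
    max_seq_length=2048,
    num_calibration_samples=512,
)
```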
Hi @dsikka, could you kindly share some insights on this issue?
Thanks,
-yuan
Hey yuan - this is likely due to the default mappings for SmoothQuant, which target LLaMA-style module names (e.g. `input_layernorm`) that GPT-J does not have.
@rahul-tuli - can you help yuan with a recipe for GPT-J? Thanks!
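Until an official recipe lands, one possible workaround is to pass GPT-J-specific mappings to `SmoothQuantModifier`. This is an untested sketch inferred from the module names in the traceback (GPT-J's single `ln_1` feeds both the attention projections and the MLP input), not a recipe confirmed by the maintainers:

```python
# Untested sketch: override SmoothQuant's default mappings with GPT-J's
# module names. GPT-J applies one LayerNorm (ln_1) before both the attention
# projections and fc_in, so all four are smoothed against it.
from llmcompressor.modifiers.smoothquant import SmoothQuantModifier

gptj_mappings = [
    [["re:.*q_proj", "re:.*k_proj", "re:.*v_proj", "re:.*fc_in"], "re:.*ln_1"],
]

recipe = [
    SmoothQuantModifier(smoothing_strength=0.8, mappings=gptj_mappings),
    # ...followed by the same GPTQModifier as in the original recipe
]
```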
Hey, did you resolve the GPT-J quant bug?