
Fix AutoModel can't load gptq model due to module prefix mismatch vs AutoModelForCausalLM #2146

Merged
merged 3 commits into huggingface:main from fix-gptq-constant on Jan 6, 2025

Conversation

@LRL-ModelCloud (Contributor) commented Jan 2, 2025

What does this PR do?

This PR fixes the issue encountered when using AutoModel to load a GPTQ model, which produced the following error:

Traceback (most recent call last):
  File "/root/GPTQModel/test_inf.py", line 5, in <module>
    model = AutoModel.from_pretrained(model_id, revision="main")
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/transformers/src/transformers/models/auto/auto_factory.py", line 565, in from_pretrained
    return model_class.from_pretrained(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/transformers/src/transformers/modeling_utils.py", line 4090, in from_pretrained
    hf_quantizer.preprocess_model(
  File "/root/transformers/src/transformers/quantizers/base.py", line 194, in preprocess_model
    return self._process_model_before_weight_loading(model, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/transformers/src/transformers/quantizers/quantizer_gptq.py", line 84, in _process_model_before_weight_loading
    model = self.optimum_quantizer.convert_model(model)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/optimum/optimum/gptq/quantizer.py", line 292, in convert_model
    self.block_name_to_quantize = get_block_name_with_pattern(model)
                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/optimum/optimum/gptq/utils.py", line 79, in get_block_name_with_pattern
    raise ValueError("Block pattern could not be match. Pass `block_name_to_quantize` argument in `quantize_model`")
ValueError: Block pattern could not be match. Pass `block_name_to_quantize` argument in `quantize_model`
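
A minimal reproduction, along the lines of the test_inf.py shown in the traceback (the model id below is one of our GPTQ checkpoints; any GPTQ-quantized checkpoint hits the same code path):

```python
# Minimal reproduction sketch: loading a GPTQ checkpoint through AutoModel.
from transformers import AutoModel

model_id = "ModelCloud/Llama-3.2-1B-Instruct-gptqmodel-4bit-vortex-v1"

# Before this fix, this raises "Block pattern could not be match" because the
# GPTQ quantizer cannot find the transformer block prefix on the bare model.
model = AutoModel.from_pretrained(model_id, revision="main")
```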

The reason for this error is that models loaded by AutoModel have different module prefixes than models loaded by AutoModelForCausalLM. For example, for a Llama model, the modules after loading with AutoModel are named 'layers.0.self_attn.q_proj', 'layers.0.self_attn.k_proj', 'layers.0.self_attn.v_proj', etc., while after loading with AutoModelForCausalLM they are named 'model.layers.0.self_attn.q_proj', 'model.layers.0.self_attn.k_proj', 'model.layers.0.self_attn.v_proj', etc. The prefixes differ, but they refer to the same modules.
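
A rough sketch of the mismatch and of a prefix-tolerant way to resolve the block name; this only illustrates the idea, not necessarily the exact change made in this PR (the find_block_name helper is hypothetical):

```python
# The same Llama layer is reachable under two different qualified names,
# depending on which Auto class built the model:
automodel_name = "layers.0.self_attn.q_proj"         # AutoModel -> bare LlamaModel
causal_lm_name = "model.layers.0.self_attn.q_proj"   # AutoModelForCausalLM wrapper

# Hypothetical prefix-tolerant lookup: try each known block pattern, with and
# without the leading "model." prefix, until one matches a module name.
def find_block_name(module_names, patterns=("model.layers", "layers")):
    for pattern in patterns:
        if any(name == pattern or name.startswith(pattern + ".") for name in module_names):
            return pattern
    return None

print(find_block_name([automodel_name]))  # -> "layers"
print(find_block_name([causal_lm_name]))  # -> "model.layers"
```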

Who can review?

@Qubitium @SunMarc

@SunMarc (Member) left a comment:

SGTM!

@LRL-ModelCloud LRL-ModelCloud marked this pull request as ready for review January 3, 2025 01:17
@LRL-ModelCloud LRL-ModelCloud changed the title Fix the issue of AutoModel failing to load the gptq model. Fix AutoModel can't load gptq model due to module prefix mismatch vs AutoModelForCausalLM Jan 3, 2025
@Qubitium (Contributor) commented Jan 3, 2025

@SunMarc We found this bug while submitting a test 1B quantized model to the HF OpenLLM leaderboard.

  1. Can you notify the maintainers of the OpenLLM leaderboard that there are likely bugs in the test runners? The 1B gptq model has been in the queue for over 23 hours. It should have failed.

https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/add

model: https://huggingface.co/ModelCloud/Llama-3.2-1B-Instruct-gptqmodel-4bit-vortex-v1

[Screenshot: Open LLM Leaderboard submission page on huggingface.co, 3 Jan 2025]

  2. Are the tests performed on GPU or CPU?
  3. Will the runner auto-install gptqmodel or autogptq (since the gptqmodel transformers integration has not yet been merged)?

We are trying to have it test our vortex high-recovery gptq models, but I don't believe the existing runner will work with gptq models even if this PR is merged, since it is most likely missing the autogptq (and future gptqmodel) packages.

@SunMarc (Member) commented Jan 3, 2025

The tests are performed on an H100 GPU and normally, if nothing has changed, it should install autogptq. On the previous leaderboard, lots of gptq models were evaluated.

cc @alozowski, do you know what is happening with this model?

@alozowski commented:
An HF Open LLM Leaderboard maintainer here! Sorry for my late reply. Indeed, our evaluation queue got stuck, but we fixed it this morning.

I can confirm that we have a request file for ModelCloud/Llama-3.2-1B-Instruct-gptqmodel-4bit-vortex-v1, but unfortunately it failed under the automatic evaluation. Let me try running a manual evaluation to see how it goes.

Also, feel free to open a discussion about this model in our Community section so we can discuss the model evaluation there

@IlyasMoutawwakil (Member) commented:

LGTM, thanks for the fix! Will wait for the GPTQ tests to pass.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@IlyasMoutawwakil (Member) commented:

The CI failures are unrelated to this PR.

@IlyasMoutawwakil merged commit 40a518b into huggingface:main Jan 6, 2025
39 of 48 checks passed
@Qubitium deleted the fix-gptq-constant branch January 6, 2025 13:58
@Qubitium (Contributor) commented Jan 7, 2025

> I can confirm that we have a request file for ModelCloud/Llama-3.2-1B-Instruct-gptqmodel-4bit-vortex-v1, but it failed under the automatic evaluation unfortunately. Let me try running a manual evaluation to see how it goes

@alozowski Thanks for the update. Can you confirm the failure was caused by the bug that this PR fixed? If the error is unrelated to this bug fix, I will move the discussion to the leaderboard community board.

@Qubitium (Contributor) commented:

@alozowski We need an update on this. Please respond:

  • Is the ModelCloud/Llama-3.2-1B-Instruct-gptqmodel-4bit-vortex-v1 model failing because of this bug? We are not privy to the actual pipeline that the openllm leaderboard2 runs and are only following the official doc, which explicitly says to use the AutoModel() API to load.

There are 0 gptq-based models on the leaderboard that I can see. There are 2 gguf and 2 awq models based on a rough search, so something is likely wrong with the gptq pipeline code.

We are willing to fix this and everything related to gptq testing for the leaderboard, if it is related to HF code and we can get some debug feedback.

@alozowski commented:
Hi @Qubitium!
No, the error on the Leaderboard was unrelated to this bug. Unfortunately, it was caused by incorrectly installed dependencies. I manually evaluated ModelCloud/Llama-3.2-1B-Instruct-gptqmodel-4bit-vortex-v1, and the evaluation was successful, so the results are now available on the Leaderboard. Additionally, I'm going to incorporate my changes into the Leaderboard auto-evaluation system, so all users will be able to submit their gptq models seamlessly.
