
update test_run_compressed #970

Draft · wants to merge 4 commits into base: main

Conversation

@horheynm (Collaborator) commented Dec 11, 2024

Contingent on merge of huggingface/transformers#34719

SUMMARY:
Update the run_compressed tests from decompression tests to run_compressed tests: verify that models loaded with run_compressed=True and run_compressed=False generate the same output.

Add decompress tests that copy attrs from the source dir path's model to the target model.
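
For context, here is a minimal sketch of that comparison, assuming the `CompressedTensorsConfig(run_compressed=...)` support added by huggingface/transformers#34719; the model stub, prompt, and generation settings are placeholders rather than the actual test values.

```python
# Hypothetical sketch: load the same quantized checkpoint twice, once keeping
# weights compressed (decompressed on the forward pass via compressed linear)
# and once decompressed up front, then compare greedy generations.
from transformers import AutoModelForCausalLM, AutoTokenizer, CompressedTensorsConfig

MODEL_STUB = "org/some-compressed-model"  # placeholder checkpoint path

tokenizer = AutoTokenizer.from_pretrained(MODEL_STUB)
inputs = tokenizer("Why did the sandwich cross the road?", return_tensors="pt")

compressed_model = AutoModelForCausalLM.from_pretrained(
    MODEL_STUB,
    torch_dtype="auto",
    quantization_config=CompressedTensorsConfig(run_compressed=True),
)
decompressed_model = AutoModelForCausalLM.from_pretrained(
    MODEL_STUB,
    torch_dtype="auto",
    quantization_config=CompressedTensorsConfig(run_compressed=False),
)

out_compressed = compressed_model.generate(**inputs, max_new_tokens=32, do_sample=False)
out_decompressed = decompressed_model.generate(**inputs, max_new_tokens=32, do_sample=False)

# Same checkpoint either way, so the decoded outputs are expected to match
# (the real test may compare within a tolerance instead of exactly).
assert tokenizer.decode(out_compressed[0]) == tokenizer.decode(out_decompressed[0])
```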


👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review.

@dsikka (Collaborator) commented Dec 11, 2024

The generations compared are decompressed up front vs. decompressed on the forward pass (i.e. compressed linear), so they should be close within a tolerance.
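
To make "close within a tolerance" concrete, a hedged follow-on to the sketch above (reusing its hypothetical `compressed_model`, `decompressed_model`, and `inputs`) could compare logits instead of decoded text:

```python
import torch

# Decompress-on-forward (compressed linear) and decompress-up-front should
# agree up to small numerical noise, so compare logits with a tolerance.
with torch.no_grad():
    logits_compressed = compressed_model(**inputs).logits
    logits_decompressed = decompressed_model(**inputs).logits

assert torch.allclose(logits_compressed, logits_decompressed, atol=1e-2, rtol=1e-2)
```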

@dsikka (Collaborator) left a comment


I think you've changed the goal of the test, which is compressed linear vs. decompressing the whole model. We still want run_compressed to be True for one of the models.

@horheynm (Collaborator, Author) commented

> I think you've changed the goal of the test, which is compressed linear vs. decompressing the whole model. We still want run_compressed to be True for one of the models.

Honestly, I don't know what the original test was doing. If you want to add a compressed-linear, decompressed model and its model.generate, that's cool.

The original test compares a quantized model against an untouched base model; that will always fail.

@dsikka (Collaborator) commented Dec 11, 2024

> I think you've changed the goal of the test, which is compressed linear vs. decompressing the whole model. We still want run_compressed to be True for one of the models.

> Honestly, I don't know what the original test was doing. If you want to add a compressed-linear, decompressed model and its model.generate, that's cool.

> The original test compares a quantized model against an untouched base model; that will always fail.

1. We load a compressed model:

   `cls.compressed_model = AutoModelForCausalLM.from_pretrained(`

2. We create an empty model to hold the decompressed weights:

   `cls.uncompressed_model = AutoModelForCausalLM.from_pretrained(`

3. We then use lines 42-50 to decompress the whole model, using the empty model as the skeleton. This loads the same checkpoint as the one used by the compressed model.

   `config = AutoConfig.from_pretrained(cls.model_stub)`
We then run generations and compare the two cases. You're comparing decompressed weights in both cases; one is just decompressing on the forward pass. It's the same checkpoint in both cases, which is why this test passes.

We just want to update the test so that the uncompressed model can be decompressed using your new transformers changes.
The goal of the test is that compressed-linear decompression gives the same result as decompressing the entire model up front.
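
A rough sketch of that flow, assuming the compressed-tensors `ModelCompressor` API (the stub name is a placeholder and the exact calls in the test may differ):

```python
# Hypothetical outline of the decompression path described above.
from compressed_tensors.compressors import ModelCompressor
from transformers import AutoConfig, AutoModelForCausalLM

model_stub = "org/some-compressed-model"  # placeholder checkpoint path

# 1. Load the compressed model (weights stay compressed, decompressed on forward).
compressed_model = AutoModelForCausalLM.from_pretrained(model_stub, torch_dtype="auto")

# 2. Create an empty skeleton model with the same architecture.
config = AutoConfig.from_pretrained(model_stub)
uncompressed_model = AutoModelForCausalLM.from_config(config)

# 3. Decompress the same checkpoint into the skeleton up front.
compressor = ModelCompressor.from_pretrained(model_stub)
if compressor is not None:
    compressor.decompress(model_stub, uncompressed_model)

# 4. Generate with both models and compare; both read the same checkpoint,
#    so the outputs should agree within tolerance.
```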

@horheynm (Collaborator, Author) commented Dec 11, 2024

Ok, I see. We should update the test name; run_compressed_configs is confusing.

@dsikka (Collaborator) commented Dec 11, 2024

> Ok, I see. We should update the test name; test_run_config is confusing.

Yeah, because of the flag/arg name. We can call it test_compressed_linear_decompress or something.

@horheynm (Collaborator, Author) commented

/ready

@dsikka dsikka marked this pull request as draft December 12, 2024 17:00