
Gptq tokenized dataset #1584

Merged: 8 commits into huggingface:main on Dec 13, 2023
Conversation

SunMarc (Member) commented Dec 11, 2023

What does this PR do?

This PR allows passing an already-tokenized dataset for GPTQ quantization.
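
A minimal sketch of how this might be used (the model id, the example texts, and the exact format accepted for pre-tokenized examples are assumptions, not something spelled out in this PR):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from optimum.gptq import GPTQQuantizer

model_id = "facebook/opt-125m"  # placeholder model, not from this PR
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Tokenize the calibration texts yourself ...
texts = ["example calibration sentence one", "another calibration sentence"]
tokenized_dataset = [tokenizer(t, return_tensors="pt") for t in texts]

# ... and hand the pre-tokenized examples to the quantizer instead of raw strings.
quantizer = GPTQQuantizer(bits=4, dataset=tokenized_dataset)
quantized_model = quantizer.quantize_model(model, tokenizer)
```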

Review comments on optimum/gptq/quantizer.py (outdated, resolved)
fxmarty (Contributor) commented Dec 12, 2023

#1585 supersedes this, right?

SunMarc (Member, Author) commented Dec 12, 2023

Thanks for having a look @fxmarty! No, this is functionality requested by @TheBloke. Quantization in transformers is quite slow compared to AutoGPTQ, possibly because of the dataset processing, so we now allow passing a tokenized dataset. My hunch, though, is that with modules_in_block_to_quantize we should now get the same speed, because when modules_in_block_to_quantize is not set we quantize one layer at a time.
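
For reference, a sketch of what that grouping could look like, assuming GPTQQuantizer exposes a modules_in_block_to_quantize argument that takes a list of lists of module names (the Llama-style names below are illustrative only):

```python
from optimum.gptq import GPTQQuantizer

# Each inner list is handled together rather than one layer at a time.
quantizer = GPTQQuantizer(
    bits=4,
    dataset="c4",  # built-in calibration dataset name
    modules_in_block_to_quantize=[
        ["self_attn.k_proj", "self_attn.v_proj", "self_attn.q_proj"],
        ["self_attn.o_proj"],
        ["mlp.gate_proj", "mlp.up_proj"],
        ["mlp.down_proj"],
    ],
)
```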

TheBloke commented
Yeah I asked for it so I could have complete control over the dataset when using Transformers to make a GPTQ.

With AutoGPTQ, I have this control, because I can tokenise the dataset myself and then pass this to AutoGPTQ to use.

I use this to pick context-length-appropriate samples. E.g. for a 4096-context model, I will pass 128 x 4096-token samples.

With Transformers I could never do this: I just had to pass a List[str], and I wasn't sure exactly what data was being used, so I just passed 5000 strings of various lengths.

Transformers was also much slower at making GPTQs than AutoGPTQ, and I thought these facts might be connected - although based on what Marc said here, maybe that's for other reasons?

Anyway, even if it's not the cause of the speed difference, it's great that I'll now be able to have full control over the dataset so I can ensure I send enough data for long context models, but not more than I need. And also now I can bulk tokenise the dataset myself, which I can do very fast.
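
A rough sketch of that bulk-tokenize-it-yourself flow, slicing a corpus into 128 samples of 4096 tokens each (the dataset, the model id, and the per-sample dict layout are illustrative assumptions, not something confirmed in this thread):

```python
import torch
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m")  # placeholder
raw = load_dataset("wikitext", "wikitext-2-raw-v1", split="train")

seq_len, n_samples = 4096, 128

# Tokenize the whole corpus once, then cut it into fixed-length samples.
ids = tokenizer("\n\n".join(raw["text"]), return_tensors="pt").input_ids[0]

calibration = []
for i in range(n_samples):
    chunk = ids[i * seq_len : (i + 1) * seq_len]
    calibration.append(
        {
            "input_ids": chunk.unsqueeze(0),
            "attention_mask": torch.ones(1, seq_len, dtype=torch.long),
        }
    )
# `calibration` can then be passed as the dataset argument of GPTQQuantizer.
```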

fxmarty merged commit afe2e3c into huggingface:main on Dec 13, 2023
40 of 46 checks passed