Gptq tokenized dataset #1584
Conversation
#1585 supersedes this, right?
Co-authored-by: fxmarty <[email protected]>
Thanks for having a look @fxmarty! No, this is functionality requested by @TheBloke. Quantization in Transformers is quite slow compared to AutoGPTQ, possibly because of the dataset processing, so we now allow passing an already-tokenized dataset. My hunch personally is that with
Yeah, I asked for it so I could have complete control over the dataset when using Transformers to make a GPTQ. With AutoGPTQ I have this control, because I can tokenise the dataset myself and then pass it to AutoGPTQ to use. I use this to pick context-length-appropriate samples, e.g. for a 4096-context model I will pass 128 x 4096-token samples. With Transformers I could never do this; I just have to pass a List[str], and I wasn't sure exactly what data was being used, so I just passed 5000 strings of various lengths.

Transformers was also much slower at making GPTQs than AutoGPTQ, and I thought these facts might be connected, although based on what Marc said here, maybe that's for other reasons. Anyway, even if it's not the cause of the speed difference, it's great that I'll now be able to have full control over the dataset, so I can ensure I send enough data for long-context models but not more than I need. And I can now bulk-tokenise the dataset myself, which I can do very fast.
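For illustration, here is a minimal sketch of the pre-tokenization workflow described above. The model id, calibration corpus, and chunking scheme are placeholders chosen for the example, not part of this PR: the corpus is tokenized once as a single stream, then sliced into fixed-length windows in the List[Dict] format that AutoGPTQ's `model.quantize(examples)` consumes.

```python
import torch
from datasets import load_dataset
from transformers import AutoTokenizer

MODEL_ID = "facebook/opt-125m"  # placeholder; use the model being quantized
SEQ_LEN, N_SAMPLES = 4096, 128  # e.g. 128 samples for a 4096-context model

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
raw = load_dataset("wikitext", "wikitext-2-raw-v1", split="train")

# Tokenize the corpus once as a single stream, then slice fixed-length
# windows so every calibration sample is exactly SEQ_LEN tokens long.
ids = tokenizer("\n\n".join(raw["text"]), return_tensors="pt").input_ids[0]

examples = []
for i in range(N_SAMPLES):
    chunk = ids[i * SEQ_LEN : (i + 1) * SEQ_LEN]
    examples.append(
        {
            "input_ids": chunk.unsqueeze(0),
            "attention_mask": torch.ones_like(chunk).unsqueeze(0),
        }
    )
# `examples` can now be handed to AutoGPTQ's model.quantize(examples),
# or (with this PR) passed as the dataset for GPTQ quantization below.
```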
What does this PR do?
This PR allows passing an already-tokenized dataset for GPTQ quantization.
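As a hedged sketch of how this might be used with optimum's `GPTQQuantizer`: the `dataset` argument previously took a named dataset or a List[str]; with this change it can also receive pre-tokenized encodings such as the `examples` list built above. The exact accepted format is defined by the PR itself and is assumed here, not guaranteed.

```python
from optimum.gptq import GPTQQuantizer
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# Pass the pre-tokenized calibration samples directly, skipping the
# (slow) in-library dataset processing step.
quantizer = GPTQQuantizer(bits=4, dataset=examples, model_seqlen=4096)
quantized_model = quantizer.quantize_model(model, tokenizer)
```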