Trying to compress LLM models using nvcomp #216

Open · Iyzyman opened this issue Sep 2, 2024 · 2 comments

Iyzyman commented Sep 2, 2024

Hi all, I've been trying to compress large language models using nvcomp but haven't succeeded. I managed to compress only the tokenizer.json and config.json files of the model, not the .safetensors or .gguf model files.

Does nvcomp currently support this? If so, how can I do it?

Much appreciated

JanuszL added the nvCOMP label Sep 2, 2024
akshaysubr commented:

@Iyzyman Thanks for the question. Can you share a bit more about which LLM you're trying to compress and for what use case? Is it mainly to reduce the checkpoint size on disk, or are you looking to do compression in memory, or something else?

Iyzyman (author) commented Sep 7, 2024

@akshaysubr Thanks for the response. I was trying to compress Meta-Llama-3-8B-Instruct and its quantized versions, mainly to reduce the size on disk. Would that be possible?
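
For reference, on-disk compression of a checkpoint with nvCOMP could look like the minimal sketch below. It assumes the nvCOMP Python bindings (the `nvidia-nvcomp` package) and their `Codec`/`as_array` interface; the file names are placeholders, and exact API details may vary between nvCOMP releases.

```python
# Minimal sketch (assumed API): compress a checkpoint file on disk with the
# nvCOMP Python bindings. nvCOMP treats the input as an opaque byte stream,
# so there is no .safetensors/.gguf-specific handling here.
import numpy as np
from nvidia import nvcomp

# Read the checkpoint as raw bytes on the host.
raw = np.fromfile("model.safetensors", dtype=np.uint8)

codec = nvcomp.Codec(algorithm="Zstd")  # other options include "LZ4", "GDeflate"

# Copy to the GPU and compress there; recent bindings may also accept
# host arrays directly.
compressed = codec.encode(nvcomp.as_array(raw).cuda())

# Copy the compressed buffer back to host memory and write it out.
np.asarray(compressed.cpu()).tofile("model.safetensors.zst")
```

One caveat worth noting: trained floating-point and quantized weights are statistically close to random bytes, so general-purpose lossless codecs usually achieve only modest ratios on them. That is likely why the JSON files compressed well while the tensor files did not.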
