Trying to compress LLM models using nvcomp #216

Open · Iyzyman opened this issue Sep 2, 2024 · 2 comments

Iyzyman commented Sep 2, 2024

Hi all, I've been trying to compress large language models using nvcomp but haven't succeeded. I managed to compress only the tokenizer.json and config.json files of the model, not the .safetensors or .gguf model files.

Does nvcomp currently support this? If so, how can I do it?

Much appreciated

JanuszL added the nvCOMP label Sep 2, 2024
akshaysubr commented:

@Iyzyman Thanks for the question. Can you share a bit more about which LLM you're trying to compress and for what use case? Is it mainly to reduce the checkpoint size on disk, or are you looking to do compression in memory, or something else?

Iyzyman (author) commented Sep 7, 2024

@akshaysubr Thanks for the response. I was trying to compress Meta-Llama-3-8B-Instruct and its quantized versions, mainly to reduce the size on disk. Would that be possible?
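
For reference, on-disk compression of a checkpoint with nvCOMP could look like the minimal sketch below. It assumes the nvCOMP Python bindings (the `nvidia-nvcomp` package) and their `Codec`/`as_array` interface; the file names are placeholders, and exact API details may vary between nvCOMP releases.

```python
# Minimal sketch (assumed API): compress a checkpoint file on disk with the
# nvCOMP Python bindings. nvCOMP treats the input as an opaque byte stream,
# so there is no .safetensors/.gguf-specific handling here.
import numpy as np
from nvidia import nvcomp

# Read the checkpoint as raw bytes on the host.
raw = np.fromfile("model.safetensors", dtype=np.uint8)

codec = nvcomp.Codec(algorithm="Zstd")  # other options include "LZ4", "GDeflate"

# Copy to the GPU and compress there; recent bindings may also accept
# host arrays directly.
compressed = codec.encode(nvcomp.as_array(raw).cuda())

# Copy the compressed buffer back to host memory and write it out.
np.asarray(compressed.cpu()).tofile("model.safetensors.zst")
```

One caveat worth noting: trained floating-point and quantized weights are statistically close to random bytes, so general-purpose lossless codecs usually achieve only modest ratios on them. That is likely why the JSON files compressed well while the tensor files did not.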
