Examples: Add text compression example. #9633

Open
wants to merge 8 commits into master
Conversation

@stduhpf (Contributor) commented Sep 24, 2024

This PR adds an example text compression scheme using a language model. The scheme is not optimal, but it comes fairly close.

Performance:

Tested on this example's own source file against classical compression schemes:

| Size (bytes) | Name |
|---:|---|
| 1480 | compress.cpp.qwen2.5-coder-1.5b-q6-k.bin |
| 1487 | compress.cpp.llama3-8b-q4-k-m.bin |
| 1557 | compress.cpp.starcoder2-3b-q8.bin |
| 3872 | compress.cpp.gz |
| 3878 | compress.cpp.bz2 |
| 3908 | compress.cpp.xz |
| 3983 | compress.cpp.7z |
| 3999 | compress.cpp.zip |

Usage:

Compression:

```sh
./compress --mode compress -m path/to/your/model.gguf -f path/to/the/text/file.txt -o output.bin
```

Decompression:

```sh
./compress --mode expand -m path/to/your/model.gguf -f output.bin -o output.txt
```

Drawbacks

- It's very slow compared to traditional compression schemes.
- It needs the exact same setup for compression and decompression (just changing the number of offloaded GPU layers can alter the model's output enough to introduce errors).

How it works

TODO (I'm bad at explaining things, but please read the code)

@ngxson (Collaborator) commented Sep 25, 2024

Is this the same method as: https://arxiv.org/pdf/2306.04050 ?

@stduhpf (Contributor, Author) commented Sep 25, 2024

> Is this the same method as: https://arxiv.org/pdf/2306.04050 ?

Interesting, thanks for sharing. At first glance, this does look similar to what I'm doing; at least the part about the ranks is the same.
The main difference is the compression format: I'm using a bespoke algorithm here, but maybe arithmetic coding (as in the paper) would be better.
Actually, this scheme would be essentially equivalent to arithmetic coding if the token probabilities decreased exponentially with rank (which is not the case in reality, making it less efficient).
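To make that equivalence concrete, here is a short derivation under the hypothetical assumption of an exactly geometric rank distribution (the assumption itself, and the parameter $q$, are illustrative and not taken from the PR):

```latex
% Suppose the probability of the token at rank r (counting from r = 0)
% is exactly geometric:
%   p(r) = (1 - q) q^r,  0 < q < 1.
% An arithmetic coder then spends
%   -log2 p(r) = r log2(1/q) + log2(1/(1-q))
% bits on rank r. For q = 1/2 this is exactly r + 1 bits, so a simple
% prefix code over ranks (one extra bit per rank step) already achieves
% the arithmetic-coding bound.
```

Real rank distributions deviate from geometric, which is why coding ranks with a fixed scheme loses some bits relative to arithmetic coding over the model's actual probabilities.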

@matteoserva (Contributor) commented:
Just for reference, I found an interesting implementation of arithmetic coding using llama_cpp_python:

https://github.com/AlexBuz/llama-zip
