Popular repositories

- llm-awq (Python, fork of mit-han-lab/llm-awq): AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration.
- neural-compressor (Python, fork of intel/neural-compressor): Intel® Neural Compressor (formerly Intel® Low Precision Optimization Tool), which aims to provide unified APIs for network compression technologies such as low precision quantization, spar…
- qlora (Jupyter Notebook, fork of artidoro/qlora): QLoRA: Efficient Finetuning of Quantized LLMs.
- peft (Python, fork of huggingface/peft): 🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning (see the sketch after this list).
- OmniQuant (Python, fork of OpenGVLab/OmniQuant): OmniQuant is a simple and powerful quantization technique for LLMs.
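Several of these forks (qlora, peft) target parameter-efficient fine-tuning rather than pure quantization. As a rough illustration of that workflow, here is a minimal sketch of attaching LoRA adapters with 🤗 PEFT; the model name and hyperparameters are illustrative assumptions, not settings taken from these repositories:

```python
# Minimal sketch: wrapping a causal LM with LoRA adapters via 🤗 PEFT.
# The model name and all hyperparameters below are illustrative assumptions.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")

lora_config = LoraConfig(
    r=8,                                  # rank of the low-rank update matrices
    lora_alpha=16,                        # scaling factor applied to the update
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the small adapter weights are trainable
```

Because only the adapter weights train while the base weights stay frozen, the same pattern applies on top of a quantized base model, which is the combination QLoRA builds on.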
Repositories

- compressa-deploy
- compressa-ai.github.io
- compressa-perf
- vllm (fork of vllm-project/vllm): A high-throughput and memory-efficient inference and serving engine for LLMs (see the first sketch below).
- OmniQuant (fork of OpenGVLab/OmniQuant): OmniQuant is a simple and powerful quantization technique for LLMs.
- AutoAWQ (fork of casper-hansen/AutoAWQ): AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference (see the second sketch below).
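The vllm fork above is an inference engine rather than a quantization method. A minimal sketch of offline batched generation with vLLM, assuming an illustrative model name and sampling settings:

```python
# Minimal sketch: offline batched generation with vLLM.
# The model name and sampling settings are illustrative assumptions.
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # loads weights and sets up paged KV-cache memory
sampling = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

outputs = llm.generate(["What is weight quantization?"], sampling)
for out in outputs:
    print(out.outputs[0].text)
```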
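And for the AutoAWQ fork, a minimal sketch of 4-bit AWQ weight quantization; the model path and quant_config values follow AutoAWQ's commonly documented defaults but are assumptions here, not settings from this fork:

```python
# Minimal sketch: 4-bit AWQ quantization with AutoAWQ.
# The model path and quant_config values are illustrative assumptions.
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "facebook/opt-125m"
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

model.quantize(tokenizer, quant_config=quant_config)  # calibrate scales, pack 4-bit weights
model.save_quantized("opt-125m-awq")
tokenizer.save_pretrained("opt-125m-awq")
```

The quantized checkpoint can then be served by an AWQ-aware runtime such as vLLM, which is presumably why both forks appear side by side here.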