Issues: vllm-project/llm-compressor
#968 [enhancement] How to speed up quantization: quantizing llama3-70B to w8a8 takes almost 20 hours (opened Dec 11, 2024 by moonlightian)
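For reference, a minimal sketch of the INT8 W8A8 oneshot flow along the lines of the repo's examples (the model id, output path, and sample counts are placeholders); shrinking num_calibration_samples and max_seq_length is the usual lever for trading calibration quality against runtime:

```python
from llmcompressor.transformers import oneshot
from llmcompressor.modifiers.quantization import GPTQModifier
from llmcompressor.modifiers.smoothquant import SmoothQuantModifier

# SmoothQuant migrates activation outliers into the weights, then GPTQ
# quantizes weights and activations to INT8 (W8A8).
recipe = [
    SmoothQuantModifier(smoothing_strength=0.8),
    GPTQModifier(targets="Linear", scheme="W8A8", ignore=["lm_head"]),
]

oneshot(
    model="meta-llama/Meta-Llama-3-70B-Instruct",  # placeholder model id
    dataset="open_platypus",
    recipe=recipe,
    output_dir="llama3-70b-w8a8",
    max_seq_length=2048,               # shorter sequences shorten the run
    num_calibration_samples=512,       # fewer samples shorten the run
)
```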
#966 [bug] Using llm-compressor to quantize the llama3 70b model to INT8 (w8a8) fails with ValueError: Failed to invert hessian due to numerical instability (opened Dec 10, 2024 by rexmxw02)
#965 [bug] The new version 0.3.0 takes a long time for quantization and eventually fails due to OOM (opened Dec 10, 2024 by okwinds)
#963 [bug] Error when quantizing Llama 3.3 70b to FP8 (opened Dec 6, 2024 by Syst3m1cAn0maly)
#962 [bug] Can I load the stage_quantization model using SparseAutoModelForCausalLM? (opened Dec 6, 2024 by jiangjiadi)
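A hedged sketch of the load path #962 asks about, assuming a stage_quantization output directory from a multi-stage run (the checkpoint path is hypothetical):

```python
from llmcompressor.transformers import SparseAutoModelForCausalLM

# Hypothetical path to the checkpoint written after the quantization stage
# of a multi-stage (sparsify -> finetune -> quantize) recipe.
model = SparseAutoModelForCausalLM.from_pretrained(
    "output/stage_quantization",
    device_map="auto",
    torch_dtype="auto",
)
```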
#957 [bug] How to resume the quantization stage from the finetuning stage after an error (opened Dec 5, 2024 by jiangjiadi)
#952 [enhancement] LoRA finetuning of 2:4 sparse and sparse-quantized models (opened Dec 4, 2024 by arunpatala)
#942 [bug] Quantization + sparsification: model outputs zeros (opened Nov 28, 2024 by nirey10)
#926 [bug] Error when loading a 2of4 model with vLLM (opened Nov 19, 2024 by jiangjiadi)
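Compressed checkpoints exported by llm-compressor are meant to load directly in vLLM; a minimal sketch of the load #926 attempts (the local path is hypothetical):

```python
from vllm import LLM, SamplingParams

# Hypothetical local path to a 2:4 sparse checkpoint saved by llm-compressor.
llm = LLM(model="output/llama3-2of4-w4a16")
outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```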
#911 [bug] Finetuning in the 2:4 sparsity w4a16 example fails with multiple GPUs (opened Nov 13, 2024 by arunpatala)
#858 [enhancement] Is it possible to quantize to FP8 W8A16 without calibration data? (opened Oct 21, 2024 by us58)
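On #858: to my knowledge the calibration-free path in llm-compressor is the FP8_DYNAMIC scheme (per-channel FP8 weights with dynamic per-token activation scales) rather than W8A16; a sketch with placeholder model and output names:

```python
from llmcompressor.transformers import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

# FP8_DYNAMIC computes activation scales at runtime, so no calibration
# dataset is required; weights are quantized once at compression time.
recipe = QuantizationModifier(
    targets="Linear", scheme="FP8_DYNAMIC", ignore=["lm_head"]
)

oneshot(
    model="meta-llama/Meta-Llama-3-8B-Instruct",  # placeholder model id
    recipe=recipe,
    output_dir="llama3-8b-fp8-dynamic",
)
```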
#853 [bug] Perplexity (PPL) calculation of a local sparse model yields NaN (opened Oct 19, 2024 by HengJayWang)
#687 [bug] SmoothQuant doesn't respect ignored modules for VLMs (opened Sep 26, 2024 by mgoin)
#660 [bug] KV cache quantization example causes a problem (opened Sep 25, 2024 by weicheng59)
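The KV cache example in question uses a string recipe with a kv_cache_scheme block; a trimmed sketch of that shape (model, dataset, and sample count are placeholders):

```python
from llmcompressor.transformers import oneshot

# kv_cache_scheme asks for FP8 key/value scales calibrated offline;
# the other quantization groups from the full example are omitted here.
recipe = """
quant_stage:
    quant_modifiers:
        QuantizationModifier:
            kv_cache_scheme:
                num_bits: 8
                type: float
                strategy: tensor
                dynamic: false
                symmetric: true
"""

oneshot(
    model="meta-llama/Meta-Llama-3-8B-Instruct",  # placeholder model id
    dataset="open_platypus",
    recipe=recipe,
    num_calibration_samples=512,
)
```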
#164 [enhancement] [USAGE] FP8 W8A8 (+KV) with LoRA adapters (opened Sep 11, 2024 by paulliwog)
#105 [bug] YAML parsing fails with a custom mapping provided to a SmoothQuantModifier recipe (opened Aug 22, 2024 by aatkinson)
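One way to sidestep the YAML parsing in #105 is to construct the mapping in Python; a sketch where each entry pairs the layers to balance with the preceding norm that gets smoothed into them (the patterns mirror the default Llama mappings and are assumptions for other architectures):

```python
from llmcompressor.modifiers.smoothquant import SmoothQuantModifier

# Each mapping entry: ([balance_layer_patterns], smooth_layer_pattern).
mappings = [
    [["re:.*q_proj", "re:.*k_proj", "re:.*v_proj"], "re:.*input_layernorm"],
    [["re:.*gate_proj", "re:.*up_proj"], "re:.*post_attention_layernorm"],
]
modifier = SmoothQuantModifier(smoothing_strength=0.8, mappings=mappings)
```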
#91 [bug] Layers not skipped with ignore=["re:.*"] (opened Aug 15, 2024 by horheynm)
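For #91, the expected ignore semantics: strings prefixed with "re:" are treated as regex patterns, while bare strings match names exactly, and matched modules should be excluded from targets. The patterns below are illustrative, not taken from the issue:

```python
from llmcompressor.modifiers.quantization import QuantizationModifier

# "lm_head" matches that module by name; the "re:" entry is a regex
# excluding every module whose name ends in "mlp.gate".
modifier = QuantizationModifier(
    targets="Linear",
    scheme="W8A8",
    ignore=["lm_head", "re:.*mlp.gate$"],
)
```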