Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge branch 'main' of github.com:vllm-project/llm-compressor into kv…
…-cache
Showing 160 changed files with 3,707 additions and 1,524 deletions.
@@ -1,151 +1,127 @@

```
name: Test Checks
on:
on:
  push:
    branches:
      - main
      - 'release/*'
  pull_request:
    branches:
      - main
      - 'release/*'
    types: [opened, synchronize]

env:
  CADENCE: "commit"
  CLEARML_WEB_HOST: ${{ secrets.CLEARML_WEB_HOST }}
  CLEARML_API_HOST: ${{ secrets.CLEARML_API_HOST }}
  CLEARML_API_ACCESS_KEY: ${{ secrets.CLEARML_API_ACCESS_KEY }}
  CLEARML_FILES_HOST: ${{ secrets.CLEARML_FILES_HOST }}
  CLEARML_API_SECRET_KEY: ${{ secrets.CLEARML_API_SECRET_KEY }}
  CLEARML_FILES_HOST: ${{ secrets.CLEARML_FILES_HOST }}
  CLEARML_API_SECRET_KEY: ${{ secrets.CLEARML_API_SECRET_KEY }}

jobs:
  test-setup:
    runs-on: ubuntu-22.04
    outputs:
      branch: ${{ steps.get-branch.outputs.branch }}
      base: ${{ steps.base-check.outputs.output }}
      pytorch: ${{ steps.pytorch-check.outputs.output }}
      transformers: ${{ steps.transformers-check.outputs.output }}
    steps:
      - uses: actions/checkout@v2
        with:
          fetch-depth: 0
      # TODO: for @DanH what is this supposed to be doing?
      # The way it was being used before was only testing code on main,
      # not on the current PR. git branch --show current does not work
      - name: Get current branch
        id: get-branch
        run: >
          (git branch --show-current | grep -E "release/")
          && echo "::set-output name=branch::$(git branch --show-current)"
          || echo "::set-output name=branch::main"

  base-tests:
    runs-on: ubuntu-22.04
    needs: test-setup
    steps:
      - uses: actions/setup-python@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      - uses: actions/checkout@v2
      - uses: actions/checkout@v2
          python-version: '3.12'
      - uses: actions/checkout@v4
      - name: "⚙️ Install dependencies"
        run: pip3 install -U pip setuptools && pip3 install .[dev]
      - uses: actions/checkout@v4
        with:
          repository: "neuralmagic/compressed-tensors"
          path: "compressed-tensors"
          ref: ${{needs.test-setup.outputs.branch}}
      - name: "⚙️ Install compressed-tensors dependencies"
        run: pip3 install -U pip && pip3 install setuptools compressed-tensors/
        run: |
          pip3 uninstall -y compressed-tensors compressed-tensors-nightly
          pip3 install ./compressed-tensors/
      - name: "Clean compressed-tensors directory"
        run: rm -r compressed-tensors/
      - name: "⚙️ Install dependencies"
        run: pip3 install .[dev]
      - name: "🔬 Running base tests"
        run: make test

  pytorch-tests:
    runs-on: ubuntu-22.04
    needs: test-setup
    steps:
      - uses: actions/setup-python@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      - uses: actions/checkout@v2
      - uses: actions/checkout@v2
      - uses: actions/checkout@v4
      - name: "⚙️ Install dependencies"
        run: pip3 install -U pip setuptools && pip3 install .[dev]
      - uses: actions/checkout@v4
        with:
          repository: "neuralmagic/compressed-tensors"
          path: "compressed-tensors"
          ref: ${{needs.test-setup.outputs.branch}}
      - name: "⚙️ Install compressed-tensors dependencies"
        run: pip3 install -U pip && pip3 install setuptools compressed-tensors/
        run: |
          pip3 uninstall -y compressed-tensors compressed-tensors-nightly
          pip3 install ./compressed-tensors/
      - name: "Clean compressed-tensors directory"
        run: rm -r compressed-tensors/
      - name: "⚙️ Install dependencies"
        run: pip3 install .[dev]
      - name: "🔬 Running pytorch tests"
        run: |
          pytest tests/llmcompressor/pytorch -v
          pytest -v tests/llmcompressor/pytorch
  compat-pytorch-1_9-pytorch-tests:
    runs-on: ubuntu-22.04
    needs: test-setup
    steps:
      - uses: actions/setup-python@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.9'
      - uses: actions/checkout@v2
      - uses: actions/checkout@v2
          python-version: '3.10'
      - uses: actions/checkout@v4
      - name: "⚙️ Install dependencies"
        run: pip3 install -U pip setuptools && pip3 install .[dev]
      - uses: actions/checkout@v4
        with:
          repository: "neuralmagic/compressed-tensors"
          path: "compressed-tensors"
          ref: ${{needs.test-setup.outputs.branch}}
      - name: "⚙️ Install compressed-tensors dependencies"
        run: pip3 install -U pip && pip3 install setuptools compressed-tensors/
        run: |
          pip3 uninstall -y compressed-tensors compressed-tensors-nightly
          pip3 install ./compressed-tensors/
      - name: "Clean compressed-tensors directory"
        run: rm -r compressed-tensors/
      - name: "⚙️ Install dependencies"
        run: pip3 install .[dev]
      - name: "🔬 Running pytorch tests"
        run: |
          pytest tests/llmcompressor/pytorch -v
          pytest -v tests/llmcompressor/pytorch
  transformers-tests:
    runs-on: ubuntu-22.04
    needs: test-setup
    runs-on: gcp-k8s-vllm-l4-solo
    steps:
      - uses: actions/setup-python@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      - uses: actions/checkout@v2
      - uses: actions/checkout@v2
          python-version: '3.9'
      - uses: actions/checkout@v4
      - name: "⚙️ Install dependencies"
        run: pip3 install -U pip setuptools && pip3 install .[dev]
      - uses: actions/checkout@v4
        with:
          repository: "neuralmagic/compressed-tensors"
          path: "compressed-tensors"
          ref: ${{needs.test-setup.outputs.branch}}
      - name: "⚙️ Install compressed-tensors dependencies"
        run: pip3 install -U pip && pip3 install setuptools compressed-tensors/
        id: install
        run: |
          pip3 uninstall -y compressed-tensors compressed-tensors-nightly
          pip3 install ./compressed-tensors/
      - name: "Clean compressed-tensors directory"
        run: rm -r compressed-tensors/
      - name: "⚙️ Install dependencies"
        id: install
        run: pip3 install .[dev]
      - name: "🔬 Running transformers tests"
        if: always() && steps.install.outcome == 'success'
        if: (success() || failure()) && steps.install.outcome == 'success'
        run: |
          pytest tests/llmcompressor/transformers/compression -v
          pytest -v tests/llmcompressor/transformers/compression
      - name: Run Finetune Tests
        if: always() && steps.install.outcome == 'success'
        if: (success() || failure()) && steps.install.outcome == 'success'
        run: |
          pytest -v tests/llmcompressor/transformers/finetune -m unit
          pytest -v tests/llmcompressor/transformers/finetune
      - name: Running GPTQ Tests
        if: always() && steps.install.outcome == 'success'
        if: (success() || failure()) && steps.install.outcome == 'success'
        run: |
          pytest tests/llmcompressor/transformers/gptq -v
          pytest -v tests/llmcompressor/transformers/gptq
      - name: Running ONESHOT Tests
        if: always() && steps.install.outcome == 'success'
        if: (success() || failure()) && steps.install.outcome == 'success'
        run: |
          pytest tests/llmcompressor/transformers/oneshot -v
          pytest -v tests/llmcompressor/transformers/oneshot
      - name: Running Sparsification Tests
        if: always() && steps.install.outcome == 'success'
        if: (success() || failure()) && steps.install.outcome == 'success'
        run: |
          pytest tests/llmcompressor/transformers/sparsification -v
          ptyest tests/llmcompressor/transformers/test_clear_ml.py -v
          pytest tests/llmcompressor/transformers/test_clear_ml.py -v
      - name: Running OBCQ Tests
        if: always() && steps.install.outcome == 'success'
        if: (success() || failure()) && steps.install.outcome == 'success'
        run: |
          pytest -v tests/llmcompressor/transformers/obcq -v
          pytest -v tests/llmcompressor/transformers/obcq
```
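The `on:` block runs the workflow for pushes to `main` or `release/*`, and for pull requests targeting those branches when they are opened or synchronized. The following is a minimal Python sketch of that filter, assuming GitHub's documented glob rule that `*` matches any characters except `/` while `**` also matches `/` (other pattern characters such as `?` or `!` are not modeled). The helpers `glob_to_regex`, `branch_matches`, and `should_run` are hypothetical and are not part of GitHub Actions or this repository.

```python
import re

# Branch filters and pull_request activity types copied from the `on:` block.
BRANCH_FILTERS = ("main", "release/*")
PR_TYPES = {"opened", "synchronize"}


def glob_to_regex(pattern: str):
    """Translate a branch filter into a compiled regex (only `*` and `**`).

    Assumption: `*` matches any run of characters except `/`, and `**`
    also matches `/`, following GitHub's filter-pattern rules.
    """
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            parts.append(".*")
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


def branch_matches(branch: str, filters=BRANCH_FILTERS) -> bool:
    """True if the whole branch name matches at least one filter."""
    return any(glob_to_regex(f).fullmatch(branch) for f in filters)


def should_run(event: str, branch: str, pr_action: str = "") -> bool:
    """Hypothetical check: would this event trigger the workflow?

    For `push`, `branch` is the pushed branch; for `pull_request`, it is the
    PR's base branch, and the activity type must also be listed in `types`.
    """
    if event == "push":
        return branch_matches(branch)
    if event == "pull_request":
        return pr_action in PR_TYPES and branch_matches(branch)
    return False


print(should_run("push", "release/0.3"))             # True
print(should_run("push", "release/0.3/hotfix"))      # False: `*` stops at `/`
print(should_run("pull_request", "main", "opened"))  # True
print(should_run("pull_request", "main", "closed"))  # False
```

Because `types` is set explicitly, a `reopened` pull request would not trigger the workflow under this model, since that activity type is not in the list.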
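The `Get current branch` step in `test-setup` publishes a `branch` output, which the other jobs pass as `ref` when checking out `neuralmagic/compressed-tensors`. That output is the current branch name if it contains `release/`, and `main` otherwise. The TODO comment notes that this only ever tested against `main` on PRs; a likely reason is that `actions/checkout` leaves a detached HEAD on `pull_request` events, where `git branch --show-current` prints nothing. The Python sketch below mirrors that selection and writes the result in the `name=value` form GitHub expects in the `$GITHUB_OUTPUT` file, which GitHub recommends over the deprecated `::set-output` command. `current_branch`, `pick_ref`, and `emit_output` are hypothetical names, not code from this repository.

```python
import os
import subprocess


def current_branch() -> str:
    """Return `git branch --show-current`, or "" on a detached HEAD or without git."""
    try:
        result = subprocess.run(
            ["git", "branch", "--show-current"],
            capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return ""
    return result.stdout.strip()


def pick_ref(branch: str) -> str:
    """Mirror the step's shell logic.

    `grep -E "release/"` is an unanchored substring match, so any branch that
    contains "release/" is kept; everything else, including the empty name
    printed on a detached HEAD, falls back to "main".
    """
    return branch if "release/" in branch else "main"


def emit_output(name: str, value: str) -> None:
    """Append `name=value` to $GITHUB_OUTPUT under Actions, else print it."""
    line = f"{name}={value}\n"
    path = os.environ.get("GITHUB_OUTPUT")
    if path:
        with open(path, "a", encoding="utf-8") as fh:
            fh.write(line)
    else:
        print(line, end="")


if __name__ == "__main__":
    emit_output("branch", pick_ref(current_branch()))
```

Run locally outside a `release/` branch, the script prints `branch=main`. Keeping `pick_ref` separate from the git call makes the fallback rule testable without a repository.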
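Every test step in `transformers-tests` is gated on `steps.install.outcome == 'success'`, and the diff shows two variants of the prefix: `always()` and `(success() || failure())`. Both let later suites keep running after an earlier suite fails. The difference is that `always()` is also true when the run is cancelled, while `success() || failure()` is typically false in that case. The following is a simplified Python model of the two gates. It assumes a single job status of `success`, `failure`, or `cancelled`; GitHub's real status functions inspect previous steps and cancellation in more detail. All names are hypothetical.

```python
from enum import Enum


class JobStatus(Enum):
    SUCCESS = "success"
    FAILURE = "failure"
    CANCELLED = "cancelled"


# Simplified stand-ins for GitHub's status-check functions (an assumption,
# not their exact semantics).
def success(status: JobStatus) -> bool:
    return status is JobStatus.SUCCESS


def failure(status: JobStatus) -> bool:
    return status is JobStatus.FAILURE


def always(status: JobStatus) -> bool:
    return True  # true even when the run has been cancelled


def old_gate(status: JobStatus, install_outcome: str) -> bool:
    # if: always() && steps.install.outcome == 'success'
    return always(status) and install_outcome == "success"


def new_gate(status: JobStatus, install_outcome: str) -> bool:
    # if: (success() || failure()) && steps.install.outcome == 'success'
    return (success(status) or failure(status)) and install_outcome == "success"


# One row per status; only `cancelled` differs (old=True, new=False).
for status in JobStatus:
    print(f"{status.value:<10} old={old_gate(status, 'success')} "
          f"new={new_gate(status, 'success')}")
```

If the install step itself fails, both gates evaluate to false, so neither variant runs tests against a broken environment.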
@@ -0,0 +1,13 @@

````markdown
# Loading models using `AutoModelForCausalLM`

Models quantized through `llm-compressor` can be loaded directly through
`AutoModelForCausalLM`. Note: this requires `transformers>=v4.45.0` and
`compressed-tensors>v0.6.0`.

```python
from transformers import AutoModelForCausalLM

MODEL_ID = "nm-testing/tinyllama-w8a8-compressed-hf-quantizer"

model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")
```
````
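The loading guide added here requires `transformers>=v4.45.0` and `compressed-tensors>v0.6.0`. Note the strict `>` for compressed-tensors. The following is a minimal standard-library pre-flight check for those two requirements. It assumes plain numeric release versions: pre-release, dev, and local suffixes are dropped rather than compared, so `packaging.version.Version` is the safer choice where that package is installed. As a further assumption, the `compressed-tensors-nightly` distribution (the one the CI workflow uninstalls alongside the release package) is treated as satisfying the compressed-tensors requirement. `version_tuple`, `satisfies`, `installed_version`, and `check_requirements` are hypothetical helpers.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# label -> (candidate distribution names, minimum version, strict ">"?)
# Assumption: compressed-tensors-nightly also satisfies compressed-tensors.
REQUIREMENTS = {
    "transformers": (("transformers",), "4.45.0", False),
    "compressed-tensors": (
        ("compressed-tensors", "compressed-tensors-nightly"), "0.6.0", True),
}


def version_tuple(text: str) -> tuple:
    """Leading numeric release components, e.g. "4.46.0rc1" -> (4, 46, 0).

    Simplification: suffixes such as rc/dev/local are dropped, not compared.
    """
    parts = []
    for piece in text.lstrip("v").split("."):
        match = re.match(r"\d+", piece)
        if not match:
            break
        parts.append(int(match.group()))
        if match.end() != len(piece):
            break
    return tuple(parts)


def satisfies(installed: str, minimum: str, strict: bool) -> bool:
    """Compare zero-padded release tuples with `>` (strict) or `>=`."""
    have, need = version_tuple(installed), version_tuple(minimum)
    width = max(len(have), len(need))
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    return have > need if strict else have >= need


def installed_version(candidates):
    """Version of the first installed distribution in `candidates`, else None."""
    for name in candidates:
        try:
            return version(name)
        except PackageNotFoundError:
            continue
    return None


def check_requirements(requirements=REQUIREMENTS) -> list:
    """Return readable problems; an empty list means every requirement passes."""
    problems = []
    for label, (candidates, minimum, strict) in requirements.items():
        found = installed_version(candidates)
        op = ">" if strict else ">="
        if found is None:
            problems.append(f"{label} is not installed (need {op}{minimum})")
        elif not satisfies(found, minimum, strict):
            problems.append(f"{label} {found} found, need {op}{minimum}")
    return problems


if __name__ == "__main__":
    for problem in check_requirements():
        print("requirement not met:", problem)
```

Running the script before `from_pretrained` prints nothing when both requirements hold. Otherwise, it prints one line per unmet requirement.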
@@ -0,0 +1,11 @@

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "nm-testing/tinyllama-w8a8-compressed-hf-quantizer"

# Use the AutoModelForCausalLM to run the model
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

input_ids = tokenizer("Hello my name is", return_tensors="pt").input_ids
output = model.generate(input_ids, max_new_tokens=100)
print(tokenizer.decode(output[0]))
```