Merge branch 'main' into nomicbert
bhavika authored Jul 17, 2024
2 parents 6252618 + d9bd7c3 commit ac533ca
Showing 45 changed files with 1,057 additions and 624 deletions.
56 changes: 23 additions & 33 deletions .github/workflows/check_code_quality.yml
@@ -1,19 +1,11 @@
-name: check_code_quality
+name: Code Quality

on:
  push:
-    branches: [ main ]
-    paths:
-      - "optimum/**.py"
-      - "tests/**.py"
-      - "examples/**.py"
+    branches: [main]

  pull_request:
-    branches: [ main ]
-    paths:
-      - "optimum/**.py"
-      - "tests/**.py"
-      - "examples/**.py"
+    branches: [main]

concurrency:
  group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
@@ -29,25 +21,23 @@ jobs:

    runs-on: ${{ matrix.os }}
    steps:
-    - uses: actions/checkout@v2
-    - name: Setup Python ${{ matrix.python-version }}
-      uses: actions/setup-python@v2
-      with:
-        python-version: ${{ matrix.python-version }}
-    - name: Create and start a virtual environment
-      run: |
-        python -m venv venv
-        source venv/bin/activate
-    - name: Install dependencies
-      run: |
-        source venv/bin/activate
-        pip install --upgrade pip
-        pip install .[quality]
-    - name: Check style with black
-      run: |
-        source venv/bin/activate
-        black --check .
-    - name: Check style with ruff
-      run: |
-        source venv/bin/activate
-        ruff .
+      - name: Checkout code
+        uses: actions/checkout@v4
+
+      - name: Setup Python ${{ matrix.python-version }}
+        uses: actions/setup-python@v5
+        with:
+          python-version: ${{ matrix.python-version }}
+
+      - name: Install dependencies
+        run: |
+          pip install --upgrade pip
+          pip install .[quality]
+      - name: Check style with black
+        run: |
+          black --check .
+      - name: Check style with ruff
+        run: |
+          ruff .
59 changes: 38 additions & 21 deletions .github/workflows/test_gptq.yml
@@ -1,29 +1,46 @@
-name: GPTQ Quantization / Test GPU
+name: GPTQ / Python - Test

on:
  workflow_dispatch:
-  schedule:
-    - cron: 0 1 */3 * * # at 1am every 3 days
+  push:
+    branches: [main]
+    paths:
+      - tests/gptq/**
+      - optimum/gptq/**
+      - .github/workflows/test_gptq.yml
  pull_request:
-    types: [opened, synchronize, reopened, labeled]
-  # uncomment to enable on PR merge on main branch:
-  #push:
-  #  branches:
-  #    - main
+    branches: [main]
+    paths:
+      - tests/gptq/**
+      - optimum/gptq/**
+      - .github/workflows/test_gptq.yml
+  schedule:
+    # every day at midnight
+    - cron: "0 0 * * *"

jobs:
-  do-the-job:
-    if: ${{ (github.event_name == 'workflow_dispatch') || (github.event_name == 'schedule') || contains( github.event.pull_request.labels.*.name, 'gpu-test') }}
-    name: Start self-hosted EC2 runner
+  test_gptq:
    runs-on: [single-gpu, nvidia-gpu, t4, ci]
-    env:
-      AWS_REGION: us-east-1

    steps:
-      - name: Checkout
-        uses: actions/checkout@v2
-      - name: Build image
-        run: |
-          docker build -f tests/gptq/Dockerfile_quantization_gpu -t gptq-gpu .
-      - name: Test with unittest within docker container
-        run: |
-          docker run --rm --gpus all -v $(pwd)/hf_cache:/root/.cache/huggingface --workdir=/workspace/optimum/tests gptq-gpu:latest
+      - name: Checkout code
+        uses: actions/checkout@v4
+
+      - name: Run tests
+        uses: addnab/docker-run-action@v3
+        with:
+          image: pytorch/pytorch:2.2.2-cuda12.1-cudnn8-runtime
+          # latest auto-gptq was built with pytorch 2.2 and cuda 12.1
+          options: |
+            --rm
+            --gpus all
+            --shm-size 16G
+            --env RUN_SLOW=1
+            --env HF_HOME=/mnt/cache/
+            --volume /mnt/cache/:/mnt/cache/
+            --volume ${{ github.workspace }}:/workspace
+            --workdir /workspace
+          run: |
+            pip install auto-gptq
+            pip install -e .[tests]
+            pytest tests/gptq -s -vvvv --durations=0
2 changes: 1 addition & 1 deletion .github/workflows/test_onnx.yml
@@ -27,7 +27,7 @@ jobs:
          python-version: ${{ matrix.python-version }}
      - name: Install dependencies
        run: |
-          pip install .[tests,onnxruntime,exporters-tf]
+          pip install .[tests,exporters]
      - name: Test with unittest
        working-directory: tests
        run: |
7 changes: 3 additions & 4 deletions README.md
@@ -79,8 +79,7 @@ It is possible to export 🤗 Transformers and Diffusers models to the OpenVINO
optimum-cli export openvino --model distilbert-base-uncased-finetuned-sst-2-english distilbert_sst2_ov
```

-If you add `--weight-format int8`, the weights will be quantized to `int8`, check out our [documentation](https://huggingface.co/docs/optimum/main/en/intel/optimization_ov#weight-only-quantization) for more detail on weight only quantization. To apply quantization on both weights and activations, you can find more information [here](https://huggingface.co/docs/optimum/main/en/intel/optimization_ov#static-quantization).
-
+If you add `--weight-format int8`, the weights will be quantized to `int8`, check out our [documentation](https://huggingface.co/docs/optimum/main/intel/openvino/export) for more detail. To apply quantization on both weights and activations, you can find more information [here](https://huggingface.co/docs/optimum/main/intel/openvino/optimization#static-quantization).

To load a model and run inference with OpenVINO Runtime, you can just replace your `AutoModelForXxx` class with the corresponding `OVModelForXxx` class. To load a PyTorch checkpoint and convert it to the OpenVINO format on-the-fly, you can set `export=True` when loading your model.

@@ -92,13 +91,13 @@ To load a model and run inference with OpenVINO Runtime, you can just replace your
model_id = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(model_id)
- model = AutoModelForSequenceClassification.from_pretrained(model_id)
-+ model = OVModelForSequenceClassification.from_pretrained("distilbert_sst2_ov")
++ model = OVModelForSequenceClassification.from_pretrained(model_id, export=True)

classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
results = classifier("He's a dreadful magician.")
```

-You can find more examples in the [documentation](https://huggingface.co/docs/optimum/intel/inference) and in the [examples](https://github.com/huggingface/optimum-intel/tree/main/examples/openvino).
+You can find more examples in the [documentation](https://huggingface.co/docs/optimum/main/intel/openvino/inference) and in the [examples](https://github.com/huggingface/optimum-intel/tree/main/examples/openvino).
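
A note on the snippet above: both loading paths remain valid after this change. Below is a minimal sketch, assuming `optimum-intel` is installed with OpenVINO support, that `OVModelForSequenceClassification` is imported from `optimum.intel`, and that `distilbert_sst2_ov` is the directory produced by the `optimum-cli export openvino` command shown earlier:

```python
from optimum.intel import OVModelForSequenceClassification
from transformers import AutoTokenizer, pipeline

model_id = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# New README example: convert the PyTorch checkpoint to OpenVINO on the fly.
model = OVModelForSequenceClassification.from_pretrained(model_id, export=True)

# Previous README example, still valid: load a directory already exported with
# `optimum-cli export openvino --model ... distilbert_sst2_ov`.
# model = OVModelForSequenceClassification.from_pretrained("distilbert_sst2_ov")

classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
print(classifier("He's a dreadful magician."))
```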

### Neural Compressor

2 changes: 1 addition & 1 deletion docs/source/exporters/onnx/usage_guides/export_a_model.mdx
@@ -87,7 +87,7 @@ Required arguments:
output Path indicating the directory where to store generated ONNX model.

Optional arguments:
---task TASK The task to export the model for. If not specified, the task will be auto-inferred based on the model. Available tasks depend on the model, but are among: ['default', 'fill-mask', 'text-generation', 'text2text-generation', 'text-classification', 'token-classification', 'multiple-choice', 'object-detection', 'question-answering', 'image-classification', 'image-segmentation', 'masked-im', 'semantic-segmentation', 'automatic-speech-recognition', 'audio-classification', 'audio-frame-classification', 'automatic-speech-recognition', 'audio-xvector', 'image-to-text', 'stable-diffusion', 'zero-shot-object-detection']. For decoder models, use `xxx-with-past` to export the model using past key values in the decoder.
+--task TASK The task to export the model for. If not specified, the task will be auto-inferred based on the model. Available tasks depend on the model, but are among: ['default', 'fill-mask', 'text-generation', 'text2text-generation', 'text-classification', 'token-classification', 'multiple-choice', 'object-detection', 'question-answering', 'image-classification', 'image-segmentation', 'masked-im', 'semantic-segmentation', 'automatic-speech-recognition', 'audio-classification', 'audio-frame-classification', 'automatic-speech-recognition', 'audio-xvector', 'image-to-text', 'zero-shot-object-detection', 'image-to-image', 'inpainting', 'text-to-image']. For decoder models, use `xxx-with-past` to export the model using past key values in the decoder.
--monolith Force to export the model as a single ONNX file. By default, the ONNX exporter may break the model in several ONNX files, for example for encoder-decoder models where the encoder should be run only once while the decoder is looped over.
--device DEVICE The device to use to do the export. Defaults to "cpu".
--opset OPSET If specified, ONNX opset version to export the model with. Otherwise, the default opset will be used.
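
For reference, the export this help text documents can also be driven from Python. A sketch under the assumption that the `optimum.exporters.onnx.main_export` entry point mirrors the `optimum-cli export onnx` arguments above; treat the exact keyword names as illustrative:

```python
from optimum.exporters.onnx import main_export

# Roughly equivalent to:
#   optimum-cli export onnx --model distilbert-base-uncased-finetuned-sst-2-english \
#       --task text-classification distilbert_onnx
# If the task is omitted, it is auto-inferred from the model, as described above.
main_export(
    "distilbert-base-uncased-finetuned-sst-2-english",
    output="distilbert_onnx",
    task="text-classification",
)
```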
@@ -59,7 +59,7 @@ Optional arguments:
the model, but are among: ['default', 'fill-mask', 'text-generation', 'text2text-generation', 'text-classification', 'token-classification',
'multiple-choice', 'object-detection', 'question-answering', 'image-classification', 'image-segmentation', 'masked-im', 'semantic-
segmentation', 'automatic-speech-recognition', 'audio-classification', 'audio-frame-classification', 'automatic-speech-recognition', 'audio-xvector', 'vision2seq-
-lm', 'stable-diffusion', 'zero-shot-object-detection']. For decoder models, use `xxx-with-past` to export the model using past key
+lm', 'zero-shot-object-detection', 'text-to-image', 'image-to-image', 'inpainting']. For decoder models, use `xxx-with-past` to export the model using past key
values in the decoder.
--atol ATOL If specified, the absolute difference tolerance when validating the model. Otherwise, the default atol for the model will be used.
--pad_token_id PAD_TOKEN_ID
19 changes: 11 additions & 8 deletions examples/onnxruntime/optimization/multiple-choice/run_swag.py
@@ -37,7 +37,7 @@

from optimum.onnxruntime import ORTModelForMultipleChoice, ORTOptimizer
from optimum.onnxruntime.configuration import OptimizationConfig
-from optimum.onnxruntime.model import ORTModel
+from optimum.onnxruntime.utils import evaluation_loop


# Will error if the minimal version of Transformers is not installed. The version of transformers must be >= 4.19.0
@@ -236,7 +236,6 @@ def main():
    )

    os.makedirs(training_args.output_dir, exist_ok=True)
-    optimized_model_path = os.path.join(training_args.output_dir, "model_optimized.onnx")

    tokenizer = AutoTokenizer.from_pretrained(model_args.tokenizer_name or model_args.model_name_or_path)

@@ -254,13 +253,18 @@
    optimizer = ORTOptimizer.from_pretrained(model)

    # Optimize the model
-    optimizer.optimize(
+    optimized_model_path = optimizer.optimize(
        optimization_config=optimization_config,
        save_dir=training_args.output_dir,
        use_external_data_format=onnx_export_args.use_external_data_format,
        one_external_file=onnx_export_args.one_external_file,
    )

+    model = ORTModelForMultipleChoice.from_pretrained(
+        optimized_model_path,
+        provider=optim_args.execution_provider,
+    )
+
    if training_args.do_eval:
        # Prepare the dataset downloading, preprocessing and metric creation to perform the evaluation and / or the
        # prediction step(s)
@@ -339,13 +343,12 @@ def compute_metrics(eval_predictions):
        # Evaluation
        logger.info("*** Evaluate ***")

-        ort_model = ORTModel(
-            optimized_model_path,
-            execution_provider=optim_args.execution_provider,
-            compute_metrics=compute_metrics,
+        outputs = evaluation_loop(
+            model=model,
+            dataset=eval_dataset,
+            label_names=["label"],
+            compute_metrics=compute_metrics,
        )
-        outputs = ort_model.evaluation_loop(eval_dataset)

        # Save evaluation metrics
        with open(os.path.join(training_args.output_dir, "eval_results.json"), "w") as f:
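
The `run_swag.py` changes above follow the same pattern as the `run_qa.py` and `run_glue.py` edits below: `optimizer.optimize(...)` is now treated as returning the path of the optimized model, the model is reloaded through its `ORTModelForXxx` class with an execution provider, and the removed `optimum.onnxruntime.model.ORTModel` wrapper is replaced by the standalone `evaluation_loop` helper. A condensed sketch of the new flow; the checkpoint name is a placeholder, and `eval_dataset` and `compute_metrics` are assumed to be prepared by the script's preprocessing code, omitted here:

```python
from optimum.onnxruntime import ORTModelForMultipleChoice, ORTOptimizer
from optimum.onnxruntime.configuration import OptimizationConfig
from optimum.onnxruntime.utils import evaluation_loop

# Export a (hypothetical) multiple-choice checkpoint to ONNX.
model = ORTModelForMultipleChoice.from_pretrained("some-org/some-swag-model", export=True)
optimizer = ORTOptimizer.from_pretrained(model)

# optimize() returns the path of the optimized model.
optimized_model_path = optimizer.optimize(
    optimization_config=OptimizationConfig(optimization_level=1),
    save_dir="optimized_output",
)

# Reload the optimized model with the desired execution provider.
model = ORTModelForMultipleChoice.from_pretrained(
    optimized_model_path, provider="CPUExecutionProvider"
)

# Evaluate with the standalone helper, as in the diff above.
outputs = evaluation_loop(
    model=model,
    dataset=eval_dataset,  # preprocessed datasets.Dataset (placeholder)
    label_names=["label"],
    compute_metrics=compute_metrics,  # metric function (placeholder)
)
print(outputs.metrics)
```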
24 changes: 12 additions & 12 deletions examples/onnxruntime/optimization/question-answering/run_qa.py
@@ -37,7 +37,7 @@

from optimum.onnxruntime import ORTModelForQuestionAnswering, ORTOptimizer
from optimum.onnxruntime.configuration import OptimizationConfig
-from optimum.onnxruntime.model import ORTModel
+from optimum.onnxruntime.utils import evaluation_loop


# Will error if the minimal version of Transformers is not installed. Remove at your own risks.
@@ -305,7 +305,6 @@ def main():
    )

    os.makedirs(training_args.output_dir, exist_ok=True)
-    optimized_model_path = os.path.join(training_args.output_dir, "model_optimized.onnx")

    tokenizer = AutoTokenizer.from_pretrained(model_args.tokenizer_name or model_args.model_name_or_path)

@@ -323,13 +322,15 @@
    optimizer = ORTOptimizer.from_pretrained(model)

    # Optimize the model
-    optimizer.optimize(
+    optimized_model_path = optimizer.optimize(
        optimization_config=optimization_config,
        save_dir=training_args.output_dir,
        use_external_data_format=onnx_export_args.use_external_data_format,
        one_external_file=onnx_export_args.one_external_file,
    )

+    model = ORTModelForQuestionAnswering.from_pretrained(optimized_model_path, provider=optim_args.execution_provider)
+
    # Prepare the dataset downloading, preprocessing and metric creation to perform the evaluation and / or the
    # prediction step(s)
    if training_args.do_eval or training_args.do_predict:
@@ -478,13 +479,12 @@ def compute_metrics(p: EvalPrediction):
# During Feature creation dataset samples might increase, we will select required samples again
eval_dataset = eval_dataset.select(range(data_args.max_eval_samples))

-        ort_model = ORTModel(
-            optimized_model_path,
-            execution_provider=optim_args.execution_provider,
-            compute_metrics=compute_metrics,
+        outputs = evaluation_loop(
+            model=model,
+            dataset=eval_dataset,
+            label_names=["start_positions", "end_positions"],
+            compute_metrics=compute_metrics,
        )
-        outputs = ort_model.evaluation_loop(eval_dataset)
        predictions = post_processing_function(eval_examples, eval_dataset, outputs.predictions)
        metrics = compute_metrics(predictions)

@@ -514,12 +514,12 @@ def compute_metrics(p: EvalPrediction):
# During Feature creation dataset samples might increase, we will select required samples again
predict_dataset = predict_dataset.select(range(data_args.max_predict_samples))

-        ort_model = ORTModel(
-            optimized_model_path,
-            execution_provider=optim_args.execution_provider,
+        outputs = evaluation_loop(
+            model=model,
+            dataset=eval_dataset,
+            label_names=["start_positions", "end_positions"],
            compute_metrics=compute_metrics,
        )
-        outputs = ort_model.evaluation_loop(predict_dataset)
        predictions = post_processing_function(predict_examples, predict_dataset, outputs.predictions)
        metrics = compute_metrics(predictions)

@@ -14,13 +14,11 @@ See the License for the specific language governing permissions and
limitations under the License.
-->

-# Text classification 
+# Text classification

## GLUE tasks

-The script [`run_glue.py`](https://github.com/huggingface/optimum/blob/main/examples/onnxruntime/optimization/text-classification/run_glue.py)
-allows us to apply graph optimizations and fusion using [ONNX Runtime](https://github.com/microsoft/onnxruntime) for sequence classification tasks such as
-the ones from the [GLUE benchmark](https://gluebenchmark.com/).
+The script [`run_glue.py`](https://github.com/huggingface/optimum/blob/main/examples/onnxruntime/optimization/text-classification/run_glue.py) allows us to apply graph optimizations and fusion using [ONNX Runtime](https://github.com/microsoft/onnxruntime) for sequence classification tasks such as the ones from the [GLUE benchmark](https://gluebenchmark.com/).

The following example applies graph optimization on a DistilBERT fine-tuned on the sst-2 task. Here the optimization level is selected to be 1, enabling basic optimizations such as redundant node eliminations and constant folding. Higher optimization level will result in hardware dependent optimized graph.

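The optimization level mentioned above maps to `OptimizationConfig` from `optimum.onnxruntime.configuration`, as used in the `run_glue.py` diff below. A minimal sketch of the level-1 configuration described here:

```python
from optimum.onnxruntime.configuration import OptimizationConfig

# Level 1 enables basic, hardware-independent optimizations such as
# redundant node elimination and constant folding; higher levels yield
# hardware-dependent optimized graphs.
optimization_config = OptimizationConfig(optimization_level=1)
```
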
25 changes: 15 additions & 10 deletions examples/onnxruntime/optimization/text-classification/run_glue.py
@@ -42,7 +42,7 @@

from optimum.onnxruntime import ORTModelForSequenceClassification, ORTOptimizer
from optimum.onnxruntime.configuration import OptimizationConfig
-from optimum.onnxruntime.model import ORTModel
+from optimum.onnxruntime.utils import evaluation_loop


# Will error if the minimal version of Transformers is not installed. Remove at your own risks.
@@ -250,7 +250,6 @@ def main():
    )

    os.makedirs(training_args.output_dir, exist_ok=True)
-    optimized_model_path = os.path.join(training_args.output_dir, "model_optimized.onnx")

    tokenizer = AutoTokenizer.from_pretrained(model_args.model_name_or_path)

@@ -268,13 +267,17 @@
    optimizer = ORTOptimizer.from_pretrained(model)

    # Optimize the model
-    optimizer.optimize(
+    optimized_model_path = optimizer.optimize(
        optimization_config=optimization_config,
        save_dir=training_args.output_dir,
        use_external_data_format=onnx_export_args.use_external_data_format,
        one_external_file=onnx_export_args.one_external_file,
    )

+    model = ORTModelForSequenceClassification.from_pretrained(
+        optimized_model_path, provider=optim_args.execution_provider
+    )
+
    # Prepare the dataset downloading, preprocessing and metric creation to perform the evaluation and / or the
    # prediction step(s)
    if training_args.do_eval or training_args.do_predict:
@@ -408,13 +411,13 @@ def compute_metrics(p: EvalPrediction):
desc="Running tokenizer on the evaluation dataset",
)

-        ort_model = ORTModel(
-            optimized_model_path,
-            execution_provider=optim_args.execution_provider,
+        outputs = evaluation_loop(
+            model=model,
+            eval_dataset=eval_dataset,
+            compute_metrics=compute_metrics,
+            label_names=["label"],
        )
-        outputs = ort_model.evaluation_loop(eval_dataset)

        # Save metrics
        with open(os.path.join(training_args.output_dir, "eval_results.json"), "w") as f:
            json.dump(outputs.metrics, f, indent=4, sort_keys=True)
@@ -436,10 +439,12 @@ def compute_metrics(p: EvalPrediction):
desc="Running tokenizer on the test dataset",
)

-        ort_model = ORTModel(
-            optimized_model_path, execution_provider=optim_args.execution_provider, label_names=["label"]
+        outputs = evaluation_loop(
+            model=model,
+            eval_dataset=eval_dataset,
+            compute_metrics=compute_metrics,
+            label_names=["label"],
        )
-        outputs = ort_model.evaluation_loop(predict_dataset)
        predictions = np.squeeze(outputs.predictions) if is_regression else np.argmax(outputs.predictions, axis=1)

        # Save predictions
Expand Down