Merge branch 'main' into update-ort-trainer-to-4.32
JingyaHuang committed Oct 12, 2023
2 parents da18b6e + 0153306 commit a6eef26
Showing 49 changed files with 1,494 additions and 479 deletions.
1 change: 1 addition & 0 deletions docs/source/exporters/onnx/overview.mdx
@@ -60,6 +60,7 @@ Supported architectures:
- M2-M100
- Marian
- MBart
- Mistral
- MobileBert
- MobileVit
- MobileNet v1
@@ -38,7 +38,7 @@ They specify which input generators should be used for the dummy inputs, but rem
- generate_dummy_inputs

[[autodoc]] exporters.onnx.OnnxConfigWithPast
- with_past
- add_past_key_values

[[autodoc]] exporters.onnx.OnnxSeq2SeqConfigWithPast

12 changes: 6 additions & 6 deletions docs/source/index.mdx
@@ -23,27 +23,27 @@ As such, Optimum enables developers to efficiently use any of these platforms wi
<div class="w-full flex flex-col space-y-4 md:space-y-0 md:grid md:grid-cols-3 md:gap-y-4 md:gap-x-5">
<a class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg" href="./habana/index"
><div class="w-full text-center bg-gradient-to-br from-indigo-400 to-indigo-500 rounded-lg py-1.5 font-semibold mb-5 text-white text-lg leading-relaxed">Habana</div>
<p class="text-gray-700">Maximize training throughput and efficiency with <a href="https://docs.habana.ai/en/latest/Gaudi_Overview/Gaudi_Architecture.html">Habana's Gaudi processor</a></p>
<p class="text-gray-700">Maximize training throughput and efficiency with <span class="underline" onclick="event.preventDefault(); window.open('https://docs.habana.ai/en/latest/Gaudi_Overview/Gaudi_Architecture.html', '_blank');">Habana's Gaudi processor</span></p>
</a>
<a class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg" href="./intel/index"
><div class="w-full text-center bg-gradient-to-br from-blue-400 to-blue-500 rounded-lg py-1.5 font-semibold mb-5 text-white text-lg leading-relaxed">Intel</div>
<p class="text-gray-700">Optimize your model to speedup inference with <a href="https://docs.openvino.ai/latest/index.html">OpenVINO</a> and <a href="https://www.intel.com/content/www/us/en/developer/tools/oneapi/neural-compressor.html">Neural Compressor</a></p>
<p class="text-gray-700">Optimize your model to speedup inference with <span class="underline" onclick="event.preventDefault(); window.open('https://docs.openvino.ai/latest/index.html', '_blank');">OpenVINO</span> and <span class="underline" onclick="event.preventDefault(); window.open('https://www.intel.com/content/www/us/en/developer/tools/oneapi/neural-compressor.html', '_blank');">Neural Compressor</span></p>
</a>
<a class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg" href="https://huggingface.co/docs/optimum-neuron/index"
><div class="w-full text-center bg-gradient-to-br from-orange-400 to-orange-500 rounded-lg py-1.5 font-semibold mb-5 text-white text-lg leading-relaxed">AWS Trainium/Inferentia</div>
<p class="text-gray-700">Accelerate your training and inference workflows with <a href="https://aws.amazon.com/machine-learning/trainium/">AWS Trainium</a> and <a href="https://aws.amazon.com/machine-learning/inferentia/">AWS Inferentia</a></p>
<p class="text-gray-700">Accelerate your training and inference workflows with <span class="underline" onclick="event.preventDefault(); window.open('https://aws.amazon.com/machine-learning/trainium/', '_blank');">AWS Trainium</span> and <span class="underline" onclick="event.preventDefault(); window.open('https://aws.amazon.com/machine-learning/inferentia/', '_blank');">AWS Inferentia</span></p>
</a>
<a class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg" href="./furiosa/index"
><div class="w-full text-center bg-gradient-to-br from-green-400 to-green-500 rounded-lg py-1.5 font-semibold mb-5 text-white text-lg leading-relaxed">FuriosaAI</div>
<p class="text-gray-700">Fast and efficient inference on <a href="https://www.furiosa.ai/">FuriosaAI WARBOY</a></p>
<p class="text-gray-700">Fast and efficient inference on <span class="underline" onclick="event.preventDefault(); window.open('https://www.furiosa.ai/', '_blank');">FuriosaAI WARBOY</span></p>
</a>
<a class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg" href="./onnxruntime/overview"
><div class="w-full text-center bg-gradient-to-br from-pink-400 to-pink-500 rounded-lg py-1.5 font-semibold mb-5 text-white text-lg leading-relaxed">ONNX Runtime</div>
<p class="text-gray-700">Apply quantization and graph optimization to accelerate Transformers models training and inference with <a href="https://onnxruntime.ai/">ONNX Runtime</a></p>
<p class="text-gray-700">Apply quantization and graph optimization to accelerate Transformers models training and inference with <span class="underline" onclick="event.preventDefault(); window.open('https://onnxruntime.ai/', '_blank');">ONNX Runtime</span></p>
</a>
<a class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg" href="./bettertransformer/overview"
><div class="w-full text-center bg-gradient-to-br from-yellow-400 to-yellow-500 rounded-lg py-1.5 font-semibold mb-5 text-white text-lg leading-relaxed">BetterTransformer</div>
<p class="text-gray-700">A one-liner integration to use <a href="https://pytorch.org/blog/a-better-transformer-for-fast-transformer-encoder-inference/">PyTorch's BetterTransformer</a> with Transformers models</p>
<p class="text-gray-700">A one-liner integration to use <span class="underline" onclick="event.preventDefault(); window.open('https://pytorch.org/blog/a-better-transformer-for-fast-transformer-encoder-inference/', '_blank');">PyTorch's BetterTransformer</span> with Transformers models</p>
</a>
</div>
</div>
@@ -57,7 +57,7 @@ ARG PYTHON_EXE=$MINICONDA_PREFIX/bin/python
# (Optional) Install test dependencies
RUN $PYTHON_EXE -m pip install git+https://github.com/huggingface/transformers
RUN $PYTHON_EXE -m pip install datasets accelerate evaluate coloredlogs absl-py rouge_score seqeval scipy sacrebleu nltk scikit-learn parameterized sentencepiece
RUN $PYTHON_EXE -m pip install fairscale deepspeed mpi4py
RUN $PYTHON_EXE -m pip install deepspeed mpi4py
# RUN $PYTHON_EXE -m pip install optuna ray sigopt wandb

# PyTorch
67 changes: 0 additions & 67 deletions examples/onnxruntime/training/docker/Dockerfile-ort1.13.1-cu116

This file was deleted.

@@ -48,7 +48,7 @@ RUN pip install pygit2 pgzip
# (Optional) Install test dependencies
RUN pip install git+https://github.com/huggingface/transformers
RUN pip install datasets accelerate evaluate coloredlogs absl-py rouge_score seqeval scipy sacrebleu nltk scikit-learn parameterized sentencepiece
RUN pip install fairscale deepspeed mpi4py
RUN pip install deepspeed mpi4py
# RUN pip install optuna ray sigopt wandb

# Install onnxruntime-training dependencies
@@ -57,7 +57,7 @@ ARG PYTHON_EXE=$MINICONDA_PREFIX/bin/python
# (Optional) Install test dependencies
RUN $PYTHON_EXE -m pip install git+https://github.com/huggingface/transformers
RUN $PYTHON_EXE -m pip install datasets accelerate evaluate coloredlogs absl-py rouge_score seqeval scipy sacrebleu nltk scikit-learn parameterized sentencepiece
RUN $PYTHON_EXE -m pip install fairscale deepspeed mpi4py
RUN $PYTHON_EXE -m pip install deepspeed mpi4py
# RUN $PYTHON_EXE -m pip install optuna ray sigopt wandb

# PyTorch
76 changes: 76 additions & 0 deletions examples/onnxruntime/training/docker/Dockerfile-ort1.16.1-cu118
@@ -0,0 +1,76 @@
# Copyright 2023 The HuggingFace Team All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Use nvidia/cuda image
FROM nvidia/cuda:11.8.0-cudnn8-devel-ubuntu20.04
CMD nvidia-smi

# Ignore interactive questions during `docker build`
ENV DEBIAN_FRONTEND noninteractive

# Versions
ARG PYTHON_VERSION=3.10
ARG TORCH_CUDA_VERSION=cu118
ARG TORCH_VERSION=2.0.0
ARG TORCHVISION_VERSION=0.15.1

# Bash shell
RUN chsh -s /bin/bash
SHELL ["/bin/bash", "-c"]

# Install and update tools to minimize security vulnerabilities
RUN apt-get update
RUN apt-get install -y software-properties-common wget apt-utils patchelf git libprotobuf-dev protobuf-compiler cmake \
bzip2 ca-certificates libglib2.0-0 libxext6 libsm6 libxrender1 mercurial subversion libopenmpi-dev && \
apt-get clean
RUN unattended-upgrade
RUN apt-get autoremove -y

# Install miniconda (its base python is replaced by ${PYTHON_VERSION} below)
ARG BUILD_USER=onnxruntimedev
ARG MINICONDA_PREFIX=/home/$BUILD_USER/miniconda3
RUN apt-get install -y curl

ARG CONDA_URL=https://repo.anaconda.com/miniconda/Miniconda3-py37_4.9.2-Linux-x86_64.sh
RUN curl -fSsL --insecure ${CONDA_URL} -o install-conda.sh && \
/bin/bash ./install-conda.sh -b -p $MINICONDA_PREFIX && \
$MINICONDA_PREFIX/bin/conda clean -ya && \
$MINICONDA_PREFIX/bin/conda install -y python=${PYTHON_VERSION}

ENV PATH=$MINICONDA_PREFIX/bin:${PATH}

ARG PYTHON_EXE=$MINICONDA_PREFIX/bin/python

# (Optional) Install test dependencies
RUN $PYTHON_EXE -m pip install git+https://github.com/huggingface/transformers
RUN $PYTHON_EXE -m pip install datasets accelerate evaluate coloredlogs absl-py rouge_score seqeval scipy sacrebleu nltk scikit-learn parameterized sentencepiece
RUN $PYTHON_EXE -m pip install deepspeed mpi4py
# RUN $PYTHON_EXE -m pip install optuna ray sigopt wandb

# PyTorch
RUN $PYTHON_EXE -m pip install onnx ninja
RUN $PYTHON_EXE -m pip install torch==${TORCH_VERSION} torchvision==${TORCHVISION_VERSION} -f https://download.pytorch.org/whl/${TORCH_CUDA_VERSION}

# ORT Module
RUN $PYTHON_EXE -m pip install onnxruntime-training==1.16.1 -f https://download.onnxruntime.ai/onnxruntime_stable_cu118.html
RUN $PYTHON_EXE -m pip install torch-ort
ENV TORCH_CUDA_ARCH_LIST="5.2 6.0 6.1 7.0 7.5 8.0 8.6+PTX"
RUN $PYTHON_EXE -m pip install --upgrade protobuf==3.20.2
RUN $PYTHON_EXE -m torch_ort.configure

WORKDIR .

CMD ["/bin/bash"]
54 changes: 54 additions & 0 deletions examples/onnxruntime/training/text-classification/README.md
@@ -16,6 +16,60 @@ limitations under the License.

# Text classification

By running the script [`run_classification.py`](https://github.com/huggingface/optimum/blob/main/examples/onnxruntime/training/text-classification/run_classification.py),
you can leverage the [ONNX Runtime](https://github.com/microsoft/onnxruntime) accelerator to fine-tune models from the
[Hugging Face Hub](https://huggingface.co/models) for text classification tasks.
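Under the hood, the example scripts replace the vanilla `transformers.Trainer` with Optimum's `ORTTrainer`. As a minimal sketch of that pattern (the model id and toy dataset below are illustrative only and not part of this commit; it assumes an environment with `onnxruntime-training` installed, such as one built from the Docker images in this commit):

```python
# Illustrative sketch, not part of this commit: fine-tune a small
# classifier with ORTTrainer, the ONNX Runtime drop-in for Trainer.
from datasets import Dataset
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from optimum.onnxruntime import ORTTrainer, ORTTrainingArguments

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2
)

# Toy two-example dataset so the sketch is self-contained.
data = Dataset.from_dict(
    {"text": ["great product", "terrible product"], "label": [1, 0]}
)
data = data.map(lambda ex: tokenizer(ex["text"], truncation=True, max_length=32))

args = ORTTrainingArguments(
    output_dir="/tmp/ort-clf",
    per_device_train_batch_size=2,
    num_train_epochs=1,
)

# ORTTrainer mirrors the transformers.Trainer API; with
# onnxruntime-training installed, the forward and backward passes
# run through ONNX Runtime instead of eager PyTorch.
trainer = ORTTrainer(model=model, args=args, train_dataset=data, tokenizer=tokenizer)
trainer.train()
```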


__The following example applies the acceleration features powered by ONNX Runtime.__


### ONNX Runtime Training

The following example fine-tunes [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf) on the [Amazon Reviews Dataset](https://huggingface.co/datasets/amazon_reviews_multi).

```bash
torchrun --nproc_per_node=NUM_GPUS_YOU_HAVE run_classification.py \
--model_name_or_path meta-llama/Llama-2-7b-hf \
--dataset_name amazon_reviews_multi \
--dataset_config_name en \
--shuffle_train_dataset \
--metric_name accuracy \
--text_column_name 'review_title,review_body,product_category' \
--text_column_delimiter ' ' \
--label_column_name stars \
--do_train \
--do_eval \
--fp16 \
--max_seq_length 128 \
--per_device_train_batch_size 16 \
--learning_rate 2e-5 \
--num_train_epochs 1 \
--deepspeed zero_stage_2.json \
--use_peft \
--output_dir /tmp/ort-llama-2/
```

### Performance

We get the following results for [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf) using mixed-precision training, LoRA, and ZeRO Stage 2 under the PyTorch and ONNX Runtime backends. The experiment ran for 10 epochs on 8 Nvidia V100 GPUs:

| Model                    | Backend      | Runtime (s) | Train samples/s |
| ------------------------ | ------------ | ----------- | --------------- |
| meta-llama/Llama-2-7b-hf | PyTorch      | 17035.9055  | 117.399         |
| meta-llama/Llama-2-7b-hf | ONNX Runtime | 15532.2403  | 128.764         |

We observe the following gains for ONNX Runtime compared to PyTorch:

| Model                    | Latency reduction | Throughput gain |
| ------------------------ | ----------------- | --------------- |
| meta-llama/Llama-2-7b-hf | 8.83%             | 9.68%           |

#### DeepSpeed

[zero_stage_2.json](https://github.com/huggingface/optimum/blob/main/examples/onnxruntime/training/text-classification/zero_stage_2.json) is an example DeepSpeed config file that enables ZeRO Stage-2 (partitioning of optimizer states and gradients) for training meta-llama/Llama-2-7b. More information can be found in [DeepSpeed's official repo](https://github.com/microsoft/DeepSpeed).
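For reference, a minimal Stage-2 configuration could look like the sketch below. It follows DeepSpeed's standard config schema, with `auto` values resolved by the `Trainer` at runtime; it is not the verbatim content of `zero_stage_2.json`:

```json
{
  "zero_optimization": {
    "stage": 2,
    "overlap_comm": true,
    "reduce_scatter": true,
    "contiguous_gradients": true
  },
  "fp16": {
    "enabled": "auto"
  },
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto"
}
```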

## GLUE Tasks

By running the script [`run_glue.py`](https://github.com/huggingface/optimum/blob/main/examples/onnxruntime/training/text-classification/run_glue.py),