Merge branch 'main' into update-ort-trainer-to-4.32
JingyaHuang committed Oct 12, 2023
2 parents da18b6e + 0153306 commit a6eef26
Showing 49 changed files with 1,494 additions and 479 deletions.
1 change: 1 addition & 0 deletions docs/source/exporters/onnx/overview.mdx
@@ -60,6 +60,7 @@ Supported architectures:
- M2-M100
- Marian
- MBart
- Mistral
- MobileBert
- MobileVit
- MobileNet v1
@@ -38,7 +38,7 @@ They specify which input generators should be used for the dummy inputs, but rem
- generate_dummy_inputs

[[autodoc]] exporters.onnx.OnnxConfigWithPast
- with_past
- add_past_key_values

[[autodoc]] exporters.onnx.OnnxSeq2SeqConfigWithPast

12 changes: 6 additions & 6 deletions docs/source/index.mdx
@@ -23,27 +23,27 @@ As such, Optimum enables developers to efficiently use any of these platforms wi
<div class="w-full flex flex-col space-y-4 md:space-y-0 md:grid md:grid-cols-3 md:gap-y-4 md:gap-x-5">
<a class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg" href="./habana/index"
><div class="w-full text-center bg-gradient-to-br from-indigo-400 to-indigo-500 rounded-lg py-1.5 font-semibold mb-5 text-white text-lg leading-relaxed">Habana</div>
<p class="text-gray-700">Maximize training throughput and efficiency with <a href="https://docs.habana.ai/en/latest/Gaudi_Overview/Gaudi_Architecture.html">Habana's Gaudi processor</a></p>
<p class="text-gray-700">Maximize training throughput and efficiency with <span class="underline" onclick="event.preventDefault(); window.open('https://docs.habana.ai/en/latest/Gaudi_Overview/Gaudi_Architecture.html', '_blank');">Habana's Gaudi processor</span></p>
</a>
<a class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg" href="./intel/index"
><div class="w-full text-center bg-gradient-to-br from-blue-400 to-blue-500 rounded-lg py-1.5 font-semibold mb-5 text-white text-lg leading-relaxed">Intel</div>
<p class="text-gray-700">Optimize your model to speedup inference with <a href="https://docs.openvino.ai/latest/index.html">OpenVINO</a> and <a href="https://www.intel.com/content/www/us/en/developer/tools/oneapi/neural-compressor.html">Neural Compressor</a></p>
<p class="text-gray-700">Optimize your model to speedup inference with <span class="underline" onclick="event.preventDefault(); window.open('https://docs.openvino.ai/latest/index.html', '_blank');">OpenVINO</span> and <span class="underline" onclick="event.preventDefault(); window.open('https://www.intel.com/content/www/us/en/developer/tools/oneapi/neural-compressor.html', '_blank');">Neural Compressor</span></p>
</a>
<a class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg" href="https://huggingface.co/docs/optimum-neuron/index"
><div class="w-full text-center bg-gradient-to-br from-orange-400 to-orange-500 rounded-lg py-1.5 font-semibold mb-5 text-white text-lg leading-relaxed">AWS Trainium/Inferentia</div>
<p class="text-gray-700">Accelerate your training and inference workflows with <a href="https://aws.amazon.com/machine-learning/trainium/">AWS Trainium</a> and <a href="https://aws.amazon.com/machine-learning/inferentia/">AWS Inferentia</a></p>
<p class="text-gray-700">Accelerate your training and inference workflows with <span class="underline" onclick="event.preventDefault(); window.open('https://aws.amazon.com/machine-learning/trainium/', '_blank');">AWS Trainium</span> and <span class="underline" onclick="event.preventDefault(); window.open('https://aws.amazon.com/machine-learning/inferentia/', '_blank');">AWS Inferentia</span></p>
</a>
<a class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg" href="./furiosa/index"
><div class="w-full text-center bg-gradient-to-br from-green-400 to-green-500 rounded-lg py-1.5 font-semibold mb-5 text-white text-lg leading-relaxed">FuriosaAI</div>
<p class="text-gray-700">Fast and efficient inference on <a href="https://www.furiosa.ai/">FuriosaAI WARBOY</a></p>
<p class="text-gray-700">Fast and efficient inference on <span class="underline" onclick="event.preventDefault(); window.open('https://www.furiosa.ai/', '_blank');">FuriosaAI WARBOY</span></p>
</a>
<a class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg" href="./onnxruntime/overview"
><div class="w-full text-center bg-gradient-to-br from-pink-400 to-pink-500 rounded-lg py-1.5 font-semibold mb-5 text-white text-lg leading-relaxed">ONNX Runtime</div>
<p class="text-gray-700">Apply quantization and graph optimization to accelerate Transformers models training and inference with <a href="https://onnxruntime.ai/">ONNX Runtime</a></p>
<p class="text-gray-700">Apply quantization and graph optimization to accelerate Transformers models training and inference with <span class="underline" onclick="event.preventDefault(); window.open('https://onnxruntime.ai/', '_blank');">ONNX Runtime</span></p>
</a>
<a class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg" href="./bettertransformer/overview"
><div class="w-full text-center bg-gradient-to-br from-yellow-400 to-yellow-500 rounded-lg py-1.5 font-semibold mb-5 text-white text-lg leading-relaxed">BetterTransformer</div>
<p class="text-gray-700">A one-liner integration to use <a href="https://pytorch.org/blog/a-better-transformer-for-fast-transformer-encoder-inference/">PyTorch's BetterTransformer</a> with Transformers models</p>
<p class="text-gray-700">A one-liner integration to use <span class="underline" onclick="event.preventDefault(); window.open('https://pytorch.org/blog/a-better-transformer-for-fast-transformer-encoder-inference/', '_blank');">PyTorch's BetterTransformer</span> with Transformers models</p>
</a>
</div>
</div>
@@ -57,7 +57,7 @@ ARG PYTHON_EXE=$MINICONDA_PREFIX/bin/python
# (Optional) Install test dependencies
RUN $PYTHON_EXE -m pip install git+https://github.com/huggingface/transformers
RUN $PYTHON_EXE -m pip install datasets accelerate evaluate coloredlogs absl-py rouge_score seqeval scipy sacrebleu nltk scikit-learn parameterized sentencepiece
RUN $PYTHON_EXE -m pip install fairscale deepspeed mpi4py
RUN $PYTHON_EXE -m pip install deepspeed mpi4py
# RUN $PYTHON_EXE -m pip install optuna ray sigopt wandb

# PyTorch
67 changes: 0 additions & 67 deletions examples/onnxruntime/training/docker/Dockerfile-ort1.13.1-cu116

This file was deleted.

@@ -48,7 +48,7 @@ RUN pip install pygit2 pgzip
# (Optional) Install test dependencies
RUN pip install git+https://github.com/huggingface/transformers
RUN pip install datasets accelerate evaluate coloredlogs absl-py rouge_score seqeval scipy sacrebleu nltk scikit-learn parameterized sentencepiece
RUN pip install fairscale deepspeed mpi4py
RUN pip install deepspeed mpi4py
# RUN pip install optuna ray sigopt wandb

# Install onnxruntime-training dependencies
@@ -57,7 +57,7 @@ ARG PYTHON_EXE=$MINICONDA_PREFIX/bin/python
# (Optional) Install test dependencies
RUN $PYTHON_EXE -m pip install git+https://github.com/huggingface/transformers
RUN $PYTHON_EXE -m pip install datasets accelerate evaluate coloredlogs absl-py rouge_score seqeval scipy sacrebleu nltk scikit-learn parameterized sentencepiece
RUN $PYTHON_EXE -m pip install fairscale deepspeed mpi4py
RUN $PYTHON_EXE -m pip install deepspeed mpi4py
# RUN $PYTHON_EXE -m pip install optuna ray sigopt wandb

# PyTorch
76 changes: 76 additions & 0 deletions examples/onnxruntime/training/docker/Dockerfile-ort1.16.1-cu118
@@ -0,0 +1,76 @@
# Copyright 2023 The HuggingFace Team All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Use nvidia/cuda image
FROM nvidia/cuda:11.8.0-cudnn8-devel-ubuntu20.04
CMD nvidia-smi

# Ignore interactive questions during `docker build`
ENV DEBIAN_FRONTEND noninteractive

# Versions
ARG PYTHON_VERSION=3.10
ARG TORCH_CUDA_VERSION=cu118
ARG TORCH_VERSION=2.0.0
ARG TORCHVISION_VERSION=0.15.1

# Bash shell
RUN chsh -s /bin/bash
SHELL ["/bin/bash", "-c"]

# Install and update tools to minimize security vulnerabilities
RUN apt-get update
RUN apt-get install -y software-properties-common wget apt-utils patchelf git libprotobuf-dev protobuf-compiler cmake \
bzip2 ca-certificates libglib2.0-0 libxext6 libsm6 libxrender1 mercurial subversion libopenmpi-dev && \
apt-get clean
RUN unattended-upgrade
RUN apt-get autoremove -y

# Install miniconda (its base python is replaced by ${PYTHON_VERSION} below)
ARG BUILD_USER=onnxruntimedev
ARG MINICONDA_PREFIX=/home/$BUILD_USER/miniconda3
RUN apt-get install -y curl

ARG CONDA_URL=https://repo.anaconda.com/miniconda/Miniconda3-py37_4.9.2-Linux-x86_64.sh
RUN curl -fSsL --insecure ${CONDA_URL} -o install-conda.sh && \
/bin/bash ./install-conda.sh -b -p $MINICONDA_PREFIX && \
$MINICONDA_PREFIX/bin/conda clean -ya && \
$MINICONDA_PREFIX/bin/conda install -y python=${PYTHON_VERSION}

ENV PATH=$MINICONDA_PREFIX/bin:${PATH}

ARG PYTHON_EXE=$MINICONDA_PREFIX/bin/python

# (Optional) Install test dependencies
RUN $PYTHON_EXE -m pip install git+https://github.com/huggingface/transformers
RUN $PYTHON_EXE -m pip install datasets accelerate evaluate coloredlogs absl-py rouge_score seqeval scipy sacrebleu nltk scikit-learn parameterized sentencepiece
RUN $PYTHON_EXE -m pip install deepspeed mpi4py
# RUN $PYTHON_EXE -m pip install optuna ray sigopt wandb

# PyTorch
RUN $PYTHON_EXE -m pip install onnx ninja
RUN $PYTHON_EXE -m pip install torch==${TORCH_VERSION} torchvision==${TORCHVISION_VERSION} -f https://download.pytorch.org/whl/${TORCH_CUDA_VERSION}

# ORT Module
RUN $PYTHON_EXE -m pip install onnxruntime-training==1.16.1 -f https://download.onnxruntime.ai/onnxruntime_stable_cu118.html
RUN $PYTHON_EXE -m pip install torch-ort
ENV TORCH_CUDA_ARCH_LIST="5.2 6.0 6.1 7.0 7.5 8.0 8.6+PTX"
RUN $PYTHON_EXE -m pip install --upgrade protobuf==3.20.2
RUN $PYTHON_EXE -m torch_ort.configure

WORKDIR .

CMD ["/bin/bash"]
54 changes: 54 additions & 0 deletions examples/onnxruntime/training/text-classification/README.md
@@ -16,6 +16,60 @@ limitations under the License.

# Text classification

By running the script [`run_classification.py`](https://github.com/huggingface/optimum/blob/main/examples/onnxruntime/training/text-classification/run_classification.py),
you can leverage the [ONNX Runtime](https://github.com/microsoft/onnxruntime) accelerator to fine-tune models from the
[Hugging Face Hub](https://huggingface.co/models) for text classification tasks.
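Under the hood, the example scripts replace the vanilla `transformers.Trainer` with Optimum's `ORTTrainer`. As a minimal sketch of that pattern (the model id and toy dataset below are illustrative only and not part of this commit; it assumes an environment with `onnxruntime-training` installed, such as one built from the Docker images in this commit):

```python
# Illustrative sketch, not part of this commit: fine-tune a small
# classifier with ORTTrainer, the ONNX Runtime drop-in for Trainer.
from datasets import Dataset
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from optimum.onnxruntime import ORTTrainer, ORTTrainingArguments

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2
)

# Toy two-example dataset so the sketch is self-contained.
data = Dataset.from_dict(
    {"text": ["great product", "terrible product"], "label": [1, 0]}
)
data = data.map(lambda ex: tokenizer(ex["text"], truncation=True, max_length=32))

args = ORTTrainingArguments(
    output_dir="/tmp/ort-clf",
    per_device_train_batch_size=2,
    num_train_epochs=1,
)

# ORTTrainer mirrors the transformers.Trainer API; with
# onnxruntime-training installed, the forward and backward passes
# run through ONNX Runtime instead of eager PyTorch.
trainer = ORTTrainer(model=model, args=args, train_dataset=data, tokenizer=tokenizer)
trainer.train()
```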


__The following example applies the acceleration features powered by ONNX Runtime.__


### ONNX Runtime Training

The following example fine-tunes [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf) on the [Amazon Reviews Dataset](https://huggingface.co/datasets/amazon_reviews_multi).

```bash
torchrun --nproc_per_node=NUM_GPUS_YOU_HAVE run_classification.py \
--model_name_or_path meta-llama/Llama-2-7b-hf \
--dataset_name amazon_reviews_multi \
--dataset_config_name en \
--shuffle_train_dataset \
--metric_name accuracy \
--text_column_name 'review_title,review_body,product_category' \
--text_column_delimiter ' ' \
--label_column_name stars \
--do_train \
--do_eval \
--fp16 \
--max_seq_length 128 \
--per_device_train_batch_size 16 \
--learning_rate 2e-5 \
--num_train_epochs 1 \
--deepspeed zero_stage_2.json \
--use_peft \
--output_dir /tmp/ort-llama-2/
```

### Performance

We get the following results for [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf) using mixed-precision training, LoRA, and ZeRO Stage 2 under the PyTorch and ONNX Runtime backends. The experiment ran for 10 epochs on 8 Nvidia V100 GPUs:

| Model                    | Backend      | Runtime (s) | Train samples/s |
| ------------------------ | ------------ | ----------- | --------------- |
| meta-llama/Llama-2-7b-hf | PyTorch      | 17035.9055  | 117.399         |
| meta-llama/Llama-2-7b-hf | ONNX Runtime | 15532.2403  | 128.764         |

We observe the following gains for ONNX Runtime compared to PyTorch:

| Model                    | Latency reduction | Throughput gain |
| ------------------------ | ----------------- | --------------- |
| meta-llama/Llama-2-7b-hf | 8.83%             | 9.68%           |

#### DeepSpeed

[zero_stage_2.json](https://github.com/huggingface/optimum/blob/main/examples/onnxruntime/training/text-classification/zero_stage_2.json) is an example DeepSpeed config file that enables ZeRO Stage-2 (partitioning of optimizer states and gradients) for training meta-llama/Llama-2-7b. More information can be found in [DeepSpeed's official repo](https://github.com/microsoft/DeepSpeed).
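For reference, a minimal Stage-2 configuration could look like the sketch below. It follows DeepSpeed's standard config schema, with `auto` values resolved by the `Trainer` at runtime; it is not the verbatim content of `zero_stage_2.json`:

```json
{
  "zero_optimization": {
    "stage": 2,
    "overlap_comm": true,
    "reduce_scatter": true,
    "contiguous_gradients": true
  },
  "fp16": {
    "enabled": "auto"
  },
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto"
}
```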

## GLUE Tasks

By running the script [`run_glue.py`](https://github.com/huggingface/optimum/blob/main/examples/onnxruntime/training/text-classification/run_glue.py),