Adds the container package for ResNext 32x16d inference (PyTorch SPR) (#91)

* Add dockerfile and documentation for ResNext101

* Update echos

* Small doc updates

* add shm-size

* add shm-size for maskrcnn

* Update to use env vars in build.sh

* Updating name for ResNext to be 'ResNext 32x16d'

* Update quickstart scripts

* Regenerate dockerfile after sync with develop

* Regenerate docs
dmsuehir authored Aug 25, 2021
1 parent 97aa091 commit 5f32ec4
Showing 20 changed files with 678 additions and 0 deletions.
@@ -0,0 +1,93 @@
# Copyright (c) 2020-2021 Intel Corporation
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
#
# THIS IS A GENERATED DOCKERFILE.
#
# This file was assembled from multiple pieces, whose use is documented
# throughout. Please refer to the TensorFlow dockerfiles documentation
# for more information.

ARG PYTORCH_IMAGE="model-zoo"
ARG PYTORCH_TAG="pytorch-ipex-spr"

FROM ${PYTORCH_IMAGE}:${PYTORCH_TAG} AS intel-optimized-pytorch

RUN yum --enablerepo=extras install -y epel-release && \
yum install -y \
ca-certificates \
git \
wget \
make \
cmake \
gcc-c++ \
gcc \
autoconf \
bzip2 \
tar

# Build Torch Vision
ARG TORCHVISION_VERSION=v0.8.0

RUN source activate pytorch && \
git clone https://github.com/pytorch/vision && \
cd vision && \
git checkout ${TORCHVISION_VERSION} && \
python setup.py install
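
# Install additional Python dependencies, then build gperftools 2.7.90 (which provides
# tcmalloc and related tools) into $HOME/.local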

RUN source activate pytorch && \
pip install matplotlib Pillow pycocotools && \
pip install yacs opencv-python cityscapesscripts transformers && \
conda install -y libopenblas && \
mkdir -p /workspace/installs && \
cd /workspace/installs && \
wget https://github.com/gperftools/gperftools/releases/download/gperftools-2.7.90/gperftools-2.7.90.tar.gz && \
tar -xzf gperftools-2.7.90.tar.gz && \
cd gperftools-2.7.90 && \
./configure --prefix=$HOME/.local && \
make && \
make install && \
rm -rf /workspace/installs/

ARG PACKAGE_DIR=model_packages

ARG PACKAGE_NAME="pytorch-spr-resnext-32x16d-inference"

ARG MODEL_WORKSPACE

# ${MODEL_WORKSPACE} and everything below it need to be owned by root:root rather than the current UID:GID;
# this allows the default user (root) to work in k8s single-node and multi-node deployments
RUN umask 002 && mkdir -p ${MODEL_WORKSPACE} && chgrp root ${MODEL_WORKSPACE} && chmod g+s+w,o+s+r ${MODEL_WORKSPACE}

ADD --chown=0:0 ${PACKAGE_DIR}/${PACKAGE_NAME}.tar.gz ${MODEL_WORKSPACE}

RUN chown -R root ${MODEL_WORKSPACE}/${PACKAGE_NAME} && chgrp -R root ${MODEL_WORKSPACE}/${PACKAGE_NAME} && chmod -R g+s+w ${MODEL_WORKSPACE}/${PACKAGE_NAME} && find ${MODEL_WORKSPACE}/${PACKAGE_NAME} -type d | xargs chmod o+r+x

WORKDIR ${MODEL_WORKSPACE}/${PACKAGE_NAME}

FROM intel-optimized-pytorch AS release
COPY --from=intel-optimized-pytorch /root/conda /root/conda
COPY --from=intel-optimized-pytorch /workspace/lib/ /workspace/lib/
COPY --from=intel-optimized-pytorch /root/.local/ /root/.local/
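
# Runtime tuning for the release image: DNNL_MAX_CPU_ISA caps the instruction set that
# oneDNN may dispatch to (AMX here), and jemalloc is preloaded via LD_PRELOAD and tuned
# through MALLOC_CONF (a common performance tuning for IPEX workloads).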

ENV DNNL_MAX_CPU_ISA="AVX512_CORE_AMX"

ENV PATH="~/conda/bin:${PATH}"
ENV LD_PRELOAD="/workspace/lib/jemalloc/lib/libjemalloc.so:$LD_PRELOAD"
ENV MALLOC_CONF="oversize_threshold:1,background_thread:true,metadata_thp:auto,dirty_decay_ms:9000000000,muzzy_decay_ms:9000000000"
ENV BASH_ENV=/root/.bash_profile
WORKDIR /workspace/
RUN yum install -y numactl mesa-libGL && \
yum clean all && \
echo "source activate pytorch" >> /root/.bash_profile
@@ -56,6 +56,7 @@ docker run --rm \
--env no_proxy=${no_proxy} \
${dataset_volume} \
--volume ${OUTPUT_DIR}:${OUTPUT_DIR} \
--shm-size 8G \
-w ${WORKDIR} \
${DOCKER_ARGS} \
$IMAGE_NAME \
@@ -51,6 +51,7 @@ docker run --rm \
--env no_proxy=${no_proxy} \
--volume ${DATASET_DIR}:${DATASET_DIR} \
--volume ${OUTPUT_DIR}:${OUTPUT_DIR} \
--shm-size 8G \
-w ${WORKDIR} \
${DOCKER_ARGS} \
$IMAGE_NAME \
@@ -0,0 +1,26 @@
## Build the container

The <model name> <mode> package has scripts and a Dockerfile that are
used to build a workload container that runs the model. This container
uses the PyTorch/IPEX container as its base, so ensure that you have built
the `pytorch-ipex-spr.tar.gz` container prior to building this model container.

Use `docker images` to verify that you have the base container built. For example:
```
$ docker images | grep pytorch-ipex-spr
model-zoo pytorch-ipex-spr f5b473554295 2 hours ago 4.08GB
```

To build the <model name> <mode> container, extract the package and
run the `build.sh` script.
```
# Extract the package
tar -xzf <package name>
cd <package dir>
# Build the container
./build.sh
```

After the build completes, you should have a container called
`<docker image>` that will be used to run the model.
@@ -0,0 +1,27 @@
## Datasets

### ImageNet

The [ImageNet](http://www.image-net.org/) validation dataset is used to run the
<model name> accuracy script. The realtime and throughput inference scripts use
synthetic data.

Download and extract the ImageNet2012 dataset from [http://www.image-net.org/](http://www.image-net.org/),
then move the validation images into labeled subfolders using
[the valprep.sh shell script](https://raw.githubusercontent.com/soumith/imagenetloader.torch/master/valprep.sh).

After running the data prep script, your folder structure should look something like this:
```
imagenet
└── val
    ├── ILSVRC2012_img_val.tar
    ├── n01440764
    │   ├── ILSVRC2012_val_00000293.JPEG
    │   ├── ILSVRC2012_val_00002138.JPEG
    │   ├── ILSVRC2012_val_00003014.JPEG
    │   ├── ILSVRC2012_val_00006697.JPEG
    │   └── ...
    └── ...
```
The folder that contains the `val` directory should be set as the
`DATASET_DIR` (for example: `export DATASET_DIR=/home/<user>/imagenet`).
@@ -0,0 +1,5 @@
<!-- 10. Description -->
## Description

This document has instructions for running <model name> <mode> using
Intel-optimized PyTorch.
@@ -0,0 +1,24 @@
## Run the model

After you've followed the instructions to [build the container](#build-the-container)
and [prepare the dataset](#datasets), use the `run.sh` script from the container
package to run <model name> <mode> in Docker. Set environment variables to
specify the dataset directory (needed only for accuracy), the precision to run, and
an output directory. By default, the `run.sh` script will run the
`inference_realtime.sh` quickstart script. To run a different script, specify
the name of the script using the `SCRIPT` environment variable.
```
# Navigate to the container package directory
cd <package dir>
# Set the required environment vars
export PRECISION=<specify the precision to run>
export OUTPUT_DIR=<directory where log files will be written>
# Run the container with inference_realtime.sh quickstart script
./run.sh
# To test accuracy, also specify the dataset directory
export DATASET_DIR=<path to the dataset>
SCRIPT=accuracy.sh ./run.sh
```
@@ -0,0 +1,4 @@
<!--- 80. License -->
## License

Licenses can be found in the model package, in the `licenses` directory.
@@ -0,0 +1,8 @@
<!--- 40. Quick Start Scripts -->
## Quick Start Scripts

| Script name | Description |
|-------------|-------------|
| `inference_realtime.sh` | Runs multi instance realtime inference using 4 cores per instance with synthetic data for the specified precision (fp32, int8 or bf16). |
| `inference_throughput.sh` | Runs multi instance batch inference using 1 instance per socket with synthetic data for the specified precision (fp32, int8 or bf16). |
| `accuracy.sh` | Measures the inference accuracy (providing a `DATASET_DIR` environment variable is required) for the specified precision (fp32, int8 or bf16). |
@@ -0,0 +1,2 @@
<!--- 0. Title -->
# PyTorch <model name> <mode>
@@ -0,0 +1,16 @@
## Model Package

The model package includes the Dockerfile and scripts needed to build and
run <model name> <mode> in a container.
```
<package dir>
├── README.md
├── build.sh
├── licenses
│   ├── LICENSE
│   └── third_party
├── model_packages
│   └── <package name>
├── <package dir>.Dockerfile
└── run.sh
```
@@ -0,0 +1,120 @@
<!--- 0. Title -->
# PyTorch ResNext 32x16d inference

<!-- 10. Description -->
## Description

This document has instructions for running ResNext 32x16d inference using
Intel-optimized PyTorch.

## Model Package

The model package includes the Dockerfile and scripts needed to build and
run ResNext 32x16d inference in a container.
```
pytorch-spr-resnext-32x16d-inference
├── README.md
├── build.sh
├── licenses
│   ├── LICENSE
│   └── third_party
├── model_packages
│   └── pytorch-spr-resnext-32x16d-inference.tar.gz
├── pytorch-spr-resnext-32x16d-inference.Dockerfile
└── run.sh
```

<!--- 40. Quick Start Scripts -->
## Quick Start Scripts

| Script name | Description |
|-------------|-------------|
| `inference_realtime.sh` | Runs multi instance realtime inference using 4 cores per instance with synthetic data for the specified precision (fp32, int8 or bf16). |
| `inference_throughput.sh` | Runs multi instance batch inference using 1 instance per socket with synthetic data for the specified precision (fp32, int8 or bf16). |
| `accuracy.sh` | Measures the inference accuracy (providing a `DATASET_DIR` environment variable is required) for the specified precision (fp32, int8 or bf16). |
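
For example, once the container is built and the environment variables described in
[Run the model](#run-the-model) are set, a specific quickstart script and precision can be
selected through `run.sh` (an illustrative invocation; the values shown are placeholders):
```
# Example: run the batch/throughput benchmark with bf16 precision
export PRECISION=bf16
export OUTPUT_DIR=<directory where log files will be written>
SCRIPT=inference_throughput.sh ./run.sh
```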

## Datasets

### ImageNet

The [ImageNet](http://www.image-net.org/) validation dataset is used to run the
ResNext 32x16d accuracy script. The realtime and throughput inference scripts use
synthetic data.

Download and extract the ImageNet2012 dataset from [http://www.image-net.org/](http://www.image-net.org/),
then move the validation images into labeled subfolders using
[the valprep.sh shell script](https://raw.githubusercontent.com/soumith/imagenetloader.torch/master/valprep.sh).

After running the data prep script, your folder structure should look something like this:
```
imagenet
└── val
    ├── ILSVRC2012_img_val.tar
    ├── n01440764
    │   ├── ILSVRC2012_val_00000293.JPEG
    │   ├── ILSVRC2012_val_00002138.JPEG
    │   ├── ILSVRC2012_val_00003014.JPEG
    │   ├── ILSVRC2012_val_00006697.JPEG
    │   └── ...
    └── ...
```
The folder that contains the `val` directory should be set as the
`DATASET_DIR` (for example: `export DATASET_DIR=/home/<user>/imagenet`).
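
The exact download steps depend on your image-net.org account, but as a rough sketch
(assuming the validation tarball has already been downloaded to the current directory),
the preparation might look like this:
```
# Sketch only: extract the validation images and sort them into labeled subfolders
export DATASET_DIR=/home/<user>/imagenet
mkdir -p ${DATASET_DIR}/val
tar -xf ILSVRC2012_img_val.tar -C ${DATASET_DIR}/val
cd ${DATASET_DIR}/val
wget https://raw.githubusercontent.com/soumith/imagenetloader.torch/master/valprep.sh
bash valprep.sh
```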

## Build the container

The ResNext 32x16d inference package has scripts and a Dockerfile that are
used to build a workload container that runs the model. This container
uses the PyTorch/IPEX container as its base, so ensure that you have built
the `pytorch-ipex-spr.tar.gz` container prior to building this model container.

Use `docker images` to verify that you have the base container built. For example:
```
$ docker images | grep pytorch-ipex-spr
model-zoo pytorch-ipex-spr f5b473554295 2 hours ago 4.08GB
```

To build the ResNext 32x16d inference container, extract the package and
run the `build.sh` script.
```
# Extract the package
tar -xzf pytorch-spr-resnext-32x16d-inference.tar.gz
cd pytorch-spr-resnext-32x16d-inference
# Build the container
./build.sh
```

After the build completes, you should have a container called
`model-zoo:pytorch-spr-resnext-32x16d-inference` that will be used to run the model.
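
The packaged `build.sh` is what actually issues the `docker build`; as an illustration only
(the variable names and defaults below are assumptions, not the script's exact contents),
it wraps a call along these lines:
```
#!/usr/bin/env bash
# Illustrative sketch of the build call -- the packaged build.sh may differ.
PYTORCH_IMAGE=${PYTORCH_IMAGE:-model-zoo}
PYTORCH_TAG=${PYTORCH_TAG:-pytorch-ipex-spr}
IMAGE_NAME=${IMAGE_NAME:-model-zoo:pytorch-spr-resnext-32x16d-inference}

docker build \
  --build-arg PYTORCH_IMAGE=${PYTORCH_IMAGE} \
  --build-arg PYTORCH_TAG=${PYTORCH_TAG} \
  --build-arg PACKAGE_NAME=pytorch-spr-resnext-32x16d-inference \
  -t ${IMAGE_NAME} \
  -f pytorch-spr-resnext-32x16d-inference.Dockerfile .
```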

## Run the model

After you've followed the instructions to [build the container](#build-the-container)
and [prepare the dataset](#datasets), use the `run.sh` script from the container
package to run ResNext 32x16d inference in Docker. Set environment variables to
specify the dataset directory (needed only for accuracy), the precision to run, and
an output directory. By default, the `run.sh` script will run the
`inference_realtime.sh` quickstart script. To run a different script, specify
the name of the script using the `SCRIPT` environment variable.
```
# Navigate to the container package directory
cd pytorch-spr-resnext-32x16d-inference
# Set the required environment vars
export PRECISION=<specify the precision to run>
export OUTPUT_DIR=<directory where log files will be written>
# Run the container with inference_realtime.sh quickstart script
./run.sh
# To test accuracy, also specify the dataset directory
export DATASET_DIR=<path to the dataset>
SCRIPT=accuracy.sh ./run.sh
```
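
Under the hood, `run.sh` wraps a `docker run` invocation; the run-script hunks earlier in
this commit show the key flags (note the `--shm-size 8G` addition). A simplified sketch
follows, with the environment-variable plumbing and the final container command assumed
rather than copied from the package:
```
# Simplified sketch of the docker run call behind run.sh (not the packaged script)
docker run --rm \
  --env PRECISION=${PRECISION} \
  --env OUTPUT_DIR=${OUTPUT_DIR} \
  --env DATASET_DIR=${DATASET_DIR} \
  --volume ${DATASET_DIR}:${DATASET_DIR} \
  --volume ${OUTPUT_DIR}:${OUTPUT_DIR} \
  --shm-size 8G \
  -w ${WORKDIR} \
  ${IMAGE_NAME} \
  /bin/bash quickstart/${SCRIPT:-inference_realtime.sh}
```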

<!--- 80. License -->
## License

Licenses can be found in the model package, in the `licenses` directory.
