Adds the container package for ResNext 32x16d inference (PyTorch SPR) (#91)

* Add dockerfile and documentation for ResNext101

* Update echos

* Small doc updates

* add shm-size

* add shm-size for maskrcnn

* Update to use env vars in build.sh

* Updating name for ResNext to be 'ResNext 32x16d'

* Update quickstart scripts

* Regenerate dockerfile after sync with develop

* Regenerate docs
dmsuehir authored Aug 25, 2021
1 parent 97aa091 commit 5f32ec4
Showing 20 changed files with 678 additions and 0 deletions.
@@ -0,0 +1,93 @@
# Copyright (c) 2020-2021 Intel Corporation
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
#
# THIS IS A GENERATED DOCKERFILE.
#
# This file was assembled from multiple pieces, whose use is documented
# throughout. Please refer to the TensorFlow dockerfiles documentation
# for more information.

ARG PYTORCH_IMAGE="model-zoo"
ARG PYTORCH_TAG="pytorch-ipex-spr"

FROM ${PYTORCH_IMAGE}:${PYTORCH_TAG} AS intel-optimized-pytorch

RUN yum --enablerepo=extras install -y epel-release && \
yum install -y \
ca-certificates \
git \
wget \
make \
cmake \
gcc-c++ \
gcc \
autoconf \
bzip2 \
tar

# Build Torch Vision
ARG TORCHVISION_VERSION=v0.8.0

RUN source activate pytorch && \
git clone https://github.com/pytorch/vision && \
cd vision && \
git checkout ${TORCHVISION_VERSION} && \
python setup.py install
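
# Install additional Python dependencies, then build gperftools 2.7.90 (which provides
# tcmalloc and related tools) into $HOME/.local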

RUN source activate pytorch && \
pip install matplotlib Pillow pycocotools && \
pip install yacs opencv-python cityscapesscripts transformers && \
conda install -y libopenblas && \
mkdir -p /workspace/installs && \
cd /workspace/installs && \
wget https://github.com/gperftools/gperftools/releases/download/gperftools-2.7.90/gperftools-2.7.90.tar.gz && \
tar -xzf gperftools-2.7.90.tar.gz && \
cd gperftools-2.7.90 && \
./configure --prefix=$HOME/.local && \
make && \
make install && \
rm -rf /workspace/installs/

ARG PACKAGE_DIR=model_packages

ARG PACKAGE_NAME="pytorch-spr-resnext-32x16d-inference"

ARG MODEL_WORKSPACE

# ${MODEL_WORKSPACE} and everything below it need to be owned by root:root rather than the current UID:GID;
# this allows the default user (root) to work in k8s single-node and multi-node deployments
RUN umask 002 && mkdir -p ${MODEL_WORKSPACE} && chgrp root ${MODEL_WORKSPACE} && chmod g+s+w,o+s+r ${MODEL_WORKSPACE}

ADD --chown=0:0 ${PACKAGE_DIR}/${PACKAGE_NAME}.tar.gz ${MODEL_WORKSPACE}

RUN chown -R root ${MODEL_WORKSPACE}/${PACKAGE_NAME} && chgrp -R root ${MODEL_WORKSPACE}/${PACKAGE_NAME} && chmod -R g+s+w ${MODEL_WORKSPACE}/${PACKAGE_NAME} && find ${MODEL_WORKSPACE}/${PACKAGE_NAME} -type d | xargs chmod o+r+x

WORKDIR ${MODEL_WORKSPACE}/${PACKAGE_NAME}

FROM intel-optimized-pytorch AS release
COPY --from=intel-optimized-pytorch /root/conda /root/conda
COPY --from=intel-optimized-pytorch /workspace/lib/ /workspace/lib/
COPY --from=intel-optimized-pytorch /root/.local/ /root/.local/
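
# Runtime tuning for the release image: DNNL_MAX_CPU_ISA caps the instruction set that
# oneDNN may dispatch to (AMX here), and jemalloc is preloaded via LD_PRELOAD and tuned
# through MALLOC_CONF (a common performance tuning for IPEX workloads).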

ENV DNNL_MAX_CPU_ISA="AVX512_CORE_AMX"

ENV PATH="~/conda/bin:${PATH}"
ENV LD_PRELOAD="/workspace/lib/jemalloc/lib/libjemalloc.so:$LD_PRELOAD"
ENV MALLOC_CONF="oversize_threshold:1,background_thread:true,metadata_thp:auto,dirty_decay_ms:9000000000,muzzy_decay_ms:9000000000"
ENV BASH_ENV=/root/.bash_profile
WORKDIR /workspace/
RUN yum install -y numactl mesa-libGL && \
yum clean all && \
echo "source activate pytorch" >> /root/.bash_profile
@@ -56,6 +56,7 @@ docker run --rm \
--env no_proxy=${no_proxy} \
${dataset_volume} \
--volume ${OUTPUT_DIR}:${OUTPUT_DIR} \
--shm-size 8G \
-w ${WORKDIR} \
${DOCKER_ARGS} \
$IMAGE_NAME \
@@ -51,6 +51,7 @@ docker run --rm \
--env no_proxy=${no_proxy} \
--volume ${DATASET_DIR}:${DATASET_DIR} \
--volume ${OUTPUT_DIR}:${OUTPUT_DIR} \
--shm-size 8G \
-w ${WORKDIR} \
${DOCKER_ARGS} \
$IMAGE_NAME \
@@ -0,0 +1,26 @@
## Build the container

The <model name> <mode> package has scripts and a Dockerfile that are
used to build a workload container that runs the model. This container
uses the PyTorch/IPEX container as its base, so ensure that you have built
the `pytorch-ipex-spr.tar.gz` container prior to building this model container.

Use `docker images` to verify that you have the base container built. For example:
```
$ docker images | grep pytorch-ipex-spr
model-zoo pytorch-ipex-spr f5b473554295 2 hours ago 4.08GB
```

To build the <model name> <mode> container, extract the package and
run the `build.sh` script.
```
# Extract the package
tar -xzf <package name>
cd <package dir>
# Build the container
./build.sh
```

After the build completes, you should have a container called
`<docker image>` that will be used to run the model.
@@ -0,0 +1,27 @@
## Datasets

### ImageNet

The [ImageNet](http://www.image-net.org/) validation dataset is used to run the
<model name> accuracy script. The realtime and throughput inference scripts use
synthetic data.

Download and extract the ImageNet2012 dataset from [http://www.image-net.org/](http://www.image-net.org/),
then move the validation images into labeled subfolders using
[the valprep.sh shell script](https://raw.githubusercontent.com/soumith/imagenetloader.torch/master/valprep.sh).

After running the data prep script, your folder structure should look something like this:
```
imagenet
└── val
    ├── ILSVRC2012_img_val.tar
    ├── n01440764
    │   ├── ILSVRC2012_val_00000293.JPEG
    │   ├── ILSVRC2012_val_00002138.JPEG
    │   ├── ILSVRC2012_val_00003014.JPEG
    │   ├── ILSVRC2012_val_00006697.JPEG
    │   └── ...
    └── ...
```
The folder that contains the `val` directory should be set as the
`DATASET_DIR` (for example: `export DATASET_DIR=/home/<user>/imagenet`).
@@ -0,0 +1,5 @@
<!-- 10. Description -->
## Description

This document has instructions for running <model name> <mode> using
Intel-optimized PyTorch.
@@ -0,0 +1,24 @@
## Run the model

After you've followed the instructions to [build the container](#build-the-container)
and [prepare the dataset](#datasets), use the `run.sh` script from the container
package to run <model name> <mode> in Docker. Set environment variables to
specify the dataset directory (needed only for accuracy), the precision to run, and
an output directory. By default, the `run.sh` script will run the
`inference_realtime.sh` quickstart script. To run a different script, specify
the name of the script using the `SCRIPT` environment variable.
```
# Navigate to the container package directory
cd <package dir>
# Set the required environment vars
export PRECISION=<specify the precision to run>
export OUTPUT_DIR=<directory where log files will be written>
# Run the container with inference_realtime.sh quickstart script
./run.sh
# To test accuracy, also specify the dataset directory
export DATASET_DIR=<path to the dataset>
SCRIPT=accuracy.sh ./run.sh
```
@@ -0,0 +1,4 @@
<!--- 80. License -->
## License

Licenses can be found in the model package, in the `licenses` directory.
@@ -0,0 +1,8 @@
<!--- 40. Quick Start Scripts -->
## Quick Start Scripts

| Script name | Description |
|-------------|-------------|
| `inference_realtime.sh` | Runs multi instance realtime inference using 4 cores per instance with synthetic data for the specified precision (fp32, int8 or bf16). |
| `inference_throughput.sh` | Runs multi instance batch inference using 1 instance per socket with synthetic data for the specified precision (fp32, int8 or bf16). |
| `accuracy.sh` | Measures the inference accuracy (providing a `DATASET_DIR` environment variable is required) for the specified precision (fp32, int8 or bf16). |
@@ -0,0 +1,2 @@
<!--- 0. Title -->
# PyTorch <model name> <mode>
@@ -0,0 +1,16 @@
## Model Package

The model package includes the Dockerfile and scripts needed to build and
run <model name> <mode> in a container.
```
<package dir>
├── README.md
├── build.sh
├── licenses
│   ├── LICENSE
│   └── third_party
├── model_packages
│   └── <package name>
├── <package dir>.Dockerfile
└── run.sh
```
@@ -0,0 +1,120 @@
<!--- 0. Title -->
# PyTorch ResNext 32x16d inference

<!-- 10. Description -->
## Description

This document has instructions for running ResNext 32x16d inference using
Intel-optimized PyTorch.

## Model Package

The model package includes the Dockerfile and scripts needed to build and
run ResNext 32x16d inference in a container.
```
pytorch-spr-resnext-32x16d-inference
├── README.md
├── build.sh
├── licenses
│   ├── LICENSE
│   └── third_party
├── model_packages
│   └── pytorch-spr-resnext-32x16d-inference.tar.gz
├── pytorch-spr-resnext-32x16d-inference.Dockerfile
└── run.sh
```

<!--- 40. Quick Start Scripts -->
## Quick Start Scripts

| Script name | Description |
|-------------|-------------|
| `inference_realtime.sh` | Runs multi instance realtime inference using 4 cores per instance with synthetic data for the specified precision (fp32, int8 or bf16). |
| `inference_throughput.sh` | Runs multi instance batch inference using 1 instance per socket with synthetic data for the specified precision (fp32, int8 or bf16). |
| `accuracy.sh` | Measures the inference accuracy (providing a `DATASET_DIR` environment variable is required) for the specified precision (fp32, int8 or bf16). |
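
For example, once the container is built and the environment variables described in
[Run the model](#run-the-model) are set, a specific quickstart script and precision can be
selected through `run.sh` (an illustrative invocation; the values shown are placeholders):
```
# Example: run the batch/throughput benchmark with bf16 precision
export PRECISION=bf16
export OUTPUT_DIR=<directory where log files will be written>
SCRIPT=inference_throughput.sh ./run.sh
```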

## Datasets

### ImageNet

The [ImageNet](http://www.image-net.org/) validation dataset is used to run the
ResNext 32x16d accuracy script. The realtime and throughput inference scripts use
synthetic data.

Download and extract the ImageNet2012 dataset from [http://www.image-net.org/](http://www.image-net.org/),
then move the validation images into labeled subfolders using
[the valprep.sh shell script](https://raw.githubusercontent.com/soumith/imagenetloader.torch/master/valprep.sh).

After running the data prep script, your folder structure should look something like this:
```
imagenet
└── val
    ├── ILSVRC2012_img_val.tar
    ├── n01440764
    │   ├── ILSVRC2012_val_00000293.JPEG
    │   ├── ILSVRC2012_val_00002138.JPEG
    │   ├── ILSVRC2012_val_00003014.JPEG
    │   ├── ILSVRC2012_val_00006697.JPEG
    │   └── ...
    └── ...
```
The folder that contains the `val` directory should be set as the
`DATASET_DIR` (for example: `export DATASET_DIR=/home/<user>/imagenet`).
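
The exact download steps depend on your image-net.org account, but as a rough sketch
(assuming the validation tarball has already been downloaded to the current directory),
the preparation might look like this:
```
# Sketch only: extract the validation images and sort them into labeled subfolders
export DATASET_DIR=/home/<user>/imagenet
mkdir -p ${DATASET_DIR}/val
tar -xf ILSVRC2012_img_val.tar -C ${DATASET_DIR}/val
cd ${DATASET_DIR}/val
wget https://raw.githubusercontent.com/soumith/imagenetloader.torch/master/valprep.sh
bash valprep.sh
```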

## Build the container

The ResNext 32x16d inference package has scripts and a Dockerfile that are
used to build a workload container that runs the model. This container
uses the PyTorch/IPEX container as its base, so ensure that you have built
the `pytorch-ipex-spr.tar.gz` container prior to building this model container.

Use `docker images` to verify that you have the base container built. For example:
```
$ docker images | grep pytorch-ipex-spr
model-zoo pytorch-ipex-spr f5b473554295 2 hours ago 4.08GB
```

To build the ResNext 32x16d inference container, extract the package and
run the `build.sh` script.
```
# Extract the package
tar -xzf pytorch-spr-resnext-32x16d-inference.tar.gz
cd pytorch-spr-resnext-32x16d-inference
# Build the container
./build.sh
```

After the build completes, you should have a container called
`model-zoo:pytorch-spr-resnext-32x16d-inference` that will be used to run the model.
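
The packaged `build.sh` is what actually issues the `docker build`; as an illustration only
(the variable names and defaults below are assumptions, not the script's exact contents),
it wraps a call along these lines:
```
#!/usr/bin/env bash
# Illustrative sketch of the build call -- the packaged build.sh may differ.
PYTORCH_IMAGE=${PYTORCH_IMAGE:-model-zoo}
PYTORCH_TAG=${PYTORCH_TAG:-pytorch-ipex-spr}
IMAGE_NAME=${IMAGE_NAME:-model-zoo:pytorch-spr-resnext-32x16d-inference}

docker build \
  --build-arg PYTORCH_IMAGE=${PYTORCH_IMAGE} \
  --build-arg PYTORCH_TAG=${PYTORCH_TAG} \
  --build-arg PACKAGE_NAME=pytorch-spr-resnext-32x16d-inference \
  -t ${IMAGE_NAME} \
  -f pytorch-spr-resnext-32x16d-inference.Dockerfile .
```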

## Run the model

After you've followed the instructions to [build the container](#build-the-container)
and [prepare the dataset](#datasets), use the `run.sh` script from the container
package to run ResNext 32x16d inference in Docker. Set environment variables to
specify the dataset directory (needed only for accuracy), the precision to run, and
an output directory. By default, the `run.sh` script will run the
`inference_realtime.sh` quickstart script. To run a different script, specify
the name of the script using the `SCRIPT` environment variable.
```
# Navigate to the container package directory
cd pytorch-spr-resnext-32x16d-inference
# Set the required environment vars
export PRECISION=<specify the precision to run>
export OUTPUT_DIR=<directory where log files will be written>
# Run the container with inference_realtime.sh quickstart script
./run.sh
# To test accuracy, also specify the dataset directory
export DATASET_DIR=<path to the dataset>
SCRIPT=accuracy.sh ./run.sh
```
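
Under the hood, `run.sh` wraps a `docker run` invocation; the run-script hunks earlier in
this commit show the key flags (note the `--shm-size 8G` addition). A simplified sketch
follows, with the environment-variable plumbing and the final container command assumed
rather than copied from the package:
```
# Simplified sketch of the docker run call behind run.sh (not the packaged script)
docker run --rm \
  --env PRECISION=${PRECISION} \
  --env OUTPUT_DIR=${OUTPUT_DIR} \
  --env DATASET_DIR=${DATASET_DIR} \
  --volume ${DATASET_DIR}:${DATASET_DIR} \
  --volume ${OUTPUT_DIR}:${OUTPUT_DIR} \
  --shm-size 8G \
  -w ${WORKDIR} \
  ${IMAGE_NAME} \
  /bin/bash quickstart/${SCRIPT:-inference_realtime.sh}
```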

<!--- 80. License -->
## License

Licenses can be found in the model package, in the `licenses` directory.
