Skip to content

Commit

Permalink
Release - SuperBench v0.10.0 (#607)
Browse files Browse the repository at this point in the history
**Description**

Cherry-pick bug fixes from v0.10.0 to main.

**Major Revisions**

* Benchmarks: Microbenchmark - Support different hipblasLt data types in dist_inference #590
* Benchmarks: Microbenchmark - Support in-place for NCCL/RCCL benchmark #591
* Bug Fix - Fix NUMA Domains Swap Issue in NDv4 Topology File #592
* Benchmarks: Microbenchmark - Add data type option for NCCL and RCCL tests #595
* Benchmarks: Bug Fix - Make metrics of dist-inference-cpp aligned with PyTorch version #596
* CI/CD - Add ndv5 topo file #597
* Benchmarks: Microbenchmark - Improve AMD GPU P2P performance with fine-grained GPU memory #593
* Benchmarks: Build Pipeline - fix nccl and nccl test version to 2.18.3 to resolve hang issue in cuda12.2 docker #599
* Dockerfile - Bug fix for rocm docker build and deploy #598
* Benchmarks: Microbenchmark - Adapt to hipblasLt data type changes #603
* Benchmarks: Micro benchmarks - Update hipblaslt metric unit to tflops #604
* Monitor - Upgrade pyrsmi to amdsmi python library. #601
* Benchmarks: Micro benchmarks - add fp8 and initialization for hipblaslt benchmark #605
* Dockerfile - Add rocm6.0 dockerfile #602
* Bug Fix - Bug fix for latest megatron-lm benchmark #600
* Docs - Upgrade version and release note #606

Co-authored-by: Ziyue Yang <[email protected]>
Co-authored-by: Yang Wang <[email protected]>
Co-authored-by: Yuting Jiang <[email protected]>
Co-authored-by: guoshzhao <[email protected]>
  • Loading branch information
5 people authored Jan 8, 2024
1 parent 2c2096e commit 2c88db9
Show file tree
Hide file tree
Showing 56 changed files with 919 additions and 240 deletions.
26 changes: 19 additions & 7 deletions .github/workflows/build-image.yml
Original file line number Diff line number Diff line change
Expand Up @@ -18,6 +18,7 @@ jobs:
docker:
name: Docker build ${{ matrix.name }}
runs-on: ${{ matrix.runner }}
timeout-minutes: 600
permissions:
contents: read
packages: write
Expand All @@ -27,15 +28,23 @@ jobs:
- name: cuda12.2
dockerfile: cuda12.2
tags: superbench/main:cuda12.2
runner: ubuntu-latest
runner: [self-hosted, rocm-build]
build_args: "NUM_MAKE_JOBS=64"
- name: cuda11.1.1
dockerfile: cuda11.1.1
tags: superbench/main:cuda11.1.1,superbench/superbench:latest
runner: ubuntu-latest
build_args: "NUM_MAKE_JOBS=8"
- name: rocm5.7
dockerfile: rocm5.7.x
tags: superbench/main:rocm5.7
runner: [self-hosted, rocm-build]
build_args: "NUM_MAKE_JOBS=64"
- name: rocm6.0
dockerfile: rocm6.0.x
tags: superbench/main:rocm6.0
runner: [self-hosted, rocm-build]
build_args: "NUM_MAKE_JOBS=64"
steps:
- name: Checkout
uses: actions/checkout@v2
Expand Down Expand Up @@ -75,7 +84,7 @@ jobs:
fi
DOCKERFILE=dockerfile/${{ matrix.dockerfile }}.dockerfile
BUILD_ARGS="NUM_MAKE_JOBS=8"
BUILD_ARGS=${{ matrix.build_args }}
if [[ "${{ matrix.extra_args }}" ]]; then
BUILD_ARGS="${BUILD_ARGS} ${{ matrix.extra_args }}"
fi
Expand All @@ -87,11 +96,11 @@ jobs:
CACHE_TO="type=inline,mode=max"
fi
echo ::set-output name=dockerfile::${DOCKERFILE}
echo ::set-output name=build_args::${BUILD_ARGS}
echo ::set-output name=tags::${TAGS}
echo ::set-output name=cache_from::${CACHE_FROM}
echo ::set-output name=cache_to::${CACHE_TO}
echo "dockerfile=${DOCKERFILE}" >> "$GITHUB_OUTPUT"
echo "build_args=${BUILD_ARGS}" >> "$GITHUB_OUTPUT"
echo "tags=${TAGS}" >> "$GITHUB_OUTPUT"
echo "cache_from=${CACHE_FROM}" >> "$GITHUB_OUTPUT"
echo "cache_to=${CACHE_TO}" >> "$GITHUB_OUTPUT"
- name: Echo build args
run: echo ${{ steps.metadata.outputs.build_args }}
- name: Echo image tag
Expand All @@ -106,6 +115,9 @@ jobs:
with:
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_TOKEN }}
- name: Pull cache image
run: sudo docker pull ${{ steps.metadata.outputs.tags }}
continue-on-error: true
- name: Login to the GitHub Container Registry
uses: docker/login-action@v1
if: ${{ github.event_name == 'release' }}
Expand Down
6 changes: 6 additions & 0 deletions .gitmodules
Original file line number Diff line number Diff line change
Expand Up @@ -24,3 +24,9 @@
[submodule "third_party/msccl"]
path = third_party/msccl
url = https://github.com/Azure/msccl
[submodule "third_party/Megatron/Megatron-LM"]
path = third_party/Megatron/Megatron-LM
url = https://github.com/NVIDIA/Megatron-LM.git
[submodule "third_party/Megatron/Megatron-DeepSpeed"]
path = third_party/Megatron/Megatron-DeepSpeed
url = https://github.com/microsoft/Megatron-DeepSpeed.git
2 changes: 1 addition & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -15,7 +15,7 @@

__SuperBench__ is a validation and profiling tool for AI infrastructure.

📢 [v0.9.0](https://github.com/microsoft/superbenchmark/releases/tag/v0.9.0) has been released!
📢 [v0.10.0](https://github.com/microsoft/superbenchmark/releases/tag/v0.10.0) has been released!

## _Check [aka.ms/superbench](https://aka.ms/superbench) for more details._

Expand Down
9 changes: 8 additions & 1 deletion dockerfile/cuda12.2.dockerfile
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,7 @@ FROM nvcr.io/nvidia/pytorch:23.10-py3
# NVIDIA:
# - CUDA: 12.2.2
# - cuDNN: 8.9.5
# - NCCL: v2.19.3-1
# - NCCL: v2.18.3-1
# Mellanox:
# - OFED: 23.07-0.5.1.2
# - HPC-X: v2.16
Expand Down Expand Up @@ -113,6 +113,13 @@ RUN cd /tmp && \
mv amd-blis /opt/AMD && \
rm -rf aocl-blis-linux-aocc-4.0.tar.gz

# Install NCCL 2.18.3
RUN cd /tmp && \
git clone -b v2.18.3-1 https://github.com/NVIDIA/nccl.git && \
cd nccl && \
make -j src.build && \
make install && \
rm -rf /tmp/nccl

ENV PATH="${PATH}" \
LD_LIBRARY_PATH="/usr/local/lib:${LD_LIBRARY_PATH}" \
Expand Down
2 changes: 2 additions & 0 deletions dockerfile/directx12.dockerfile
Original file line number Diff line number Diff line change
Expand Up @@ -54,6 +54,8 @@ RUN curl -s -L https://dist.nuget.org/win-x86-commandline/latest/nuget.exe -o "%
# Run the setup script to install the visual studio components
RUN "%SB_HOME%\\dockerfile\\directx\\install-components.bat"

RUN powershell -Command "Set-ItemProperty -Path HKLM:\SYSTEM\CurrentControlSet\Control\FileSystem -Name LongPathsEnabled -Value 1;"
RUN git config --system core.longpaths true
# Install Superbench
RUN python -m pip install setuptools==65.0.0 && \
python -m pip install --no-cache-dir .[amdworker] && \
Expand Down
32 changes: 16 additions & 16 deletions dockerfile/etc/ndv4-topo.xml
Original file line number Diff line number Diff line change
@@ -1,34 +1,34 @@
<system version="1">
<cpu numaid="0" affinity="0000ffff,0000ffff" arch="x86_64" vendor="AuthenticAMD" familyid="23" modelid="49">
<pci busid="ffff:ff:01.0" class="0x060400" link_speed="16 GT/s" link_width="16">
<pci busid="0001:00:00.0" class="0x030200" link_speed="16 GT/s" link_width="16"/>
<pci busid="0101:00:00.0" class="0x020700" link_speed="16 GT/s" link_width="16"/>
<pci busid="0002:00:00.0" class="0x030200" link_speed="16 GT/s" link_width="16"/>
<pci busid="0102:00:00.0" class="0x020700" link_speed="16 GT/s" link_width="16"/>
</pci>
</cpu>
<cpu numaid="1" affinity="0000ffff,0000ffff" arch="x86_64" vendor="AuthenticAMD" familyid="23" modelid="49">
<pci busid="ffff:ff:02.0" class="0x060400" link_speed="16 GT/s" link_width="16">
<pci busid="0003:00:00.0" class="0x030200" link_speed="16 GT/s" link_width="16"/>
<pci busid="0103:00:00.0" class="0x020700" link_speed="16 GT/s" link_width="16"/>
<pci busid="0004:00:00.0" class="0x030200" link_speed="16 GT/s" link_width="16"/>
<pci busid="0104:00:00.0" class="0x020700" link_speed="16 GT/s" link_width="16"/>
</pci>
</cpu>
<cpu numaid="2" affinity="0000ffff,0000ffff" arch="x86_64" vendor="AuthenticAMD" familyid="23" modelid="49">
<pci busid="ffff:ff:03.0" class="0x060400" link_speed="16 GT/s" link_width="16">
<pci busid="000b:00:00.0" class="0x030200" link_speed="16 GT/s" link_width="16"/>
<pci busid="0105:00:00.0" class="0x020700" link_speed="16 GT/s" link_width="16"/>
<pci busid="000c:00:00.0" class="0x030200" link_speed="16 GT/s" link_width="16"/>
<pci busid="0106:00:00.0" class="0x020700" link_speed="16 GT/s" link_width="16"/>
<cpu numaid="1" affinity="0000ffff,0000ffff" arch="x86_64" vendor="AuthenticAMD" familyid="23" modelid="49">
<pci busid="ffff:ff:02.0" class="0x060400" link_speed="16 GT/s" link_width="16">
<pci busid="0001:00:00.0" class="0x030200" link_speed="16 GT/s" link_width="16"/>
<pci busid="0101:00:00.0" class="0x020700" link_speed="16 GT/s" link_width="16"/>
<pci busid="0002:00:00.0" class="0x030200" link_speed="16 GT/s" link_width="16"/>
<pci busid="0102:00:00.0" class="0x020700" link_speed="16 GT/s" link_width="16"/>
</pci>
</cpu>
<cpu numaid="3" affinity="0000ffff,0000ffff" arch="x86_64" vendor="AuthenticAMD" familyid="23" modelid="49">
<pci busid="ffff:ff:04.0" class="0x060400" link_speed="16 GT/s" link_width="16">
<cpu numaid="2" affinity="0000ffff,0000ffff" arch="x86_64" vendor="AuthenticAMD" familyid="23" modelid="49">
<pci busid="ffff:ff:03.0" class="0x060400" link_speed="16 GT/s" link_width="16">
<pci busid="000d:00:00.0" class="0x030200" link_speed="16 GT/s" link_width="16"/>
<pci busid="0107:00:00.0" class="0x020700" link_speed="16 GT/s" link_width="16"/>
<pci busid="000e:00:00.0" class="0x030200" link_speed="16 GT/s" link_width="16"/>
<pci busid="0108:00:00.0" class="0x020700" link_speed="16 GT/s" link_width="16"/>
</pci>
</cpu>
<cpu numaid="3" affinity="0000ffff,0000ffff" arch="x86_64" vendor="AuthenticAMD" familyid="23" modelid="49">
<pci busid="ffff:ff:04.0" class="0x060400" link_speed="16 GT/s" link_width="16">
<pci busid="000b:00:00.0" class="0x030200" link_speed="16 GT/s" link_width="16"/>
<pci busid="0105:00:00.0" class="0x020700" link_speed="16 GT/s" link_width="16"/>
<pci busid="000c:00:00.0" class="0x030200" link_speed="16 GT/s" link_width="16"/>
<pci busid="0106:00:00.0" class="0x020700" link_speed="16 GT/s" link_width="16"/>
</pci>
</cpu>
</system>
38 changes: 38 additions & 0 deletions dockerfile/etc/ndv5-topo.xml
Original file line number Diff line number Diff line change
@@ -0,0 +1,38 @@
<system version="1">
<cpu numaid="0" affinity="ffffffff,ffff0000,00000000" arch="x86_64" vendor="GenuineIntel" familyid="6" modelid="143">
<pci busid="ffff:ff:01.0" class="0x060400" link_speed="32.0 GT/s PCIe" link_width="16" vendor="0x0000" device="0x0000" subsystem_vendor="0x0000" subsystem_device="0x0000">
<pci busid="0001:00:00.0" class="0x030200" link_speed="32.0 GT/s PCIe" link_width="16"/>
<pci busid="0101:00:00.0" class="0x020700" link_speed="32.0 GT/s PCIe" link_width="16"/>
</pci>
<pci busid="ffff:ff:02.0" class="0x060400" link_speed="32.0 GT/s PCIe" link_width="16" vendor="0x0000" device="0x0000" subsystem_vendor="0x0000" subsystem_device="0x0000">
<pci busid="0002:00:00.0" class="0x030200" link_speed="32.0 GT/s PCIe" link_width="16"/>
<pci busid="0102:00:00.0" class="0x020700" link_speed="32.0 GT/s PCIe" link_width="16"/>
</pci>
<pci busid="ffff:ff:03.0" class="0x060400" link_speed="32.0 GT/s PCIe" link_width="16" vendor="0x0000" device="0x0000" subsystem_vendor="0x0000" subsystem_device="0x0000">
<pci busid="0003:00:00.0" class="0x030200" link_speed="32.0 GT/s PCIe" link_width="16"/>
<pci busid="0103:00:00.0" class="0x020700" link_speed="32.0 GT/s PCIe" link_width="16"/>
</pci>
<pci busid="ffff:ff:04.0" class="0x060400" link_speed="32.0 GT/s PCIe" link_width="16" vendor="0x0000" device="0x0000" subsystem_vendor="0x0000" subsystem_device="0x0000">
<pci busid="0008:00:00.0" class="0x030200" link_speed="32.0 GT/s PCIe" link_width="16"/>
<pci busid="0104:00:00.0" class="0x020700" link_speed="32.0 GT/s PCIe" link_width="16"/>
</pci>
</cpu>
<cpu numaid="1" affinity="00000000,0000ffff,ffffffff" arch="x86_64" vendor="GenuineIntel" familyid="6" modelid="143">
<pci busid="ffff:ff:05.0" class="0x060400" link_speed="32.0 GT/s PCIe" link_width="16" vendor="0x0000" device="0x0000" subsystem_vendor="0x0000" subsystem_device="0x0000">
<pci busid="0009:00:00.0" class="0x030200" link_speed="32.0 GT/s PCIe" link_width="16"/>
<pci busid="0105:00:00.0" class="0x020700" link_speed="32.0 GT/s PCIe" link_width="16"/>
</pci>
<pci busid="ffff:ff:06.0" class="0x060400" link_speed="32.0 GT/s PCIe" link_width="16" vendor="0x0000" device="0x0000" subsystem_vendor="0x0000" subsystem_device="0x0000">
<pci busid="000a:00:00.0" class="0x030200" link_speed="32.0 GT/s PCIe" link_width="16"/>
<pci busid="0106:00:00.0" class="0x020700" link_speed="32.0 GT/s PCIe" link_width="16"/>
</pci>
<pci busid="ffff:ff:07.0" class="0x060400" link_speed="32.0 GT/s PCIe" link_width="16" vendor="0x0000" device="0x0000" subsystem_vendor="0x0000" subsystem_device="0x0000">
<pci busid="000b:00:00.0" class="0x030200" link_speed="32.0 GT/s PCIe" link_width="16"/>
<pci busid="0107:00:00.0" class="0x020700" link_speed="32.0 GT/s PCIe" link_width="16"/>
</pci>
<pci busid="ffff:ff:08.0" class="0x060400" link_speed="32.0 GT/s PCIe" link_width="16" vendor="0x0000" device="0x0000" subsystem_vendor="0x0000" subsystem_device="0x0000">
<pci busid="000c:00:00.0" class="0x030200" link_speed="32.0 GT/s PCIe" link_width="16"/>
<pci busid="0108:00:00.0" class="0x020700" link_speed="32.0 GT/s PCIe" link_width="16"/>
</pci>
</cpu>
</system>
65 changes: 33 additions & 32 deletions dockerfile/rocm5.7.x.dockerfile
Original file line number Diff line number Diff line change
Expand Up @@ -17,6 +17,7 @@ RUN apt-get update && \
apt-get -q install -y --no-install-recommends \
autoconf \
automake \
bc \
build-essential \
curl \
dmidecode \
Expand All @@ -27,6 +28,7 @@ RUN apt-get update && \
libaio-dev \
libboost-program-options-dev \
libcap2 \
libcurl4-openssl-dev \
libnuma-dev \
libpci-dev \
libssl-dev \
Expand All @@ -38,6 +40,7 @@ RUN apt-get update && \
openssh-client \
openssh-server \
pciutils \
python3-mpi4py \
rsync \
sudo \
util-linux \
Expand All @@ -46,11 +49,11 @@ RUN apt-get update && \
&& \
rm -rf /tmp/*

ARG NUM_MAKE_JOBS=16
ARG NUM_MAKE_JOBS=

# Check if CMake is installed and its version
RUN cmake_version=$(cmake --version 2>/dev/null | grep -oP "(?<=cmake version )(\d+\.\d+)" || echo "0.0") && \
required_version="3.26.4" && \
required_version="3.24.1" && \
if [ "$(printf "%s\n" "$required_version" "$cmake_version" | sort -V | head -n 1)" != "$required_version" ]; then \
echo "existing cmake version is ${cmake_version}" && \
cd /tmp && \
Expand Down Expand Up @@ -100,40 +103,26 @@ RUN if ! command -v ofed_info >/dev/null 2>&1; then \
rm -rf MLNX_OFED_LINUX-${OFED_VERSION}* ; \
fi

# Install UCX
ENV UCX_VERSION=1.14.1
RUN if [ -z "$(ls -A /opt/ucx)" ]; then \
echo "/opt/ucx is empty. Installing UCX..."; \
cd /tmp && \
git clone https://github.com/openucx/ucx.git -b v${UCX_VERSION} && \
cd ucx && \
./autogen.sh && \
mkdir build && \
cd build && \
../configure -prefix=$UCX_DIR --with-rocm=/opt/rocm --without-knem && \
make -j $(nproc) && make -j $(nproc) install && rm -rf /tmp/ucx-${UCX_VERSION} ; \
else \
echo "/opt/ucx is not empty. Skipping UCX installation."; \
fi
# Add target file to help determine which device(s) to build for
ENV ROCM_PATH=/opt/rocm
RUN bash -c 'echo -e "gfx90a:xnack-\ngfx90a:xnac+\ngfx940\ngfx941\ngfx942\ngfx1030\ngfx1100\ngfx1101\ngfx1102\n" >> ${ROCM_PATH}/bin/target.lst'

# Install OpenMPI
ENV OPENMPI_VERSION=4.1.x
ENV MPI_HOME=/usr/local/mpi
# Check if Open MPI is installed
RUN [ -d /usr/local/bin/mpirun ] || { \
echo "Open MPI not found. Installing Open MPI..." && \
cd /tmp && \
RUN cd /tmp && \
git clone --recursive https://github.com/open-mpi/ompi.git -b v${OPENMPI_VERSION} && \
cd ompi && \
./autogen.pl && \
mkdir build && \
cd build && \
../configure --prefix=/usr/local --enable-orterun-prefix-by-default --enable-mpirun-prefix-by-default --enable-prte-prefix-by-default --enable-mca-no-build=btl-uct --with-ucx=/opt/ucx --with-rocm=/opt/rocm && \
../configure --prefix=/usr/local/mpi --enable-orterun-prefix-by-default --enable-mpirun-prefix-by-default --enable-prte-prefix-by-default --with-rocm=/opt/rocm && \
make -j $(nproc) && \
make -j $(nproc) install && \
ldconfig && \
cd / && \
rm -rf /tmp/openmpi-${OPENMPI_VERSION}* ;\
}
rm -rf /tmp/openmpi-${OPENMPI_VERSION}*

# Install Intel MLC
RUN cd /tmp && \
Expand All @@ -148,12 +137,18 @@ RUN cd /opt/ && \
cd rccl && \
mkdir build && \
cd build && \
CXX=/opt/rocm/bin/hipcc cmake -DCMAKE_PREFIX_PATH=/opt/rocm/ .. && \
CXX=/opt/rocm/bin/hipcc cmake -DHIP_COMPILER=clang -DCMAKE_BUILD_TYPE=Release -DCMAKE_VERBOSE_MAKEFILE=1 \
-DCMAKE_PREFIX_PATH="${ROCM_PATH}/hsa;${ROCM_PATH}/hip;${ROCM_PATH}/share/rocm/cmake/;${ROCM_PATH}" \
.. && \
make -j${NUM_MAKE_JOBS}

ENV PATH="/opt/superbench/bin:/usr/local/bin/:/opt/rocm/hip/bin/:/opt/rocm/bin/:${PATH}" \
# Install AMD SMI Python Library
RUN cd /opt/rocm/share/amd_smi && \
python3 -m pip install --user .

ENV PATH="/usr/local/mpi/bin:/opt/superbench/bin:/usr/local/bin/:/opt/rocm/hip/bin/:/opt/rocm/bin/:${PATH}" \
LD_PRELOAD="/opt/rccl/build/librccl.so:$LD_PRELOAD" \
LD_LIBRARY_PATH="/opt/ucx/lib:/usr/local/lib/:/opt/rocm/lib:${LD_LIBRARY_PATH}" \
LD_LIBRARY_PATH="/usr/local/mpi/lib:/usr/lib/x86_64-linux-gnu/:/usr/local/lib/:/opt/rocm/lib:${LD_LIBRARY_PATH}" \
SB_HOME=/opt/superbench \
SB_MICRO_PATH=/opt/superbench \
ANSIBLE_DEPRECATION_WARNINGS=FALSE \
Expand All @@ -163,13 +158,19 @@ RUN echo PATH="$PATH" > /etc/environment && \
echo LD_LIBRARY_PATH="$LD_LIBRARY_PATH" >> /etc/environment && \
echo SB_MICRO_PATH="$SB_MICRO_PATH" >> /etc/environment

RUN apt install rocm-cmake -y && \
python3 -m pip install --upgrade pip wheel setuptools==65.7

WORKDIR ${SB_HOME}

ADD third_party third_party
# Apply patch
RUN cd third_party/perftest && \
git apply ../perftest_rocm6.patch
RUN make RCCL_HOME=/opt/rccl/build/ ROCBLAS_BRANCH=release/rocm-rel-5.7.1.1 HIPBLASLT_BRANCH=release/rocm-rel-5.7 ROCM_VER=rocm-5.5.0 -C third_party rocm -o cpu_hpl -o cpu_stream -o megatron_lm

ADD . .
RUN apt install rocm-cmake -y && \
python3 -m pip install --upgrade pip wheel setuptools==65.7 && \
python3 -m pip install .[amdworker] && \
#ENV USE_HIPBLASLT_DATATYPE=1
RUN python3 -m pip install .[amdworker] && \
CXX=/opt/rocm/bin/hipcc make cppbuild && \
make postinstall
RUN make cppbuild
ADD third_party third_party
RUN make RCCL_HOME=/opt/rccl/build/ ROCBLAS_BRANCH=release/rocm-rel-5.7.1.1 HIPBLASLT_BRANCH=release-staging/rocm-rel-5.7 ROCM_VER=rocm-5.5.0 -C third_party rocm -o cpu_hpl -o cpu_stream -o megatron_lm
Loading

0 comments on commit 2c88db9

Please sign in to comment.