Run single GPU tests on RAPIDS Runner #1165

Draft
wants to merge 30 commits into
base: main
30 commits
f9fb4ce
Remove python prefix from tox environment names
oliverholworthy Jun 30, 2023
a0e2ba0
Add transformers to test requirements
oliverholworthy Jun 30, 2023
60eb203
Add gpu-cu11 tox environment
oliverholworthy Jun 30, 2023
bc8918a
Move Merlin dependencies to deps configuration of tox environment
oliverholworthy Jun 30, 2023
2914452
Use posargs for tests path in tox environments
oliverholworthy Jun 30, 2023
2c7f337
Run single GPU tests in nvidia/tensorflow and nvidia/cuda images
oliverholworthy Jun 30, 2023
487f974
Trigger GPU PR tests from push instead of pull_request
oliverholworthy Jun 30, 2023
26252c5
Add RAPIDS P100 runner to list of self-hosted runners config
oliverholworthy Jun 30, 2023
268bf20
Add fixture to cleanup dataloader
oliverholworthy Jun 30, 2023
a1515d7
Replace import of collections.Sequence with collections.abc.Sequence
oliverholworthy Jun 30, 2023
906a2c6
Merge branch 'main' into gpu-tests-rapids-runner
oliverholworthy Jul 3, 2023
aaafe21
Remove COMPARE_BRANCH from gpu.yml
oliverholworthy Jul 3, 2023
169fb2c
Update ref for branch-name action in gpu.yml
oliverholworthy Jul 3, 2023
15ae84b
Run GPU examples in RAPIDS runner
oliverholworthy Jul 3, 2023
1c514f4
Move branch env vars to one line
oliverholworthy Jul 3, 2023
7aa74c0
Add pip cache
oliverholworthy Jul 3, 2023
fbb2e40
Replace single quotes in gpu.yml
oliverholworthy Jul 3, 2023
90cff0a
Use actions/cache for tox environment
oliverholworthy Jul 3, 2023
55ff2b0
Use id for setup-python step
oliverholworthy Jul 3, 2023
63bc85a
Replace double [[ with single [
oliverholworthy Jul 3, 2023
2ec75dc
Move checkout after ubuntu package install
oliverholworthy Jul 3, 2023
7d9de2c
Use cuda runtime base image instead of devel
oliverholworthy Jul 3, 2023
8c04544
Move matrix configuration to map
oliverholworthy Jul 3, 2023
1d428a0
Update formatting of gpu.yml
oliverholworthy Jul 3, 2023
8e96553
Test against different DLFW versions
oliverholworthy Jul 3, 2023
1cd702f
disable cu11 test
oliverholworthy Jul 3, 2023
2927bb4
reformat gpu.yml
oliverholworthy Jul 3, 2023
8cd5537
Remove image env
oliverholworthy Jul 3, 2023
d963dba
Merge branch 'main' into gpu-tests-rapids-runner
marcromeyn Jul 4, 2023
1e32bc9
Merge branch 'main' into gpu-tests-rapids-runner
oliverholworthy Nov 8, 2023
145 changes: 110 additions & 35 deletions .github/workflows/gpu.yml
@@ -1,71 +1,146 @@
name: GPU CI
name: gpu-ci

on:
workflow_dispatch:
push:
branches:
- main
- "pull-request/[0-9]+"
- pull-request/*
tags:
- "v[0-9]+.[0-9]+.[0-9]+"

concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true

jobs:
gpu-ci:
runs-on: linux-amd64-gpu-p100-latest-1
strategy:
matrix:
image:
[
"nvcr.io/nvidia/tensorflow:23.02-tf2-py3",
"nvcr.io/nvidia/tensorflow:23.04-tf2-py3",
"nvcr.io/nvidia/tensorflow:23.06-tf2-py3",
]
container:
image: nvcr.io/nvstaging/merlin/merlin-ci-runner:latest
image: ${{ matrix.image }}
env:
NVIDIA_VISIBLE_DEVICES: ${{ env.NVIDIA_VISIBLE_DEVICES }}
options: --shm-size=1G
credentials:
username: $oauthtoken
password: ${{ secrets.NGC_TOKEN }}

steps:
- uses: actions/checkout@v3
with:
fetch-depth: 0
- name: Install and upgrade python packages
run: |
python -m pip install --upgrade pip tox
- uses: actions/cache@v3
with:
path: .tox
key: tox-${{ matrix.image }}-${{ hashFiles('requirements/*.txt') }}
- name: Get Branch name
id: get-branch-name
uses: NVIDIA-Merlin/.github/actions/branch-name@6f0539fba24f60da2aee63c5925bee7cee3206e3
- name: Run tests
run: |
nvidia-smi
pip install tox
ref_type=${{ github.ref_type }}
branch=main
if [[ $ref_type == "tag"* ]]
then
git -c protocol.version=2 fetch --no-tags --prune --progress --no-recurse-submodules --depth=1 origin +refs/heads/release*:refs/remotes/origin/release*
branch=$(git branch -r --contains ${{ github.ref_name }} --list '*release*' --format "%(refname:short)" | sed -e 's/^origin\///')
fi
if [[ "${{ github.ref }}" != 'refs/heads/main' ]]; then
if [ "${{ github.ref }}" != 'refs/heads/main' ]; then
extra_pytest_markers="and changed"
fi
PYTEST_MARKERS="unit and not (examples or integration or notebook) and (singlegpu or not multigpu) $extra_pytest_markers" MERLIN_BRANCH=$branch COMPARE_BRANCH=${{ github.base_ref }} tox -e gpu
merlin_branch="${{ steps.get-branch-name.outputs.branch }}"
MERLIN_BRANCH=$merlin_branch COMPARE_BRANCH=$merlin_branch \
PYTEST_MARKERS="unit and not (examples or integration or notebook) $extra_pytest_markers" \
tox -e gpu

# gpu-cu11:
# runs-on: linux-amd64-gpu-p100-latest-1
# env:
# IMAGE: "nvidia/cuda:11.8.0-runtime-ubuntu22.04"
# container:
# image: ${{ env.IMAGE }}
# env:
# NVIDIA_VISIBLE_DEVICES: ${{ env.NVIDIA_VISIBLE_DEVICES }}
# strategy:
# matrix:
# versions: [{ rapids: "23.04", python: "3.8" }]
# steps:
# - name: Install Ubuntu packages
# run: |
# apt-get update -y
# apt-get install -y \
# git \
# 'libcudnn8=*cuda11.8' `# tensorflow GPU support` \
# cuda-nvcc-11-8 `# required for numba`
# - uses: actions/checkout@v3
# with:
# fetch-depth: 0
# - name: Set up Python ${{ matrix.version.python }}
# id: setup-python
# uses: actions/setup-python@v4
# with:
# python-version: ${{ matrix.version.python }}
# - uses: actions/cache@v3
# with:
# path: .tox
# key: tox-${{ matrix.IMAGE }}-${{ steps.setup-python.outputs.python-version }}-${{ hashFiles('requirements/*.txt') }}
# - name: Install and upgrade python packages
# run: |
# python -m pip install --upgrade pip tox
# - name: Get Branch name
# id: get-branch-name
# uses: NVIDIA-Merlin/.github/actions/branch-name@6f0539fba24f60da2aee63c5925bee7cee3206e3
# - name: Run tests
# run: |
# if [ "${{ github.ref }}" != 'refs/heads/main' ]; then
# extra_pytest_markers="and changed"
# fi
# merlin_branch="${{ steps.get-branch-name.outputs.branch }}"
# RAPIDS_VERSION=${{ matrix.version.rapids }} MERLIN_BRANCH=$merlin_branch COMPARE_BRANCH=$merlin_branch \
# PYTEST_MARKERS="unit and not (examples or integration or notebook) $extra_pytest_markers" \
# tox -e gpu-cu11

gpu-ci-examples:
tests-examples:
runs-on: linux-amd64-gpu-p100-latest-1
container:
image: nvcr.io/nvstaging/merlin/merlin-ci-runner:latest
image: "nvidia/cuda:11.8.0-runtime-ubuntu22.04"
env:
NVIDIA_VISIBLE_DEVICES: ${{ env.NVIDIA_VISIBLE_DEVICES }}
options: --shm-size=1G
credentials:
username: $oauthtoken
password: ${{ secrets.NGC_TOKEN }}
strategy:
matrix:
version: [{ rapids: "23.04", python: "3.8" }]
steps:
- name: Install Ubuntu packages
run: |
apt-get update -y
# libcudnn8 installed for tensorflow GPU support
apt-get install -y \
git \
'libcudnn8=*cuda11.8' `# tensorflow GPU support` \
cuda-nvcc-11-8 `# required for numba`
- uses: actions/checkout@v3
with:
fetch-depth: 0
- name: Set up Python ${{ matrix.version.python }}
id: setup-python
uses: actions/setup-python@v4
with:
python-version: ${{ matrix.version.python }}
- uses: actions/cache@v3
with:
path: .tox
key: tox-${{ steps.setup-python.outputs.python-version }}-${{ hashFiles('requirements/*.txt') }}
- name: Install and upgrade python packages
run: |
python -m pip install --upgrade pip tox
- name: Get Branch name
id: get-branch-name
uses: NVIDIA-Merlin/.github/actions/branch-name@6f0539fba24f60da2aee63c5925bee7cee3206e3
- name: Run tests
run: |
pip install tox
ref_type=${{ github.ref_type }}
branch=main
if [[ $ref_type == "tag"* ]]
then
git -c protocol.version=2 fetch --no-tags --prune --progress --no-recurse-submodules --depth=1 origin +refs/heads/release*:refs/remotes/origin/release*
branch=$(git branch -r --contains ${{ github.ref_name }} --list '*release*' --format "%(refname:short)" | sed -e 's/^origin\///')
fi
if [[ "${{ github.ref }}" != 'refs/heads/main' ]]; then
if [ "${{ github.ref }}" != 'refs/heads/main' ]; then
extra_pytest_markers="and changed"
fi
PYTEST_MARKERS="(examples or notebook) $extra_pytest_markers" MERLIN_BRANCH=$branch COMPARE_BRANCH=${{ github.base_ref }} tox -e gpu
merlin_branch="${{ steps.get-branch-name.outputs.branch }}"
RAPIDS_VERSION=${{ matrix.version.rapids }} MERLIN_BRANCH=$merlin_branch COMPARE_BRANCH=$merlin_branch \
PYTEST_MARKERS="(examples or notebook) $extra_pytest_markers" \
tox -e gpu-cu11
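
Reviewer note on the `Run tests` steps above: both jobs build the pytest marker expression from the pushed ref, appending `and changed` for any ref other than `refs/heads/main`. Below is a minimal Python sketch of that selection logic, to make the resulting `-m` expressions easier to see. The helper name `build_pytest_markers` and the two constants are hypothetical and are not part of this PR; the workflow itself does this in shell.

```python
# Marker expressions copied from the two `Run tests` steps in gpu.yml.
BASE_UNIT_MARKERS = "unit and not (examples or integration or notebook)"
BASE_EXAMPLE_MARKERS = "(examples or notebook)"


def build_pytest_markers(base_markers: str, github_ref: str) -> str:
    """Return the marker expression the workflow would pass to `pytest -m`.

    Hypothetical helper that mirrors the shell logic in the `Run tests` step.
    """
    # Only pushes to main run the full selection; every other ref is
    # restricted to tests carrying the `changed` marker.
    extra = "" if github_ref == "refs/heads/main" else "and changed"
    # The shell version leaves a trailing space when `extra` is empty;
    # it is stripped here for readability.
    return f"{base_markers} {extra}".strip()
```

For example, `build_pytest_markers(BASE_EXAMPLE_MARKERS, "refs/heads/pull-request/1165")` yields `(examples or notebook) and changed`. As the condition is written, tag pushes matching the `v[0-9]+.[0-9]+.[0-9]+` pattern would also get `and changed`, since their ref is not `refs/heads/main`; whether that is intended may be worth confirming.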
1 change: 1 addition & 0 deletions requirements/test.txt
@@ -1,5 +1,6 @@
-r dev.txt
-r pytorch.txt
-r tensorflow.txt
-r transformers.txt

numpy<1.24
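
Reviewer note: `requirements/test.txt` pulls in other files through `-r` lines, so adding `-r transformers.txt` makes every tox environment that installs `-rrequirements/test.txt` also install whatever that file lists. The sketch below shows roughly how such nested includes flatten. It is a simplified illustration, not pip's parser: the function name `expand_requirements` is hypothetical, and only the short `-r` form, blank lines, and full-line `#` comments are handled.

```python
from pathlib import Path
from typing import List


def expand_requirements(path: Path) -> List[str]:
    """Flatten a requirements file, following nested `-r` includes.

    Simplified sketch: other pip options and inline comments are not handled.
    """
    resolved: List[str] = []
    for raw in path.read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("-r"):
            # Accepts both `-r file.txt` and `-rfile.txt`. pip resolves the
            # included path relative to the file that contains the `-r` line.
            included = line[2:].strip()
            resolved.extend(expand_requirements(path.parent / included))
        else:
            resolved.append(line)
    return resolved
```

Calling `expand_requirements(Path("requirements/test.txt"))` from the repository root would list the combined requirements in include order, with the `numpy<1.24` pin last.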
28 changes: 27 additions & 1 deletion tox.ini
@@ -2,18 +2,42 @@
; .github/workflows/cpu-ci.yml for the workflow definition.

[tox]
envlist = gpu,multi-gpu,horovod-cpu,nvtabular-cpu,systems-cpu,transformers4rec-cpu,docs,docs-multi
envlist = gpu,gpu-cu11,multi-gpu,horovod-cpu,nvtabular-cpu,systems-cpu,transformers4rec-cpu,docs,docs-multi

[testenv]
commands =
pip install --upgrade pip
pip install -e .[all]

[testenv:gpu-cu11]
; Runs in: GitHub Actions
; Runs GPU-based tests.
setenv =
TF_GPU_ALLOCATOR=cuda_malloc_async
PIP_EXTRA_INDEX_URL=https://pypi.nvidia.com
PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python
allowlist_externals =
bash
passenv =
CUDA_VISIBLE_DEVICES
deps =
-rrequirements/test.txt
git+https://github.com/NVIDIA-Merlin/core.git@{env:MERLIN_BRANCH}
git+https://github.com/NVIDIA-Merlin/dataloader.git@{env:MERLIN_BRANCH}
git+https://github.com/NVIDIA-Merlin/NVTabular.git@{env:MERLIN_BRANCH}
git+https://github.com/NVIDIA-Merlin/systems.git@{env:MERLIN_BRANCH}
nvidia-cudnn-cu11~=8.6.0
cudf-cu11=={env:RAPIDS_VERSION}
dask-cudf-cu11=={env:RAPIDS_VERSION}
commands =
bash -c 'python -m pytest --cov-report term --cov merlin -m "{env:PYTEST_MARKERS}" -rxs {posargs:tests} || ([ $? = 5 ] && exit 0 || exit $?)'

[testenv:gpu]
; Runs in: Github Actions
; Runs GPU-based tests.
allowlist_externals =
bash
cp
deps =
-rrequirements/test.txt
git+https://github.com/NVIDIA-Merlin/core.git@{env:MERLIN_BRANCH}
@@ -26,6 +50,8 @@ setenv =
TF_GPU_ALLOCATOR=cuda_malloc_async
sitepackages=true
commands =
; copy system libs into virtualenv path (e.g. XGBoost)
bash -c 'cp $(python -c "import sys; print(sys.base_prefix)")/lib/*.so* $(python -c "import sys; print(sys.prefix)")/lib'
bash -c 'python -m pytest --cov-report term --cov merlin -m "{env:PYTEST_MARKERS}" -rxs {posargs:tests} || ([ $? = 5 ] && exit 0 || exit $?)'

[testenv:horovod-gpu]
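
Reviewer note on the pytest command used in both `gpu` and `gpu-cu11`: the trailing `|| ([ $? = 5 ] && exit 0 || exit $?)` treats pytest's exit status 5 (no tests collected) as success, which matters when the `changed` marker selects nothing. The Python sketch below models the status the whole shell command ends up with. The function name `tox_command_status` is hypothetical and is not part of this PR.

```python
NO_TESTS_COLLECTED = 5  # pytest's exit status when no tests were collected


def tox_command_status(pytest_status: int) -> int:
    """Model the exit status of
    `pytest ... || ([ $? = 5 ] && exit 0 || exit $?)`.
    """
    if pytest_status == 0:
        # `||` short-circuits, so the subshell never runs.
        return 0
    if pytest_status == NO_TESTS_COLLECTED:
        # `[ 5 = 5 ]` succeeds, so `exit 0` runs.
        return 0
    # `[ N = 5 ]` fails with status 1, so the final `exit $?` sees 1
    # rather than pytest's original status.
    return 1
```

One consequence of the shell form: any failing status other than 5 is reported as 1, because `$?` is overwritten by the failed `[` test before `exit $?` runs. The job still fails as expected, but the original pytest status (for example 2 for an interrupted run) is not propagated.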