[AMD] Upgrade AMD CI docker image #5230

Draft
wants to merge 5 commits into
base: main
17 changes: 5 additions & 12 deletions .github/workflows/integration-tests.yml
@@ -327,7 +327,7 @@ jobs:
runner: ${{fromJson(needs.Runner-Preparation.outputs.matrix-HIP)}}
name: Integration-Tests (${{matrix.runner[1] == 'gfx90a' && 'mi210' || 'mi300x'}})
container:
- image: rocm/pytorch:rocm6.1_ubuntu22.04_py3.10_pytorch_2.4
+ image: rocmshared/pytorch:rocm6.2.2_ubuntu22.04_py3.10_pytorch_2.5.1_asan
options: --device=/dev/kfd --device=/dev/dri --security-opt seccomp=unconfined --group-add video --user root
steps:
- name: Checkout
@@ -396,22 +396,15 @@ jobs:

mkdir -p ~/.ccache
du -h -d 1 ~/.ccache
- - name: Update PATH
- run: |
- echo "/opt/rocm/llvm/bin" >> $GITHUB_PATH
- - name: Install pip dependencies
- run: |
- python3 -m pip install --upgrade pip
- python3 -m pip install lit
- - name: Install apt dependencies
+ - name: Update compiler to clang
run: |
- apt update
- apt install ccache
+ export CC=/usr/bin/clang
+ export CXX=/usr/bin/clang++
- name: Install Triton
id: amd-install-triton
run: |
echo "PATH is '$PATH'"
- pip uninstall -y triton
+ pip uninstall -y triton pytorch-triton-rocm
cd python
ccache --zero-stats
pip install -v -e '.[tests]'
19 changes: 5 additions & 14 deletions .github/workflows/integration-tests.yml.in
@@ -374,7 +374,7 @@ jobs:
name: Integration-Tests (${{matrix.runner[1] == 'gfx90a' && 'mi210' || 'mi300x'}})

container:
- image: rocm/pytorch:rocm6.1_ubuntu22.04_py3.10_pytorch_2.4
+ image: rocmshared/pytorch:rocm6.2.2_ubuntu22.04_py3.10_pytorch_2.5.1_asan
options: --device=/dev/kfd --device=/dev/dri --security-opt seccomp=unconfined --group-add video --user root

steps:
@@ -388,25 +388,16 @@ jobs:
- *restore-build-artifacts-step
- *inspect-cache-directories-step

- - name: Update PATH
- run: |
- echo "/opt/rocm/llvm/bin" >> $GITHUB_PATH
-
- - name: Install pip dependencies
- run: |
- python3 -m pip install --upgrade pip
- python3 -m pip install lit
-
- - name: Install apt dependencies
+ - name: Update compiler to clang
run: |
- apt update
- apt install ccache
+ export CC=/usr/bin/clang
+ export CXX=/usr/bin/clang++

- name: Install Triton
id: amd-install-triton
run: |
echo "PATH is '$PATH'"
- pip uninstall -y triton
+ pip uninstall -y triton pytorch-triton-rocm
cd python
ccache --zero-stats
pip install -v -e '.[tests]'
Collaborator

Are the changes in this file necessary? Can we separate them out into another pull request?

Contributor Author

@AlexAUT AlexAUT Nov 25, 2024

I can split it out into a separate PR, but that PR would need to be merged first: without the Proton changes, the tests fail after upgrading to ROCm 6.2.

Contributor Author

I will rebase this branch once #5252 has landed; these changes will then drop out of this PR's diff.

@@ -74,6 +74,7 @@ std::shared_ptr<Metric>
convertActivityToMetric(const roctracer_record_t *activity) {
std::shared_ptr<Metric> metric;
switch (activity->kind) {
+ case kHipVdiCommandTask:
case kHipVdiCommandKernel: {
if (activity->begin_ns < activity->end_ns) {
metric = std::make_shared<KernelMetric>(
@@ -135,7 +136,7 @@ void processActivity(RoctracerProfiler::CorrIdToExternIdMap &corrIdToExternId,
const roctracer_record_t *record, bool isAPI,
bool isGraph) {
switch (record->kind) {
- case 0x11F1: // Task - kernel enqueued by graph launch
+ case kHipVdiCommandTask:
case kHipVdiCommandKernel: {
processActivityKernel(corrIdToExternId, externId, dataSet, record, isAPI,
isGraph);
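Both hunks above put the task kind next to the kernel kind, so a record for a kernel enqueued by a graph launch takes the same path as an ordinary kernel record. The following is a minimal sketch of that dispatch pattern, not Proton's code: `ActivityKind`, `Record`, `isKernelLike`, and `yieldsKernelMetric` are hypothetical names, `0x11F1` is the literal the diff replaces, and the other enumerator values are placeholders rather than values taken from the roctracer headers.

```cpp
#include <cstdint>

// Hypothetical stand-ins for the roctracer activity kinds used in the diff.
// Only 0x11F1 comes from the literal the diff replaces; the other values are
// placeholders chosen for this sketch.
enum ActivityKind : std::uint32_t {
  kKernel = 0x1000, // placeholder value
  kTask = 0x11F1,   // "Task - kernel enqueued by graph launch" (removed comment)
  kMemcpy = 0x2000, // placeholder value
};

// Reduced record: just the fields the sketch needs.
struct Record {
  std::uint32_t kind;
  std::uint64_t begin_ns;
  std::uint64_t end_ns;
};

// Stacked case labels: task records fall through to the kernel handling,
// which is the shape both hunks use.
bool isKernelLike(const Record &record) {
  switch (record.kind) {
  case kTask:
  case kKernel:
    return true;
  default:
    return false;
  }
}

// Same guard as in convertActivityToMetric: only records with a positive
// duration produce a kernel metric.
bool yieldsKernelMetric(const Record &record) {
  return isKernelLike(record) && record.begin_ns < record.end_ns;
}
```

With this shape, `Record{kTask, 10, 20}` is treated like a kernel record, while a record whose `begin_ns` equals its `end_ns` yields no metric. Using the named constant in both switches also keeps them from drifting apart, which a bare literal in one of them does not guarantee.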
@@ -169,6 +170,7 @@ std::pair<bool, bool> matchKernelCbId(uint32_t cbId) {
case HIP_API_ID_hipModuleLaunchCooperativeKernel:
case HIP_API_ID_hipModuleLaunchCooperativeKernelMultiDevice:
case HIP_API_ID_hipGraphExecDestroy:
+ case HIP_API_ID_hipGraphInstantiateWithFlags:
case HIP_API_ID_hipGraphInstantiate: {
isRuntimeApi = true;
break;
@@ -300,6 +302,13 @@ void RoctracerProfiler::RoctracerProfilerPimpl::apiCallback(
pImpl->StreamToCaptureCount[Stream]++;
break;
}
+ case HIP_API_ID_hipGraphInstantiateWithFlags: {
+ hipGraph_t Graph = data->args.hipGraphInstantiateWithFlags.graph;
+ hipGraphExec_t GraphExec =
+ *(data->args.hipGraphInstantiateWithFlags.pGraphExec);
+ pImpl->GraphExecToGraph[GraphExec] = Graph;
+ break;
+ }
case HIP_API_ID_hipGraphInstantiate: {
hipGraph_t Graph = data->args.hipGraphInstantiate.graph;
hipGraphExec_t GraphExec = *(data->args.hipGraphInstantiate.pGraphExec);
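The added case records the same executable-graph-to-graph association as the existing `hipGraphInstantiate` case, but reads its arguments from the `WithFlags` variant. Below is a rough sketch of that bookkeeping without the HIP headers. `Graph`, `GraphExec`, `ApiId`, `InstantiateArgs`, and `GraphTracker` are hypothetical names used only for illustration, and the idea that a later lookup would miss when only one entry point is handled is an assumption about why the change is needed, not something the PR states.

```cpp
#include <cstdint>
#include <unordered_map>

using Graph = std::uintptr_t;     // stand-in for hipGraph_t
using GraphExec = std::uintptr_t; // stand-in for hipGraphExec_t

// Hypothetical callback IDs; the real code switches on HIP_API_ID_* values.
enum class ApiId { GraphInstantiate, GraphInstantiateWithFlags, Other };

// Reduced argument record: the source graph and the out-parameter that
// receives the executable graph.
struct InstantiateArgs {
  Graph graph;
  GraphExec *pGraphExec;
};

class GraphTracker {
public:
  // Both instantiate entry points fill the same map, so later lookups do not
  // depend on which one the caller used.
  void onInstantiate(ApiId id, const InstantiateArgs &args) {
    switch (id) {
    case ApiId::GraphInstantiateWithFlags:
    case ApiId::GraphInstantiate:
      graphExecToGraph[*args.pGraphExec] = args.graph;
      break;
    default:
      break; // unrelated API calls leave the map untouched
    }
  }

  // Returns true and writes the owning graph if the executable graph is known.
  bool lookup(GraphExec exec, Graph &out) const {
    auto it = graphExecToGraph.find(exec);
    if (it == graphExecToGraph.end())
      return false;
    out = it->second;
    return true;
  }

private:
  std::unordered_map<GraphExec, Graph> graphExecToGraph;
};
```

In this sketch, an executable graph registered through either `ApiId` value can be found again with `lookup`, while one that was never registered reports `false`. The diff keeps two separate cases because each variant stores its arguments in a different member of `data->args`; the sketch merges them only because its argument struct is shared.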