From 915f34213d59058d60d672dde2a98ecad0ce9469 Mon Sep 17 00:00:00 2001
From: Peter Jun Park
Date: Thu, 6 Jun 2024 09:23:10 -0400
Subject: [PATCH] Consolidate last 2 tabs

---
 docs/tutorial/saxpy.rst | 113 +++++++++++++++++++---------------------
 1 file changed, 54 insertions(+), 59 deletions(-)

diff --git a/docs/tutorial/saxpy.rst b/docs/tutorial/saxpy.rst
index f91638876a..3dc613eca9 100644
--- a/docs/tutorial/saxpy.rst
+++ b/docs/tutorial/saxpy.rst
@@ -54,7 +54,7 @@ speaking, you can compute this using a single ``for`` loop over three arrays.
     z[i] = a * x[i] + y[i];

 In linear algebra libraries, such as BLAS (Basic Linear Algebra Subsystem) this
-operation is defined as AXPY "A times X Plus Y". The term SAXPY refers to the
+operation is defined as AXPY "A times X Plus Y". The term SAXPY refers to the
 single-precision version of this operation The "S" comes from
@@ -483,7 +483,7 @@ find out what device binary flavors are embedded into the executable?
         that a compute capability 5.2 ISA got embedded into the executable, so devices
         which sport compute capability 5.2 or newer will be able to run this code.

-    .. tab-item:: Windows & AMD
+    .. tab-item:: Windows and AMD
         :sync: windows-amd

         The HIP SDK for Windows don't yet sport the ``roc-*`` set of utilities to work
@@ -630,6 +630,21 @@ format our available devices use.
             Name: gfx906
             Name: amdgcn-amd-amdhsa--gfx906:sramecc+:xnack-

+        Now that you know which graphics IPs our devices use, recompile your program with
+        the appropriate parameters.
+
+        .. code-block:: bash
+
+            amdclang++ ./HIP-Basic/saxpy/main.hip -o saxpy -I ./Common -lamdhip64 -L /opt/rocm/lib -O2 --offload-arch=gfx906:sramecc+:xnack-
+
+        Now the sample will run.
+
+        .. code-block::
+
+            ./saxpy
+            Calculating y[i] = a * x[i] + y[i] over 1000000 elements.
+            First 10 elements of the results: [ 3, 5, 7, 9, 11, 13, 15, 17, 19, 21 ]
+
     .. tab-item:: Linux and NVIDIA
         :sync: linux-nvidia

@@ -662,6 +677,26 @@ format our available devices use.
         executable but is used by ``nvcc`` to determine what devices are in the
         system at hand.

+        Now that you know which graphics IPs our devices use, recompile your program with
+        the appropriate parameters.
+
+        .. code-block:: bash
+
+            nvcc ./HIP-Basic/saxpy/main.hip -o saxpy -I ./Common -I /opt/rocm/include -O2 -x cu -arch=sm_70,sm_86
+
+        .. note::
+
+            If you want to portably target the development machine which is compiling, you
+            may specify ``-arch=native`` instead.
+
+        Now the sample will run.
+
+        .. code-block::
+
+            ./saxpy
+            Calculating y[i] = a * x[i] + y[i] over 1000000 elements.
+            First 10 elements of the results: [ 3, 5, 7, 9, 11, 13, 15, 17, 19, 21 ]
+
     .. tab-item:: Windows and AMD
         :sync: windows-amd

@@ -676,6 +711,21 @@ format our available devices use.
             gcnArchName: gfx1032
             gcnArchName: gfx1035

+        Now that you know which graphics IPs our devices use, recompile your program with
+        the appropriate parameters.
+
+        .. code-block:: powershell
+
+            clang++ .\HIP-Basic\saxpy\main.hip -o saxpy.exe -I .\Common -lamdhip64 -L ${env:HIP_PATH}lib -O2 --offload-arch=gfx1032 --offload-arch=gfx1035
+
+        Now the sample will run.
+
+        .. code-block::
+
+            .\saxpy.exe
+            Calculating y[i] = a * x[i] + y[i] over 1000000 elements.
+            First 10 elements of the results: [ 3, 5, 7, 9, 11, 13, 15, 17, 19, 21 ]
+
     .. tab-item:: Windows and NVIDIA
         :sync: windows-nvidia

@@ -709,63 +759,8 @@ format our available devices use.
         facing executable but is used by ``nvcc`` to determine what devices are in the
         system at hand.

-Now that you know which graphics IPs our devices use, recompile your program with
-the appropriate parameters.
-
-.. tab-set::
-
-    .. tab-item:: Linux and AMD
-        :sync: linux-amd
-
-        .. code-block:: bash
-
-            amdclang++ ./HIP-Basic/saxpy/main.hip -o saxpy -I ./Common -lamdhip64 -L /opt/rocm/lib -O2 --offload-arch=gfx906:sramecc+:xnack-
-
-        Now the sample will run.
-
-        .. code-block::
-
-            ./saxpy
-            Calculating y[i] = a * x[i] + y[i] over 1000000 elements.
-            First 10 elements of the results: [ 3, 5, 7, 9, 11, 13, 15, 17, 19, 21 ]
-
-    .. tab-item:: Linux and NVIDIA
-        :sync: linux-nvidia
-
-        .. code-block:: bash
-
-            nvcc ./HIP-Basic/saxpy/main.hip -o saxpy -I ./Common -I /opt/rocm/include -O2 -x cu -arch=sm_70,sm_86
-
-        .. note::
-
-            If you want to portably target the development machine which is compiling, you
-            may specify ``-arch=native`` instead.
-
-        Now the sample will run.
-
-        .. code-block::
-
-            ./saxpy
-            Calculating y[i] = a * x[i] + y[i] over 1000000 elements.
-            First 10 elements of the results: [ 3, 5, 7, 9, 11, 13, 15, 17, 19, 21 ]
-
-    .. tab-item:: Windows and AMD
-        :sync: windows-amd
-
-        .. code-block:: powershell
-
-            clang++ .\HIP-Basic\saxpy\main.hip -o saxpy.exe -I .\Common -lamdhip64 -L ${env:HIP_PATH}lib -O2 --offload-arch=gfx1032 --offload-arch=gfx1035
-
-        Now the sample will run.
-
-        .. code-block::
-
-            .\saxpy.exe
-            Calculating y[i] = a * x[i] + y[i] over 1000000 elements.
-            First 10 elements of the results: [ 3, 5, 7, 9, 11, 13, 15, 17, 19, 21 ]
-
-    .. tab-item:: Windows and NVIDIA
-        :sync: windows-nvidia
+        Now that you know which graphics IPs our devices use, recompile your program with
+        the appropriate parameters.

         .. code-block:: powershell