
Commit

fix formatting, anchors
Signed-off-by: Peter Jun Park <[email protected]>
peterjunpark committed Jun 28, 2024
1 parent 15d9bb9 commit 8faf342
Showing 3 changed files with 21 additions and 14 deletions.
7 changes: 6 additions & 1 deletion docs/conceptual/glossary.rst
@@ -1,6 +1,7 @@
.. meta::
:description: Omniperf documentation and reference
:keywords: Omniperf, ROCm, glossary, definitions, terms, profiler, tool, Instinct, accelerator, AMD
:keywords: Omniperf, ROCm, glossary, definitions, terms, profiler, tool,
Instinct, accelerator, AMD

********
Glossary
@@ -132,6 +133,8 @@ and in this documentation.

.. include:: ./includes/normalization-units.rst

.. _memory-spaces:

Memory spaces
=============

@@ -203,6 +206,8 @@ of LLVM:
will always have the most up-to-date information, and the interested reader is
referred to this source for a more complete explanation.

.. _memory-type:

Memory type
===========

2 changes: 1 addition & 1 deletion docs/conceptual/performance-model.rst
@@ -45,7 +45,7 @@ use Omniperf to optimize your code.
References
==========

Some sections in the materials in the sections might refer the following
Some sections in the following materials might refer to the following
publicly available documentation.

* :hip-training-pdf:`Introduction to AMD GPU Programming with HIP <>`
26 changes: 14 additions & 12 deletions docs/tutorial/includes/infinity-fabric-transactions.rst
@@ -41,20 +41,21 @@ is identically false (and thus: we expect no writes).
different operation types (such as atomics, writes). This abbreviated version
is presented here for reference only.

Finally, this sample code lets the user control: - The `granularity of
an allocation <Mtype>`__, - The owner of an allocation (local HBM, CPU
Finally, this sample code lets the user control the following via command
line arguments:

- The :ref:`granularity of an allocation <memory-type>`,
- The owner of an allocation (local HBM, CPU DRAM, or remote HBM), and
- The size of an allocation (the default is :math:`\sim4` GiB).

In doing so, we can explore the impact of these parameters on the L2-Fabric
metrics reported by Omniperf to further understand their meaning.
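
As a rough illustration of how those options map onto HIP allocation calls, a
minimal sketch follows. It assumes a ROCm/HIP installation; the ``mode``
encoding, the ``HIP_CHECK`` macro, and the kernel launch are illustrative
stand-ins rather than the sample's actual interface.

.. code-block:: cpp

   #include <hip/hip_runtime.h>
   #include <cstdio>
   #include <cstdlib>

   // Minimal error-checking helper (illustrative only).
   #define HIP_CHECK(call)                                                   \
     do {                                                                    \
       hipError_t err_ = (call);                                             \
       if (err_ != hipSuccess) {                                             \
         std::fprintf(stderr, "HIP error: %s\n", hipGetErrorString(err_));   \
         std::exit(EXIT_FAILURE);                                            \
       }                                                                     \
     } while (0)

   int main(int argc, char** argv) {
     size_t bytes = 4ull << 30;                       // ~4 GiB default size
     int mode = (argc > 1) ? std::atoi(argv[1]) : 0;  // hypothetical CLI switch

     void* buf = nullptr;
     if (mode == 0) {
       // Coarse-grained allocation in the local accelerator's HBM.
       HIP_CHECK(hipMalloc(&buf, bytes));
     } else if (mode == 1) {
       // Fine-grained allocation in the local accelerator's HBM.
       HIP_CHECK(hipExtMallocWithFlags(&buf, bytes, hipDeviceMallocFinegrained));
     } else {
       // Fine-grained allocation owned by the CPU (DRAM).
       HIP_CHECK(hipHostMalloc(&buf, bytes, hipHostMallocCoherent));
     }

     // ... launch the read-only benchmark kernel against buf here ...

     if (mode <= 1) {
       HIP_CHECK(hipFree(buf));
     } else {
       HIP_CHECK(hipHostFree(buf));
     }
     return 0;
   }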

All results in this section were generated an a node of Infinity
Fabric(tm) connected MI250 accelerators using ROCm v5.6.0, and Omniperf
v2.0.0. Although results may vary with ROCm versions and accelerator
connectivity, we expect the lessons learned here to be broadly
applicable.
.. note::

   All results in this section were generated on a node of Infinity
   Fabric-connected MI250 accelerators using ROCm version 5.6.0 and Omniperf
   version 2.0.0. Although results may vary with ROCm versions and accelerator
   connectivity, we expect the lessons learned here to be broadly applicable.

.. _infinity-fabric-ex1:

@@ -201,7 +202,7 @@ accelerator. Our code uses the ``hipExtMallocWithFlag`` API with the
│ 17.5.4 │ Remote Read │ 6.00 │ 6.00 │ 6.00 │ Req per kernel │
╘═════════╧═════════════════╧══════════════╧══════════════╧══════════════╧════════════════╛
Comparing with our `previous example <Fabric_exp_1>`__, we see a
Comparing with our :ref:`previous example <infinity-fabric-ex1>`, we see a
relatively similar result, namely:

- The vast majority of L2-Fabric requests are 64B read requests (17.5.2)
- Nearly all these read requests are directed to the accelerator-local HBM (17.2.1)
@@ -212,17 +213,18 @@ Fabric(tm).

.. note::

The stalls in Sec 17.4 are presented as a percentage of the total number active L2 cycles, summed over [all L2 channels](L2).
   The stalls in Sec 17.4 are presented as a percentage of the total number
   of active L2 cycles, summed over [all L2 channels](L2).
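
   Spelled out as a formula, one plausible reading of that definition is the
   following (a sketch of the note's description, not necessarily Omniperf's
   exact internal computation):

   .. math::

      \text{Stall (\%)} = 100 \times
      \frac{\sum_{c\,\in\,\text{L2 channels}} \text{L2-Fabric stall cycles}_c}
           {\sum_{c\,\in\,\text{L2 channels}} \text{active L2 cycles}_c}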

.. _infinity-fabric-ex3:

Experiment #3 - Fine-grained, remote-accelerator HBM reads
----------------------------------------------------------

In this experiment, we move our `fine-grained <Mtype>`__ allocation to
In this experiment, we move our :ref:`fine-grained <memory-type>` allocation to
be owned by a remote accelerator. We accomplish this by first changing
the HIP device using e.g., ``hipSetDevice(1)`` API, then allocating
fine-grained memory (as described `previously <Fabric_exp_2>`__), and
fine-grained memory (as described :ref:`previously <infinity-fabric-ex2>`), and
finally resetting the device back to the default, e.g.,
``hipSetDevice(0)``.
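
A minimal sketch of that device-switching sequence follows, reusing the
hypothetical ``HIP_CHECK`` macro and fine-grained allocation flag from the
earlier sketch and assuming at least two visible devices; peer-access setup,
kernel launch, and cleanup are omitted.

.. code-block:: cpp

   void* remote_buf = nullptr;
   size_t bytes = 4ull << 30;   // ~4 GiB, as in the earlier experiments

   HIP_CHECK(hipSetDevice(1));  // make the remote accelerator current
   HIP_CHECK(hipExtMallocWithFlags(&remote_buf, bytes,
                                   hipDeviceMallocFinegrained));
   HIP_CHECK(hipSetDevice(0));  // restore the default device

   // Kernels launched from device 0 that read remote_buf now generate
   // remote-HBM (Infinity Fabric) read traffic instead of local-HBM traffic.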

@@ -308,7 +310,7 @@ addition, because these are crossing between accelerators, we expect
significantly lower achievable bandwidths as compared to the local
accelerator’s HBM – this is reflected (indirectly) in the magnitude of
the stall metric (17.4.1). Finally, we note that if our system contained
only PCIe(r) connected accelerators, these observations will differ.
only PCIe connected accelerators, these observations will differ.

.. _infinity-fabric-ex4:

