Skip to content

Commit

Permalink
fix words and formatting
Browse files Browse the repository at this point in the history
Signed-off-by: Peter Jun Park <[email protected]>

formatting

Signed-off-by: Peter Jun Park <[email protected]>

same

reorg
  • Loading branch information
peterjunpark committed Jul 15, 2024
1 parent 184f18a commit 3020cca
Show file tree
Hide file tree
Showing 61 changed files with 4,867 additions and 2,637 deletions.
2 changes: 1 addition & 1 deletion .github/CODEOWNERS
Validating CODEOWNERS rules …
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
* @koomie @coleramos425

# Documentation files
docs/* @ROCm/rocm-documentation
docs/ @ROCm/rocm-documentation
*.md @ROCm/rocm-documentation
*.rst @ROCm/rocm-documentation
.readthedocs.yaml @ROCm/rocm-documentation
1 change: 1 addition & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -23,3 +23,4 @@ VERSION.sha
# documentation artifacts
/_build
_toc.yml

137 changes: 95 additions & 42 deletions docs/concept/command-processor.rst
Original file line number Diff line number Diff line change
Expand Up @@ -2,95 +2,148 @@
Command processor (CP)
**********************

The command processor (CP) is responsible for interacting with the AMDGPU Kernel
Driver (a.k.a., the Linux Kernel) on the CPU and
for interacting with user-space HSA clients when they submit commands to
HSA queues. Basic tasks of the CP include reading commands (e.g.,
corresponding to a kernel launch) out of `HSA
Queues <http://hsafoundation.com/wp-content/uploads/2021/02/HSA-Runtime-1.2.pdf>`__
(Sec. 2.5), scheduling work to subsequent parts of the scheduler
pipeline, and marking kernels complete for synchronization events on the
host.

The command processor is composed of two sub-components:

- Fetcher (CPF): Fetches commands out of memory to hand them over to
the CPC for processing
- Packet Processor (CPC): The micro-controller running the command
processing firmware that decodes the fetched commands, and (for
kernels) passes them to the `Workgroup Processors <SPI>`__ for
scheduling

Before scheduling work to the accelerator, the command-processor can
first acquire a memory fence to ensure system consistency `(Sec
2.6.4) <http://hsafoundation.com/wp-content/uploads/2021/02/HSA-Runtime-1.2.pdf>`__.
After the work is complete, the command-processor can apply a
memory-release fence. Depending on the AMD CDNA accelerator under
question, either of these operations *may* initiate a cache write-back
or invalidation.
The command processor (CP) is responsible for interacting with the AMDGPU kernel
driver -- the Linux kernel -- on the CPU and for interacting with user-space
HSA clients when they submit commands to HSA queues. Basic tasks of the CP
include reading commands (such as, corresponding to a kernel launch) out of
:hsa-runtime-pdf:`HSA queues <68>`, scheduling work to subsequent parts of the
scheduler pipeline, and marking kernels complete for synchronization events on
the host.

The command processor consists of two sub-components:

* :ref:`Fetcher <cpf-metrics>` (CPF): Fetches commands out of memory to hand
them over to the CPC for processing.

* :ref:`Packet processor <cpc-metrics>` (CPC): Micro-controller running the
command processing firmware that decodes the fetched commands and (for
kernels) passes them to the :ref:`workgroup processors <desc-spi>` for
scheduling.

Before scheduling work to the accelerator, the command processor can
first acquire a memory fence to ensure system consistency
:hsa-runtime-pdf:`Section 2.6.4 <91>`. After the work is complete, the
command processor can apply a memory-release fence. Depending on the AMD CDNA
accelerator under question, either of these operations *might* initiate a cache
write-back or invalidation.

Analyzing command processor performance is most interesting for kernels
that the user suspects to be scheduling/launch-rate limited. The command
processor’s metrics therefore are focused on reporting, e.g.:
that you suspect to be limited by scheduling or launch rate. The command
processor’s metrics therefore are focused on reporting, for example:

* Utilization of the fetcher

* Utilization of the packet processor, and decoding processing packets

* Stalls in fetching and processing

- Utilization of the fetcher
- Utilization of the packet processor, and decoding processing packets
- Fetch/processing stalls
.. _cpf-metrics:

Command Processor Fetcher (CPF) metrics
=======================================

.. list-table::
:header-rows: 1
:widths: 20 65 15

* - Metric

- Description

- Unit

* - CPF Utilization
- Percent of total cycles where the CPF was busy actively doing any work. The ratio of CPF busy cycles over total cycles counted by the CPF.

- Percent of total cycles where the CPF was busy actively doing any work.
The ratio of CPF busy cycles over total cycles counted by the CPF.

- Percent

* - CPF Stall

- Percent of CPF busy cycles where the CPF was stalled for any reason.

- Percent

* - CPF-L2 Utilization
- Percent of total cycles counted by the CPF-[L2](L2) interface where the CPF-L2 interface was active doing any work. The ratio of CPF-L2 busy cycles over total cycles counted by the CPF-L2.

- Percent of total cycles counted by the CPF-:doc:`L2 <l2-cache>` interface
where the CPF-L2 interface was active doing any work. The ratio of CPF-L2
busy cycles over total cycles counted by the CPF-L2.

- Percent

* - CPF-L2 Stall
- Percent of CPF-L2 busy cycles where the CPF-[L2](L2) interface was stalled for any reason.

- Percent of CPF-L2 busy cycles where the CPF-:doc:`L2 <l2-cache>`
interface was stalled for any reason.

- Percent

* - CPF-UTCL1 Stall
- Percent of CPF busy cycles where the CPF was stalled by address translation.

- Percent of CPF busy cycles where the CPF was stalled by address
translation.

- Percent

.. _cpc-metrics:

Command Processor Packet Processor (CPC) metrics
================================================

.. list-table::
:header-rows: 1
:widths: 20 65 15

* - Metric

- Description

- Unit

* - CPC Utilization
- Percent of total cycles where the CPC was busy actively doing any work. The ratio of CPC busy cycles over total cycles counted by the CPC.

- Percent of total cycles where the CPC was busy actively doing any work.
The ratio of CPC busy cycles over total cycles counted by the CPC.

- Percent

* - CPC Stall

- Percent of CPC busy cycles where the CPC was stalled for any reason.

- Percent

* - CPC Packet Decoding Utilization

- Percent of CPC busy cycles spent decoding commands for processing.

- Percent

* - CPC-Workgroup Manager Utilization
- Percent of CPC busy cycles spent dispatching workgroups to the [Workgroup Manager](SPI).

- Percent of CPC busy cycles spent dispatching workgroups to the
:ref:`workgroup manager <desc-spi>`.

- Percent

* - CPC-L2 Utilization
- Percent of total cycles counted by the CPC-[L2](L2) interface where the CPC-L2 interface was active doing any work.

- Percent of total cycles counted by the CPC-:doc:`L2 <l2-cache>` interface
where the CPC-L2 interface was active doing any work.

- Percent

* - CPC-UTCL1 Stall
- Percent of CPC busy cycles where the CPC was stalled by address translation.

- Percent of CPC busy cycles where the CPC was stalled by address
translation.

- Percent

* - CPC-UTCL2 Utilization
- Percent of total cycles counted by the CPC's L2 address translation interface where the CPC was busy doing address translation work.

- Percent of total cycles counted by the CPC's L2 address translation
interface where the CPC was busy doing address translation work.

- Percent

Loading

0 comments on commit 3020cca

Please sign in to comment.