Docs: refactor and integrate into ROCm docs portal #362

Merged
merged 24 commits into from
Jul 31, 2024
a0175e1
pip-compile docs/requirements.txt
peterjunpark May 6, 2024
f282605
style(conf.py): Apply black formatting to docs/conf.py
samjwu May 9, 2024
345e5a5
Update docs requirements
peterjunpark May 24, 2024
fb794f5
Add dependabot.yml and update CODEOWNERS
peterjunpark Jun 17, 2024
3c80be0
Port docs to rocm-docs standard
peterjunpark May 8, 2024
29eee16
impr internal linking and fix sphinx warnings
peterjunpark Jul 18, 2024
80b1cf6
add spellcheck/linting from rocm-docs-core
peterjunpark Jul 18, 2024
8821660
Merge branch 'dev' into docs/refactor
peterjunpark Jul 18, 2024
7980941
bump rocm-docs-core to 1.6.0
peterjunpark Jul 24, 2024
1a94f72
add fixes from @skyreflectedinmirrors and @lpaoletti
peterjunpark Jul 25, 2024
bcb858e
add package manager install section
peterjunpark Jul 25, 2024
6ec9958
add fixes
peterjunpark Jul 26, 2024
c821863
add custom css
peterjunpark Jul 29, 2024
47fa8f7
make images/figs click-to-expand
peterjunpark Jul 29, 2024
e916f9d
update documentation link in README
peterjunpark Jul 29, 2024
0eed6a0
formatting fixes
peterjunpark Jul 30, 2024
afa4abc
Merge branch 'dev' into docs/refactor
peterjunpark Jul 30, 2024
7912d52
fix heading
peterjunpark Jul 30, 2024
8480640
move archived docs
peterjunpark Jul 30, 2024
85c27a8
exclude archived docs from docs build
peterjunpark Jul 30, 2024
13c64b2
update archived docs workflow
peterjunpark Jul 30, 2024
77f3a2e
rm docs linting
peterjunpark Jul 30, 2024
426d632
Apply cmake-format suggested changes
samjwu Jul 30, 2024
5ff0963
Apply cmake-format
samjwu Jul 30, 2024
3 changes: 2 additions & 1 deletion .github/CODEOWNERS
@@ -1,6 +1,7 @@
 * @koomie @coleramos425

 # Documentation files
-docs/* @ROCm/rocm-documentation
+docs/ @ROCm/rocm-documentation
 *.md @ROCm/rocm-documentation
 *.rst @ROCm/rocm-documentation
+.readthedocs.yaml @ROCm/rocm-documentation
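The switch from `docs/*` to `docs/` widens ownership: per GitHub's CODEOWNERS documentation, `docs/*` covers only files directly inside `docs`, while `docs/` also covers nested files. The sketch below is a deliberately simplified, hypothetical matcher (not GitHub's implementation) that handles only these two pattern shapes and assumes they are anchored at the repository root.

```python
from pathlib import PurePosixPath


def owned_by(pattern: str, path: str) -> bool:
    """Hypothetical, simplified matcher for two CODEOWNERS pattern shapes.

    Assumes patterns are anchored at the repository root; real CODEOWNERS
    matching has more rules than shown here.
    """
    file_path = PurePosixPath(path)
    if pattern.endswith("/*"):
        # "dir/*": only files that sit directly inside dir.
        return file_path.parent == PurePosixPath(pattern[:-2])
    if pattern.endswith("/"):
        # "dir/": any file at any depth below dir.
        return PurePosixPath(pattern.rstrip("/")) in file_path.parents
    raise ValueError(f"pattern shape not covered by this sketch: {pattern!r}")


nested = "docs/conceptual/command-processor.rst"
print(owned_by("docs/*", nested))  # False
print(owned_by("docs/", nested))   # True
```

Under this reading, a nested file such as the new `docs/conceptual/command-processor.rst` would only pick up the documentation owners with the `docs/` form.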
18 changes: 18 additions & 0 deletions .github/workflows/dependabot.yml
@@ -0,0 +1,18 @@
# To get started with Dependabot version updates, you'll need to specify which
# package ecosystems to update and where the package manifests are located.
# Please see the documentation for all configuration options:
# https://docs.github.com/github/administering-a-repository/configuration-options-for-dependency-updates

version: 2
updates:
  - package-ecosystem: "pip" # See documentation for possible values
    directory: "/docs/sphinx" # Location of package manifests
    open-pull-requests-limit: 10
    schedule:
      interval: "daily"
    target-branch: "dev"
    labels:
      - "documentation"
      - "dependencies"
    reviewers:
      - "samjwu"
17 changes: 8 additions & 9 deletions .github/workflows/docs.yml
@@ -4,10 +4,9 @@ on:
   push:
     branches: ["main"]
     paths:
-      - 'src/docs'
-      - 'src/archive/docs-1.x'
+      - 'docs/archive/docs-2.x/**'
+      - 'docs/archive/docs-1.x/**'
       - '.github/workflows/docs.yml'
-      - 'VERSION'

   workflow_dispatch:

@@ -31,24 +30,24 @@ jobs:
       - name: Checkout
         uses: actions/checkout@v4
       - name: Additional python packages
-        run: pip3 install -r requirements-doc.txt
+        run: pip3 install -r docs/archive/requirements-doc.txt
       - name: Setup Pages
         uses: actions/configure-pages@v4
       - name: Build 1.x docs
         run: |
-          cd src/archive/docs-1.x
+          cd docs/archive/docs-1.x
           make html
-      - name: Build current docs
+      - name: Build 2.x docs
         run: |
-          cd src/docs
+          cd docs/archive/docs-2.x
           make html
       - name: Relocate 1.x docs
         run: |
-          mv src/archive/docs-1.x/_build/html src/docs/_build/html/1.x
+          mv docs/archive/docs-1.x/_build/html docs/archive/_build/html/1.x
       - name: Upload artifact
         uses: actions/upload-pages-artifact@v3
         with:
-          path: ./src/docs/_build/html
+          path: ./docs/archive/_build/html

   # Deployment job
   deploy:
5 changes: 5 additions & 0 deletions .gitignore
@@ -19,3 +19,8 @@ VERSION.sha

 # temp files
 /tests/Testing
+
+# documentation artifacts
+/_build
+_toc.yml
+
13 changes: 13 additions & 0 deletions .readthedocs.yaml
@@ -0,0 +1,13 @@
# Read the Docs configuration file
# See https://docs.readthedocs.io/en/stable/config-file/v2.html for details

version: 2

build:
  os: ubuntu-22.04
  tools:
    python: "3.10"

python:
  install:
    - requirements: docs/sphinx/requirements.txt
45 changes: 27 additions & 18 deletions CMakeLists.txt
@@ -189,46 +189,51 @@ message(STATUS "Pytest CPU threadcount: ${PYTEST_NUMPROCS}")

 add_test(
     NAME test_profile_kernel_execution
-    COMMAND ${Python3_EXECUTABLE} -m pytest -m kernel_execution --junitxml=tests/test_profile_kernel_execution.xml
-            ${COV_OPTION} ${PROJECT_SOURCE_DIR}/tests/test_profile_general.py
+    COMMAND
+        ${Python3_EXECUTABLE} -m pytest -m kernel_execution
+        --junitxml=tests/test_profile_kernel_execution.xml ${COV_OPTION}
+        ${PROJECT_SOURCE_DIR}/tests/test_profile_general.py
     WORKING_DIRECTORY ${PROJECT_SOURCE_DIR})

 add_test(
     NAME test_profile_ipblocks
-    COMMAND ${Python3_EXECUTABLE} -m pytest -m block --junitxml=tests/test_profile_blocks.xml ${COV_OPTION}
-            ${PROJECT_SOURCE_DIR}/tests/test_profile_general.py
+    COMMAND
+        ${Python3_EXECUTABLE} -m pytest -m block --junitxml=tests/test_profile_blocks.xml
+        ${COV_OPTION} ${PROJECT_SOURCE_DIR}/tests/test_profile_general.py
     WORKING_DIRECTORY ${PROJECT_SOURCE_DIR})
 set_property(TEST test_profile_ipblocks PROPERTY COST 11)

 add_test(
     NAME test_profile_dispatch
-    COMMAND ${Python3_EXECUTABLE} -m pytest -m dispatch --junitxml=tests/test_profile_dispatch.xml ${COV_OPTION}
-            ${PROJECT_SOURCE_DIR}/tests/test_profile_general.py
+    COMMAND
+        ${Python3_EXECUTABLE} -m pytest -m dispatch
+        --junitxml=tests/test_profile_dispatch.xml ${COV_OPTION}
+        ${PROJECT_SOURCE_DIR}/tests/test_profile_general.py
     WORKING_DIRECTORY ${PROJECT_SOURCE_DIR})
 set_property(TEST test_profile_ipblocks PROPERTY COST 5)

 add_test(
     NAME test_profile_mem
-    COMMAND ${Python3_EXECUTABLE} -m pytest -m mem --junitxml=tests/test_profile_mem.xml ${COV_OPTION}
-            ${PROJECT_SOURCE_DIR}/tests/test_profile_general.py
+    COMMAND ${Python3_EXECUTABLE} -m pytest -m mem --junitxml=tests/test_profile_mem.xml
+            ${COV_OPTION} ${PROJECT_SOURCE_DIR}/tests/test_profile_general.py
     WORKING_DIRECTORY ${PROJECT_SOURCE_DIR})

 add_test(
     NAME test_profile_join
-    COMMAND ${Python3_EXECUTABLE} -m pytest -m join --junitxml=tests/test_profile_join.xml ${COV_OPTION}
-            ${PROJECT_SOURCE_DIR}/tests/test_profile_general.py
+    COMMAND ${Python3_EXECUTABLE} -m pytest -m join --junitxml=tests/test_profile_join.xml
+            ${COV_OPTION} ${PROJECT_SOURCE_DIR}/tests/test_profile_general.py
     WORKING_DIRECTORY ${PROJECT_SOURCE_DIR})

 add_test(
     NAME test_profile_sort
-    COMMAND ${Python3_EXECUTABLE} -m pytest -m sort --junitxml=tests/test_profile_sort.xml ${COV_OPTION}
-            ${PROJECT_SOURCE_DIR}/tests/test_profile_general.py
+    COMMAND ${Python3_EXECUTABLE} -m pytest -m sort --junitxml=tests/test_profile_sort.xml
+            ${COV_OPTION} ${PROJECT_SOURCE_DIR}/tests/test_profile_general.py
     WORKING_DIRECTORY ${PROJECT_SOURCE_DIR})

 add_test(
     NAME test_profile_misc
-    COMMAND ${Python3_EXECUTABLE} -m pytest -m misc --junitxml=tests/test_profile_misc.xml ${COV_OPTION}
-            ${PROJECT_SOURCE_DIR}/tests/test_profile_general.py
+    COMMAND ${Python3_EXECUTABLE} -m pytest -m misc --junitxml=tests/test_profile_misc.xml
+            ${COV_OPTION} ${PROJECT_SOURCE_DIR}/tests/test_profile_general.py
     WORKING_DIRECTORY ${PROJECT_SOURCE_DIR})

 set_tests_properties(
@@ -247,8 +252,10 @@ set_tests_properties(

 add_test(
     NAME test_analyze_commands
-    COMMAND ${Python3_EXECUTABLE} -m pytest -n ${PYTEST_NUMPROCS} --junitxml=tests/test_analyze_commands.xml
-            ${COV_OPTION} ${PROJECT_SOURCE_DIR}/tests/test_analyze_commands.py
+    COMMAND
+        ${Python3_EXECUTABLE} -m pytest -n ${PYTEST_NUMPROCS}
+        --junitxml=tests/test_analyze_commands.xml ${COV_OPTION}
+        ${PROJECT_SOURCE_DIR}/tests/test_analyze_commands.py
     WORKING_DIRECTORY ${PROJECT_SOURCE_DIR})

 # ---------------------------
@@ -257,8 +264,10 @@ add_test(

 add_test(
     NAME test_analyze_workloads
-    COMMAND ${Python3_EXECUTABLE} -m pytest -n ${PYTEST_NUMPROCS} --junitxml=tests/test_analyze_workloads.xml
-            ${COV_OPTION} ${PROJECT_SOURCE_DIR}/tests/test_analyze_workloads.py
+    COMMAND
+        ${Python3_EXECUTABLE} -m pytest -n ${PYTEST_NUMPROCS}
+        --junitxml=tests/test_analyze_workloads.xml ${COV_OPTION}
+        ${PROJECT_SOURCE_DIR}/tests/test_analyze_workloads.py
     WORKING_DIRECTORY ${PROJECT_SOURCE_DIR})

 # ---------
4 changes: 2 additions & 2 deletions README.md
@@ -4,17 +4,17 @@
 [![Docs](https://github.com/ROCm/omniperf/actions/workflows/docs.yml/badge.svg)](https://rocm.github.io/omniperf/)
 [![DOI](https://zenodo.org/badge/561919887.svg)](https://zenodo.org/badge/latestdoi/561919887)
-
+
 # Omniperf

 ## General

 Omniperf is a system performance profiling tool for machine
 learning/HPC workloads running on AMD MI GPUs. The tool presently
 targets usage on MI100, MI200, and MI300 accelerators.

 * For more information on available features, installation steps, and
   workload profiling and analysis, please refer to the online
-  [documentation](https://rocm.github.io/omniperf).
+  [documentation](https://rocm.docs.amd.com/projects/omniperf/en/latest/).

 * Omniperf is an AMD open source research project and is not supported
   as part of the ROCm software stack. We welcome contributions and
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
4 changes: 2 additions & 2 deletions src/docs/Makefile → docs/archive/docs-2.x/Makefile
@@ -6,7 +6,7 @@
 SPHINXOPTS ?=
 SPHINXBUILD ?= sphinx-build
 SOURCEDIR = .
-BUILDDIR = _build
+BUILDDIR = ../_build

 # Put it first so that "make" without argument is like "make help".
 help:
@@ -17,4 +17,4 @@ help:
 # Catch-all target: route all unknown targets to Sphinx using the new
 # "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
 %: Makefile
-	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
+	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
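With `BUILDDIR = ../_build`, a build run from `docs/archive/docs-2.x` writes its output one directory up, which lines up with the `./docs/archive/_build/html` artifact path in the docs workflow. A minimal sketch of that path resolution, assuming Sphinx's make mode places HTML in an `html` subdirectory of the build directory (the helper name is hypothetical):

```python
import posixpath


def html_output_dir(source_dir: str, build_dir: str) -> str:
    """Resolve where `make html` would place HTML, relative to the repo root."""
    return posixpath.normpath(posixpath.join(source_dir, build_dir, "html"))


# New value of BUILDDIR introduced by this change.
print(html_output_dir("docs/archive/docs-2.x", "../_build"))  # docs/archive/_build/html
# Previous value, shown for comparison.
print(html_output_dir("docs/archive/docs-2.x", "_build"))  # docs/archive/docs-2.x/_build/html
```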
File renamed without changes.
1 change: 1 addition & 0 deletions docs/archive/docs-2.x/VERSION
@@ -0,0 +1 @@
2.0.1
File renamed without changes.
4 changes: 2 additions & 2 deletions src/docs/conf.py → docs/archive/docs-2.x/conf.py
@@ -20,8 +20,8 @@

 repo_version = "unknown"
 # Determine short version by file in repo
-if os.path.isfile("../../VERSION"):
-    with open("../../VERSION") as f:
+if os.path.isfile("./VERSION"):
+    with open("./VERSION") as f:
         repo_version = f.readline().strip()
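The change makes `conf.py` read the version from a `VERSION` file in the current directory instead of two directories up. Below is a self-contained sketch of the same lookup with its fallback; the function name and the temporary directory are illustrative assumptions, not part of the PR.

```python
import os
import tempfile


def read_repo_version(version_file: str = "./VERSION") -> str:
    """Return the first line of the version file, or "unknown" if it is absent."""
    if os.path.isfile(version_file):
        with open(version_file) as f:
            return f.readline().strip()
    return "unknown"


# Demonstrate with a throwaway file holding the archived docs version.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "VERSION")
    with open(path, "w") as f:
        f.write("2.0.1\n")
    print(read_repo_version(path))  # 2.0.1
    print(read_repo_version(os.path.join(tmp, "missing")))  # unknown
```

Because the path is relative, the lookup depends on the working directory of the build, which for `make html` is the directory containing the Makefile.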


File renamed without changes.
File renamed without changes.
File renamed without changes
File renamed without changes
File renamed without changes
File renamed without changes
File renamed without changes
File renamed without changes
File renamed without changes
File renamed without changes
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
154 changes: 154 additions & 0 deletions docs/conceptual/command-processor.rst
@@ -0,0 +1,154 @@
.. meta::
   :description: Omniperf performance model: Command processor (CP)
   :keywords: Omniperf, ROCm, profiler, tool, Instinct, accelerator, command, processor, fetcher, packet processor, CPF, CPC

**********************
Command processor (CP)
**********************

The command processor (CP) is responsible for interacting with the AMDGPU kernel
driver in the Linux kernel on the CPU and for interacting with user-space
HSA clients when they submit commands to HSA queues. Basic tasks of the CP
include reading commands (for example, those corresponding to a kernel launch)
out of :hsa-runtime-pdf:`HSA queues <68>`, scheduling work to subsequent parts
of the scheduler pipeline, and marking kernels complete for synchronization
events on the host.

The command processor consists of two sub-components:

* :ref:`Fetcher <cpf-metrics>` (CPF): Fetches commands out of memory to hand
  them over to the CPC for processing.

* :ref:`Packet processor <cpc-metrics>` (CPC): Micro-controller running the
  command processing firmware that decodes the fetched commands and (for
  kernels) passes them to the :ref:`workgroup manager <desc-spi>` for
  scheduling.

Before scheduling work to the accelerator, the command processor can
first acquire a memory fence to ensure system consistency
(:hsa-runtime-pdf:`Section 2.6.4 <91>`). After the work is complete, the
command processor can apply a memory-release fence. Depending on the AMD CDNA™
accelerator in question, either of these operations *might* initiate a cache
write-back or invalidation.

Analyzing command processor performance is most interesting for kernels
that you suspect to be limited by scheduling or launch rate. The command
processor’s metrics therefore focus on reporting, for example:

* Utilization of the fetcher

* Utilization of the packet processor, including packet decoding

* Stalls in fetching and processing

.. _cpf-metrics:

Command processor fetcher (CPF)
===============================

.. list-table::
   :header-rows: 1

   * - Metric

     - Description

     - Unit

   * - CPF Utilization

     - Percent of total cycles where the CPF was busy actively doing any work.
       The ratio of CPF busy cycles over total cycles counted by the CPF.

     - Percent

   * - CPF Stall

     - Percent of CPF busy cycles where the CPF was stalled for any reason.

     - Percent

   * - CPF-L2 Utilization

     - Percent of total cycles counted by the CPF-:doc:`L2 <l2-cache>` interface
       where the CPF-L2 interface was active doing any work. The ratio of CPF-L2
       busy cycles over total cycles counted by the CPF-L2.

     - Percent

   * - CPF-L2 Stall

     - Percent of CPF-:doc:`L2 <l2-cache>` busy cycles where the CPF-L2
       interface was stalled for any reason.

     - Percent

   * - CPF-UTCL1 Stall

     - Percent of CPF busy cycles where the CPF was stalled by address
       translation.

     - Percent
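The descriptions above use two different denominators: utilization is measured against total cycles, while stall is measured against busy cycles. A minimal sketch of that arithmetic follows; the counter names and values are hypothetical placeholders, not Omniperf's actual hardware counters.

```python
def percent(numerator: float, denominator: float) -> float:
    """Return numerator/denominator as a percentage, or 0.0 for an empty denominator."""
    return 100.0 * numerator / denominator if denominator else 0.0


# Hypothetical cycle counts for one profiled kernel.
cpf_total_cycles = 1_000_000
cpf_busy_cycles = 250_000
cpf_stalled_cycles = 50_000

cpf_utilization = percent(cpf_busy_cycles, cpf_total_cycles)  # busy / total
cpf_stall = percent(cpf_stalled_cycles, cpf_busy_cycles)  # stalled / busy

print(f"CPF Utilization: {cpf_utilization:.1f}%")  # CPF Utilization: 25.0%
print(f"CPF Stall: {cpf_stall:.1f}%")  # CPF Stall: 20.0%
```

Keeping the denominator explicit matters when comparing rows: in this made-up example, a 20% stall applies to the 25% of cycles in which the CPF was busy, not to the whole run.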

.. _cpc-metrics:

Command processor packet processor (CPC)
========================================

.. list-table::
   :header-rows: 1

   * - Metric

     - Description

     - Unit

   * - CPC Utilization

     - Percent of total cycles where the CPC was busy actively doing any work.
       The ratio of CPC busy cycles over total cycles counted by the CPC.

     - Percent

   * - CPC Stall

     - Percent of CPC busy cycles where the CPC was stalled for any reason.

     - Percent

   * - CPC Packet Decoding Utilization

     - Percent of CPC busy cycles spent decoding commands for processing.

     - Percent

   * - CPC-Workgroup Manager Utilization

     - Percent of CPC busy cycles spent dispatching workgroups to the
       :ref:`workgroup manager <desc-spi>`.

     - Percent

   * - CPC-L2 Utilization

     - Percent of total cycles counted by the CPC-:doc:`L2 <l2-cache>` interface
       where the CPC-L2 interface was active doing any work.

     - Percent

   * - CPC-UTCL1 Stall

     - Percent of CPC busy cycles where the CPC was stalled by address
       translation.

     - Percent

   * - CPC-UTCL2 Utilization

     - Percent of total cycles counted by the CPC's :doc:`L2 <l2-cache>` address
       translation interface where the CPC was busy doing address translation
       work.

     - Percent
