Landing page update #3656

Merged: 3 commits, Nov 22, 2024
3 changes: 3 additions & 0 deletions .wordlist.txt
Original file line number Diff line number Diff line change
@@ -116,6 +116,8 @@ omnitrace
overindex
overindexing
oversubscription
overutilized
parallelizable
pixelated
pragmas
preallocated
@@ -154,6 +156,7 @@ texels
tradeoffs
templated
toolkits
transfering
typedefs
unintuitive
UMM
11 changes: 11 additions & 0 deletions docs/how-to/hip_runtime_api.rst
@@ -34,3 +34,14 @@ the following figure:

On NVIDIA platforms, the HIP runtime API calls the CUDA runtime or the CUDA
driver through the hipother interface. For more information, see the `hipother repository <https://github.com/ROCm/hipother>`_.

The HIP runtime API covers the following high-level topics:

* :doc:`./hip_runtime_api/initialization`
* :doc:`./hip_runtime_api/memory_management`
* :doc:`./hip_runtime_api/error_handling`
* :doc:`./hip_runtime_api/cooperative_groups`
* :doc:`./hip_runtime_api/hipgraph`
* :doc:`./hip_runtime_api/call_stack`
* :doc:`./hip_runtime_api/multi_device`
* :doc:`./hip_runtime_api/external_interop`
42 changes: 15 additions & 27 deletions docs/index.md
@@ -1,57 +1,45 @@
<head>
<meta charset="UTF-8">
<meta name="description" content="HIP documentation and programming guide.">
<meta name="keywords" content="HIP, Heterogeneous-computing Interface for Portability, HIP programming guide">
</head>

# HIP documentation

The Heterogeneous-computing Interface for Portability (HIP) is a C++ runtime API
and kernel language that lets you create portable applications for AMD and
NVIDIA GPUs from a single source code. For more information, see [What is HIP?](./what_is_hip)

Installation instructions are available from:

* [Installing HIP](./install/install)
* [Building HIP from source](./install/build)

HIP-enabled GPUs:

* [Supported AMD GPUs on Linux](https://rocm.docs.amd.com/projects/install-on-linux/en/latest/reference/system-requirements.html#supported-gpus)
* [Supported AMD GPUs on Windows](https://rocm.docs.amd.com/projects/install-on-windows/en/latest/reference/system-requirements.html#windows-supported-gpus)
* [Supported NVIDIA GPUs](https://developer.nvidia.com/cuda-gpus)

The HIP documentation is organized into the following categories:

::::{grid} 1 2 2 2
:gutter: 3

:::{grid-item-card} Conceptual
:::{grid-item-card} Programming guide

* [Introduction](./programming_guide)
* {doc}`./understand/programming_model`
* {doc}`./understand/hardware_implementation`
* {doc}`./understand/amd_clr`
* {doc}`./understand/compilers`

:::

:::{grid-item-card} How to

* {doc}`./how-to/performance_guidelines`
* [Debugging with HIP](./how-to/debugging)
* {doc}`./how-to/logging`
* {doc}`./how-to/hip_runtime_api`
* {doc}`./how-to/hip_runtime_api/initialization`
* {doc}`./how-to/hip_runtime_api/memory_management`
* {doc}`./how-to/hip_runtime_api/error_handling`
* {doc}`./how-to/hip_runtime_api/multi_device`
* {doc}`./how-to/hip_runtime_api/cooperative_groups`
* {doc}`./how-to/hip_runtime_api/hipgraph`
* {doc}`./how-to/hip_runtime_api/call_stack`
* {doc}`./how-to/hip_runtime_api/external_interop`
* [HIP porting guide](./how-to/hip_porting_guide)
* [HIP porting: driver API guide](./how-to/hip_porting_driver_api)
* {doc}`./how-to/hip_rtc`
* {doc}`./how-to/performance_guidelines`
* [Debugging with HIP](./how-to/debugging)
* {doc}`./how-to/logging`
* {doc}`./understand/amd_clr`

:::

:::{grid-item-card} Reference

* [HIP runtime API](./reference/hip_runtime_api_reference)
* [Modules](./reference/hip_runtime_api/modules)
* [Global defines, enums, structs and files](./reference/hip_runtime_api/global_defines_enums_structs_files)
* [HSA runtime API for ROCm](./reference/virtual_rocr)
* [C++ language extensions](./reference/cpp_language_extensions)
* [C++ language support](./reference/cpp_language_support)
79 changes: 79 additions & 0 deletions docs/programming_guide.rst
@@ -0,0 +1,79 @@
.. meta::
:description: HIP programming guide introduction
:keywords: HIP programming guide introduction, HIP programming guide

.. _hip-programming-guide:

********************************************************************************
HIP programming guide introduction
********************************************************************************

This topic provides key HIP programming concepts and links to more detailed information.

Write GPU kernels for parallel execution
================================================================================

To make the most of the parallelism inherent to GPUs, a thorough understanding
of the :ref:`programming model <programming_model>` is helpful. The HIP
programming model is designed to make it easy to map data-parallel algorithms
to the architecture of the GPUs. HIP employs the SIMT (Single Instruction,
Multiple Threads) model with a multi-layered thread hierarchy for efficient
execution.
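
A minimal sketch of the SIMT model, assuming a working HIP toolchain and
compilation with ``hipcc`` (the ``scale`` kernel, the problem size, and the
block size are illustrative, not taken from this guide):

```cpp
#include <hip/hip_runtime.h>
#include <cstdio>
#include <vector>

// One thread per element: the grid/block/thread hierarchy of the
// SIMT model is exposed through blockIdx, blockDim and threadIdx.
__global__ void scale(float a, float* x, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= a;  // guard against the last, partially filled block
}

int main() {
    const int n = 1 << 20;
    std::vector<float> host(n, 1.0f);
    float* dev = nullptr;
    hipMalloc(&dev, n * sizeof(float));
    hipMemcpy(dev, host.data(), n * sizeof(float), hipMemcpyHostToDevice);

    const int threadsPerBlock = 256;
    const int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    scale<<<blocks, threadsPerBlock>>>(2.0f, dev, n);

    hipMemcpy(host.data(), dev, n * sizeof(float), hipMemcpyDeviceToHost);
    printf("x[0] = %.1f\n", host[0]);
    hipFree(dev);
    return 0;
}
```

The launch configuration rounds the grid size up so that every element is
covered, which is why the kernel needs the bounds check on ``i``.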

Understand the target architecture (CPU and GPU)
================================================================================

The :ref:`hardware implementation <hardware_implementation>` topic outlines the
GPUs supported by HIP. In general, GPUs are made up of Compute Units that
excel at executing parallelizable, computationally intensive workloads without
complex control flow.

Increase parallelism on multiple levels
================================================================================

To maximize performance and keep all system components fully utilized, the
application should expose and efficiently manage as much parallelism as possible.
:ref:`Parallel execution <parallel execution>` can be achieved at the
application, device, and multiprocessor levels.

The application’s host and device operations can achieve parallel execution
through asynchronous calls, streams, or HIP graphs. On the device level,
multiple kernels can execute concurrently when resources are available, and at
the multiprocessor level, developers can overlap data transfers with
computations to further optimize performance.
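
As a hedged sketch of overlapping transfers with computation (the
``increment`` kernel, the two-way split, and the sizes are assumptions for
illustration), work can be divided across two streams so that one stream's
copy overlaps the other's kernel:

```cpp
#include <hip/hip_runtime.h>
#include <cstddef>

__global__ void increment(float* x, size_t n) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) x[i] += 1.0f;
}

int main() {
    const size_t n = 1 << 22, half = n / 2, bytes = half * sizeof(float);
    float* pinned = nullptr;
    hipHostMalloc(&pinned, n * sizeof(float));  // pinned memory enables truly async copies
    float* dev = nullptr;
    hipMalloc(&dev, n * sizeof(float));

    hipStream_t s[2];
    for (int i = 0; i < 2; ++i) hipStreamCreate(&s[i]);

    // Each stream copies its half in, runs the kernel, and copies back,
    // so stream 1's transfers can overlap stream 0's computation.
    for (int i = 0; i < 2; ++i) {
        size_t off = i * half;
        hipMemcpyAsync(dev + off, pinned + off, bytes, hipMemcpyHostToDevice, s[i]);
        increment<<<(unsigned)((half + 255) / 256), 256, 0, s[i]>>>(dev + off, half);
        hipMemcpyAsync(pinned + off, dev + off, bytes, hipMemcpyDeviceToHost, s[i]);
    }
    for (int i = 0; i < 2; ++i) {
        hipStreamSynchronize(s[i]);
        hipStreamDestroy(s[i]);
    }
    hipFree(dev);
    hipHostFree(pinned);
    return 0;
}
```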

Memory management
================================================================================

GPUs generally have their own distinct memory, also called :ref:`device
memory <device_memory>`, separate from the :ref:`host memory <host_memory>`.
Device memory needs to be managed separately from the host memory. This
includes allocating device memory and transferring data between the host and
the device. These operations can be performance critical, so it's important to
know how to use them effectively. For more information, see
:ref:`Memory management <memory_management>`.
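
A minimal sketch of the allocate/transfer/free lifecycle, assuming a working
HIP toolchain (the sizes and values are illustrative):

```cpp
#include <hip/hip_runtime.h>
#include <cassert>
#include <vector>

int main() {
    const size_t n = 1024, bytes = n * sizeof(double);
    std::vector<double> host(n, 3.14);

    // 1. Allocate device memory, which is separate from host memory.
    double* dev = nullptr;
    hipMalloc(&dev, bytes);

    // 2. Transfer data host -> device, then device -> host.
    hipMemcpy(dev, host.data(), bytes, hipMemcpyHostToDevice);
    std::vector<double> back(n, 0.0);
    hipMemcpy(back.data(), dev, bytes, hipMemcpyDeviceToHost);
    assert(back[0] == 3.14);

    // 3. Release device memory when done.
    hipFree(dev);
    return 0;
}
```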

Synchronize CPU and GPU workloads
================================================================================

Tasks on the host and devices run asynchronously, so proper synchronization is
needed when dependencies exist between those tasks. Asynchronous execution is
useful for fully utilizing the available resources: even when only a single
device is available, memory transfers and the execution of tasks can be
overlapped.
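
As an illustrative sketch (the ``busywork`` kernel is an assumption), a stream
can be synchronized on its own, and events can bracket asynchronous work for
timing:

```cpp
#include <hip/hip_runtime.h>
#include <cstdio>

__global__ void busywork(float* x) { x[threadIdx.x] *= 2.0f; }

int main() {
    float* dev = nullptr;
    hipMalloc(&dev, 256 * sizeof(float));

    hipStream_t stream;
    hipStreamCreate(&stream);

    hipEvent_t start, stop;
    hipEventCreate(&start);
    hipEventCreate(&stop);

    hipEventRecord(start, stream);
    busywork<<<1, 256, 0, stream>>>(dev);  // runs asynchronously to the host
    hipEventRecord(stop, stream);

    // Block the host only until work in this stream has finished;
    // hipDeviceSynchronize() would instead wait for the whole device.
    hipStreamSynchronize(stream);

    float ms = 0.0f;
    hipEventElapsedTime(&ms, start, stop);
    printf("kernel took %.3f ms\n", ms);

    hipEventDestroy(start);
    hipEventDestroy(stop);
    hipStreamDestroy(stream);
    hipFree(dev);
    return 0;
}
```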

Error handling
================================================================================

All functions in the HIP runtime API return an error value of type
:cpp:enum:`hipError_t` that can be used to verify whether the function
executed successfully. It's important to check these return values in order to
catch and handle errors where possible. Kernel launches are an exception: they
don't return an error value, so launch errors must be queried afterwards with
functions like :cpp:func:`hipGetLastError`.
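
A common pattern is to wrap every runtime call in a checking macro; this is a
sketch (the ``HIP_CHECK`` name is a convention, not a HIP API), with the
launch error queried explicitly since kernel launches return nothing:

```cpp
#include <hip/hip_runtime.h>
#include <cstdio>
#include <cstdlib>

// Check every runtime API return value; abort with a readable message on failure.
#define HIP_CHECK(expr)                                              \
    do {                                                             \
        hipError_t err = (expr);                                     \
        if (err != hipSuccess) {                                     \
            fprintf(stderr, "HIP error %s at %s:%d\n",               \
                    hipGetErrorString(err), __FILE__, __LINE__);     \
            exit(EXIT_FAILURE);                                      \
        }                                                            \
    } while (0)

__global__ void kernel(int* out) { *out = 42; }

int main() {
    int* dev = nullptr;
    HIP_CHECK(hipMalloc(&dev, sizeof(int)));

    kernel<<<1, 1>>>(dev);
    // Kernel launches return no error value, so query the launch error explicitly.
    HIP_CHECK(hipGetLastError());
    HIP_CHECK(hipDeviceSynchronize());

    HIP_CHECK(hipFree(dev));
    return 0;
}
```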

Multi-GPU and load balancing
================================================================================

Large-scale applications that need more compute power can use multiple GPUs in
the system. This requires distributing the workload across the GPUs to prevent
some from being overutilized while others sit idle.
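
A sketch of a static split across all visible devices (the ``fill`` kernel
and the even division of ``n`` are simplifying assumptions; real load
balancing would account for differing device capabilities):

```cpp
#include <hip/hip_runtime.h>
#include <vector>

__global__ void fill(float* x, size_t n, float v) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) x[i] = v;
}

int main() {
    int deviceCount = 0;
    hipGetDeviceCount(&deviceCount);
    if (deviceCount == 0) return 0;

    const size_t n = 1 << 20;
    const size_t chunk = n / deviceCount;  // assumes n divides evenly
    std::vector<float*> dev(deviceCount);

    // Static split: give every GPU an equal share of the work.
    for (int d = 0; d < deviceCount; ++d) {
        hipSetDevice(d);  // subsequent calls target device d
        hipMalloc(&dev[d], chunk * sizeof(float));
        fill<<<(unsigned)((chunk + 255) / 256), 256>>>(dev[d], chunk, float(d));
    }
    // Wait for every device, then release its memory.
    for (int d = 0; d < deviceCount; ++d) {
        hipSetDevice(d);
        hipDeviceSynchronize();
        hipFree(dev[d]);
    }
    return 0;
}
```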
15 changes: 7 additions & 8 deletions docs/sphinx/_toc.yml.in
@@ -22,15 +22,16 @@ subtrees:
- url: https://developer.nvidia.com/cuda-gpus
title: NVIDIA supported GPUs

- caption: Conceptual
- caption: Programming guide
entries:
- file: programming_guide
title: Introduction
- file: understand/programming_model
- file: understand/hardware_implementation
- file: understand/amd_clr
- file: understand/compilers

- caption: How to
entries:
- file: how-to/performance_guidelines
- file: how-to/debugging
- file: how-to/logging
- file: how-to/hip_runtime_api
subtrees:
- entries:
@@ -56,9 +57,7 @@ subtrees:
- file: how-to/hip_porting_guide
- file: how-to/hip_porting_driver_api
- file: how-to/hip_rtc
- file: how-to/performance_guidelines
- file: how-to/debugging
- file: how-to/logging
- file: understand/amd_clr

- caption: Reference
entries:
4 changes: 3 additions & 1 deletion docs/understand/programming_model.rst
@@ -2,7 +2,9 @@
:description: This chapter explains the HIP programming model, the contract
between the programmer and the compiler/runtime executing the
code, how it maps to the hardware.
:keywords: AMD, ROCm, HIP, CUDA, API design
:keywords: ROCm, HIP, CUDA, API design, programming model

.. _programming_model:

*******************************************************************************
HIP programming model