Landing page update #3656

Merged: 3 commits, Nov 22, 2024
3 changes: 3 additions & 0 deletions .wordlist.txt
Original file line number Diff line number Diff line change
@@ -116,6 +116,8 @@ omnitrace
overindex
overindexing
oversubscription
overutilized
parallelizable
pixelated
pragmas
preallocated
@@ -154,6 +156,7 @@ texels
tradeoffs
templated
toolkits
transfering
typedefs
unintuitive
UMM
11 changes: 11 additions & 0 deletions docs/how-to/hip_runtime_api.rst
@@ -34,3 +34,14 @@ the following figure:

On NVIDIA platforms, the HIP runtime API calls the CUDA runtime or the CUDA
driver through the hipother interface. For more information, see the `hipother repository <https://github.com/ROCm/hipother>`_.

The HIP runtime API covers the following high-level topics:

* :doc:`./hip_runtime_api/initialization`
* :doc:`./hip_runtime_api/memory_management`
* :doc:`./hip_runtime_api/error_handling`
* :doc:`./hip_runtime_api/cooperative_groups`
* :doc:`./hip_runtime_api/hipgraph`
* :doc:`./hip_runtime_api/call_stack`
* :doc:`./hip_runtime_api/multi_device`
* :doc:`./hip_runtime_api/external_interop`
42 changes: 15 additions & 27 deletions docs/index.md
@@ -1,57 +1,45 @@
<head>
<meta charset="UTF-8">
<meta name="description" content="HIP documentation and programming guide.">
<meta name="keywords" content="HIP, Heterogeneous-computing Interface for Portability, HIP programming guide">
</head>

# HIP documentation

The Heterogeneous-computing Interface for Portability (HIP) is a C++ runtime API
and kernel language that lets you create portable applications for AMD and
NVIDIA GPUs from a single source code. For more information, see [What is HIP?](./what_is_hip)

Installation instructions are available from:

* [Installing HIP](./install/install)
* [Building HIP from source](./install/build)

HIP-enabled GPUs:

* [Supported AMD GPUs on Linux](https://rocm.docs.amd.com/projects/install-on-linux/en/latest/reference/system-requirements.html#supported-gpus)
* [Supported AMD GPUs on Windows](https://rocm.docs.amd.com/projects/install-on-windows/en/latest/reference/system-requirements.html#windows-supported-gpus)
* [Supported NVIDIA GPUs](https://developer.nvidia.com/cuda-gpus)

The HIP documentation is organized into the following categories:

::::{grid} 1 2 2 2
:gutter: 3

:::{grid-item-card} Conceptual
:::{grid-item-card} Programming guide

* [Introduction](./programming_guide)
* {doc}`./understand/programming_model`
* {doc}`./understand/hardware_implementation`
* {doc}`./understand/amd_clr`
* {doc}`./understand/compilers`

:::

:::{grid-item-card} How to

* {doc}`./how-to/performance_guidelines`
* [Debugging with HIP](./how-to/debugging)
* {doc}`./how-to/logging`
* {doc}`./how-to/hip_runtime_api`
* {doc}`./how-to/hip_runtime_api/initialization`
* {doc}`./how-to/hip_runtime_api/memory_management`
* {doc}`./how-to/hip_runtime_api/error_handling`
* {doc}`./how-to/hip_runtime_api/multi_device`
* {doc}`./how-to/hip_runtime_api/cooperative_groups`
* {doc}`./how-to/hip_runtime_api/hipgraph`
* {doc}`./how-to/hip_runtime_api/call_stack`
* {doc}`./how-to/hip_runtime_api/external_interop`
* [HIP porting guide](./how-to/hip_porting_guide)
* [HIP porting: driver API guide](./how-to/hip_porting_driver_api)
* {doc}`./how-to/hip_rtc`
* {doc}`./how-to/performance_guidelines`
* [Debugging with HIP](./how-to/debugging)
* {doc}`./how-to/logging`
* {doc}`./understand/amd_clr`

:::

:::{grid-item-card} Reference

* [HIP runtime API](./reference/hip_runtime_api_reference)
* [Modules](./reference/hip_runtime_api/modules)
* [Global defines, enums, structs and files](./reference/hip_runtime_api/global_defines_enums_structs_files)
* [HSA runtime API for ROCm](./reference/virtual_rocr)
* [C++ language extensions](./reference/cpp_language_extensions)
* [C++ language support](./reference/cpp_language_support)
79 changes: 79 additions & 0 deletions docs/programming_guide.rst
@@ -0,0 +1,79 @@
.. meta::
:description: HIP programming guide introduction
:keywords: HIP programming guide introduction, HIP programming guide

.. _hip-programming-guide:

********************************************************************************
HIP programming guide introduction
********************************************************************************

This topic provides key HIP programming concepts and links to more detailed information.

Write GPU kernels for parallel execution
================================================================================

To make the most of the parallelism inherent to GPUs, a thorough understanding
of the :ref:`programming model <programming_model>` is helpful. The HIP
programming model is designed to make it easy to map data-parallel algorithms
to the architecture of the GPUs. HIP employs the SIMT (Single Instruction,
Multiple Threads) model with a multi-layered thread hierarchy for efficient
execution.
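
A minimal sketch of the SIMT model, assuming a working HIP toolchain and
compilation with ``hipcc`` (the ``scale`` kernel, the problem size, and the
block size are illustrative, not taken from this guide):

```cpp
#include <hip/hip_runtime.h>
#include <cstdio>
#include <vector>

// One thread per element: the grid/block/thread hierarchy of the
// SIMT model is exposed through blockIdx, blockDim and threadIdx.
__global__ void scale(float a, float* x, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= a;  // guard against the last, partially filled block
}

int main() {
    const int n = 1 << 20;
    std::vector<float> host(n, 1.0f);
    float* dev = nullptr;
    hipMalloc(&dev, n * sizeof(float));
    hipMemcpy(dev, host.data(), n * sizeof(float), hipMemcpyHostToDevice);

    const int threadsPerBlock = 256;
    const int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    scale<<<blocks, threadsPerBlock>>>(2.0f, dev, n);

    hipMemcpy(host.data(), dev, n * sizeof(float), hipMemcpyDeviceToHost);
    printf("x[0] = %.1f\n", host[0]);
    hipFree(dev);
    return 0;
}
```

The launch configuration rounds the grid size up so that every element is
covered, which is why the kernel needs the bounds check on ``i``.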

Understand the target architecture (CPU and GPU)
================================================================================

The :ref:`hardware implementation <hardware_implementation>` topic outlines the
GPUs supported by HIP. In general, GPUs are made up of Compute Units that
excel at executing parallelizable, computationally intensive workloads without
complex control flow.

Increase parallelism on multiple levels
================================================================================

To maximize performance and keep all system components fully utilized, the
application should expose and efficiently manage as much parallelism as possible.
:ref:`Parallel execution <parallel execution>` can be achieved at the
application, device, and multiprocessor levels.

The application’s host and device operations can achieve parallel execution
through asynchronous calls, streams, or HIP graphs. On the device level,
multiple kernels can execute concurrently when resources are available, and at
the multiprocessor level, developers can overlap data transfers with
computations to further optimize performance.
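
As a hedged sketch of overlapping transfers with computation (the
``increment`` kernel, the two-way split, and the sizes are assumptions for
illustration), work can be divided across two streams so that one stream's
copy overlaps the other's kernel:

```cpp
#include <hip/hip_runtime.h>
#include <cstddef>

__global__ void increment(float* x, size_t n) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) x[i] += 1.0f;
}

int main() {
    const size_t n = 1 << 22, half = n / 2, bytes = half * sizeof(float);
    float* pinned = nullptr;
    hipHostMalloc(&pinned, n * sizeof(float));  // pinned memory enables truly async copies
    float* dev = nullptr;
    hipMalloc(&dev, n * sizeof(float));

    hipStream_t s[2];
    for (int i = 0; i < 2; ++i) hipStreamCreate(&s[i]);

    // Each stream copies its half in, runs the kernel, and copies back,
    // so stream 1's transfers can overlap stream 0's computation.
    for (int i = 0; i < 2; ++i) {
        size_t off = i * half;
        hipMemcpyAsync(dev + off, pinned + off, bytes, hipMemcpyHostToDevice, s[i]);
        increment<<<(unsigned)((half + 255) / 256), 256, 0, s[i]>>>(dev + off, half);
        hipMemcpyAsync(pinned + off, dev + off, bytes, hipMemcpyDeviceToHost, s[i]);
    }
    for (int i = 0; i < 2; ++i) {
        hipStreamSynchronize(s[i]);
        hipStreamDestroy(s[i]);
    }
    hipFree(dev);
    hipHostFree(pinned);
    return 0;
}
```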

Memory management
================================================================================

GPUs generally have their own distinct memory, also called :ref:`device
memory <device_memory>`, separate from the :ref:`host memory <host_memory>`.
Device memory needs to be managed separately from the host memory. This
includes allocating device memory and transferring data between the host and
the device. These operations can be performance critical, so it's important to
know how to use them effectively. For more information, see
:ref:`Memory management <memory_management>`.
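
A minimal sketch of the allocate/transfer/free lifecycle, assuming a working
HIP toolchain (the sizes and values are illustrative):

```cpp
#include <hip/hip_runtime.h>
#include <cassert>
#include <vector>

int main() {
    const size_t n = 1024, bytes = n * sizeof(double);
    std::vector<double> host(n, 3.14);

    // 1. Allocate device memory, which is separate from host memory.
    double* dev = nullptr;
    hipMalloc(&dev, bytes);

    // 2. Transfer data host -> device, then device -> host.
    hipMemcpy(dev, host.data(), bytes, hipMemcpyHostToDevice);
    std::vector<double> back(n, 0.0);
    hipMemcpy(back.data(), dev, bytes, hipMemcpyDeviceToHost);
    assert(back[0] == 3.14);

    // 3. Release device memory when done.
    hipFree(dev);
    return 0;
}
```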

Synchronize CPU and GPU workloads
================================================================================

Tasks on the host and devices run asynchronously, so proper synchronization is
needed when dependencies exist between those tasks. Asynchronous execution is
useful for fully utilizing the available resources: even when only a single
device is available, memory transfers and the execution of tasks can be
overlapped.
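
As an illustrative sketch (the ``busywork`` kernel is an assumption), a stream
can be synchronized on its own, and events can bracket asynchronous work for
timing:

```cpp
#include <hip/hip_runtime.h>
#include <cstdio>

__global__ void busywork(float* x) { x[threadIdx.x] *= 2.0f; }

int main() {
    float* dev = nullptr;
    hipMalloc(&dev, 256 * sizeof(float));

    hipStream_t stream;
    hipStreamCreate(&stream);

    hipEvent_t start, stop;
    hipEventCreate(&start);
    hipEventCreate(&stop);

    hipEventRecord(start, stream);
    busywork<<<1, 256, 0, stream>>>(dev);  // runs asynchronously to the host
    hipEventRecord(stop, stream);

    // Block the host only until work in this stream has finished;
    // hipDeviceSynchronize() would instead wait for the whole device.
    hipStreamSynchronize(stream);

    float ms = 0.0f;
    hipEventElapsedTime(&ms, start, stop);
    printf("kernel took %.3f ms\n", ms);

    hipEventDestroy(start);
    hipEventDestroy(stop);
    hipStreamDestroy(stream);
    hipFree(dev);
    return 0;
}
```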

Error handling
================================================================================

All functions in the HIP runtime API return an error value of type
:cpp:enum:`hipError_t` that can be used to verify whether the function
executed successfully. It's important to check these return values in order to
catch and handle errors where possible. Kernel launches are an exception: they
don't return an error value, so launch errors must be queried afterwards with
functions like :cpp:func:`hipGetLastError`.
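
A common pattern is to wrap every runtime call in a checking macro; this is a
sketch (the ``HIP_CHECK`` name is a convention, not a HIP API), with the
launch error queried explicitly since kernel launches return nothing:

```cpp
#include <hip/hip_runtime.h>
#include <cstdio>
#include <cstdlib>

// Check every runtime API return value; abort with a readable message on failure.
#define HIP_CHECK(expr)                                              \
    do {                                                             \
        hipError_t err = (expr);                                     \
        if (err != hipSuccess) {                                     \
            fprintf(stderr, "HIP error %s at %s:%d\n",               \
                    hipGetErrorString(err), __FILE__, __LINE__);     \
            exit(EXIT_FAILURE);                                      \
        }                                                            \
    } while (0)

__global__ void kernel(int* out) { *out = 42; }

int main() {
    int* dev = nullptr;
    HIP_CHECK(hipMalloc(&dev, sizeof(int)));

    kernel<<<1, 1>>>(dev);
    // Kernel launches return no error value, so query the launch error explicitly.
    HIP_CHECK(hipGetLastError());
    HIP_CHECK(hipDeviceSynchronize());

    HIP_CHECK(hipFree(dev));
    return 0;
}
```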

Multi-GPU and load balancing
================================================================================

Large-scale applications that need more compute power can use multiple GPUs in
the system. This requires distributing the workload across the GPUs to prevent
some from being overutilized while others sit idle.
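
A sketch of a static split across all visible devices (the ``fill`` kernel
and the even division of ``n`` are simplifying assumptions; real load
balancing would account for differing device capabilities):

```cpp
#include <hip/hip_runtime.h>
#include <vector>

__global__ void fill(float* x, size_t n, float v) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) x[i] = v;
}

int main() {
    int deviceCount = 0;
    hipGetDeviceCount(&deviceCount);
    if (deviceCount == 0) return 0;

    const size_t n = 1 << 20;
    const size_t chunk = n / deviceCount;  // assumes n divides evenly
    std::vector<float*> dev(deviceCount);

    // Static split: give every GPU an equal share of the work.
    for (int d = 0; d < deviceCount; ++d) {
        hipSetDevice(d);  // subsequent calls target device d
        hipMalloc(&dev[d], chunk * sizeof(float));
        fill<<<(unsigned)((chunk + 255) / 256), 256>>>(dev[d], chunk, float(d));
    }
    // Wait for every device, then release its memory.
    for (int d = 0; d < deviceCount; ++d) {
        hipSetDevice(d);
        hipDeviceSynchronize();
        hipFree(dev[d]);
    }
    return 0;
}
```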
15 changes: 7 additions & 8 deletions docs/sphinx/_toc.yml.in
@@ -22,15 +22,16 @@ subtrees:
- url: https://developer.nvidia.com/cuda-gpus
title: NVIDIA supported GPUs

- caption: Conceptual
- caption: Programming guide
entries:
- file: programming_guide
title: Introduction
- file: understand/programming_model
- file: understand/hardware_implementation
- file: understand/amd_clr
- file: understand/compilers

- caption: How to
entries:
- file: how-to/performance_guidelines
- file: how-to/debugging
- file: how-to/logging
- file: how-to/hip_runtime_api
subtrees:
- entries:
@@ -56,9 +57,7 @@ subtrees:
- file: how-to/hip_porting_guide
- file: how-to/hip_porting_driver_api
- file: how-to/hip_rtc
- file: how-to/performance_guidelines
- file: how-to/debugging
- file: how-to/logging
- file: understand/amd_clr

- caption: Reference
entries:
4 changes: 3 additions & 1 deletion docs/understand/programming_model.rst
@@ -2,7 +2,9 @@
:description: This chapter explains the HIP programming model, the contract
between the programmer and the compiler/runtime executing the
code, how it maps to the hardware.
:keywords: AMD, ROCm, HIP, CUDA, API design
:keywords: ROCm, HIP, CUDA, API design, programming model

.. _programming_model:

*******************************************************************************
HIP programming model