Add call stack management

ROCm · Nov 5, 2024 · 8c3e0c6 · 8c3e0c6
1 parent 96cf3a6
commit 8c3e0c6
Show file tree

Hide file tree

Showing 3 changed files with 132 additions and 0 deletions.
diff --git a/docs/how-to/call_stack.rst b/docs/how-to/call_stack.rst
@@ -0,0 +1,130 @@
+.. meta::
+    :description: This page lists graph safe ROCm libraries.
+    :keywords: AMD, ROCm, HIP, hipGRAPH
+
+********************************************************************************
+Call stack
+********************************************************************************
+
+The call stack is a concept in computer science, representing the stack data structure that stores information about the active subroutines or functions. Each time a function is called, a new frame is added to the top of the stack, containing information such as local variables, return addresses and function parameters. When the function execution completes, the frame is removed from the stack. This concept allows the program to return to the calling function and continue executing from where it left off.
+
+GPU programming
+===============
+
+The call stack in GPU programming has extra complexity due to parallel execution:
+
+Parallel Execution: Each thread in a GPU can invoke functions, leading to multiple call stacks being managed simultaneously. This requires efficient memory use due to the large number of concurrent calls.
+
+Memory Limitations: The call stack must manage limited memory. Each thread has a small stack space, and exceeding this can cause stack overflow errors. Effective management of stack memory is crucial, especially when code involves recursive functions or deep nesting.
+
+Kernel Execution: When a kernel launches, each thread independently executes the kernel function. The call stack for each thread must track its function calls, local variables, and return addresses.
+
+Stack Overflow: If a thread exceeds its stack size, a stack overflow occurs, causing kernel failure. This can be detected and handled using debuggers or runtime checks.
+
+Call stack management with AMD HIP
+==================================
+
+Query and set stack size
+------------------------
+
+Developers can adjust the call stack size for their applications, allowing fine-tuning based on specific kernel requirements. This helps prevent stack overflow errors by ensuring sufficient stack memory is allocated.
+
+.. code-block:: cpp
+        
+    #include <hip/hip_runtime.h>
+    #include <iostream>
+
+    int main()
+    {
+        size_t stackSize;
+        hipDeviceGetLimit(&stackSize, hipLimitStackSize);
+        std::cout << "Default stack size: " << stackSize << " bytes" << std::endl;
+
+        // Set a new stack size
+        size_t newStackSize = 1024 * 1024; // 1MB
+        hipDeviceSetLimit(hipLimitStackSize, newStackSize);
+        std::cout << "New stack size set to: " << newStackSize << " bytes" << std::endl;
+
+        hipDeviceGetLimit(&stackSize, hipLimitStackSize);
+        std::cout << "Updated stack size: " << stackSize << " bytes" << std::endl;
+
+        return 0;
+    }
+
+
+Handling recursion and deep function calls
+------------------------------------------
+Similar to CPU programming, recursive functions and deeply nested function calls are supported. However, developers must ensure that these functions do not exceed the available stack memory, considering the limited resources on GPUs.
+
+.. code-block:: cpp
+
+    #include <hip/hip_runtime.h>
+    #include <iostream>
+
+    __global__ void recursiveFunction(int n)
+    {
+        if (n > 0)
+        {
+            recursiveFunction<<<1, 1>>>(n - 1);
+        }
+    }
+
+    int main()
+    {
+        recursiveFunction<<<1, 1>>>(10);
+        hipDeviceSynchronize();
+        return 0;
+    }
+
+
+Error handling
+--------------
+Mechanisms are provided to detect stack overflow errors. When a stack overflow occurs, the kernel call fails, and the error can be managed using debuggers or runtime checks, helping maintain the stability and reliability of applications.
+
+.. code-block:: cpp
+
+    #include <hip/hip_runtime.h>
+    #include <iostream>
+
+    int main()
+    {
+        size_t stackSize;
+        hipDeviceGetLimit(&stackSize, hipLimitStackSize);
+        std::cout << "Current stack size: " << stackSize << " bytes" << std::endl;
+
+        // Simulate a stack overflow
+        size_t newStackSize = stackSize / 2; // Reduce stack size
+        hipDeviceSetLimit(hipLimitStackSize, newStackSize);
+
+        // Kernel call to provoke stack overflow
+        recursiveFunction<<<1, 1>>>(20);
+        hipError_t err = hipDeviceSynchronize();
+        if (err != hipSuccess)
+        {
+            std::cerr << "Kernel call failed with error: " << hipGetErrorString(err) << std::endl;
+        }
+
+        return 0;
+    }
+
+
+Portability and performance
+---------------------------
+Designed to provide a minimal abstraction layer, HIP minimizes performance overhead and maximizes portability. This allows writing high-performance, cross-platform GPU code without sacrificing efficiency.
+
+.. code-block:: cpp
+
+    #include <hip/hip_runtime.h>
+    #include <iostream>
+
+    __global__ void simpleKernel()
+    {
+        printf("Running on GPU\n");
+    }
+
+    int main()
+    {
+        simpleKernel<<<1, 1>>>();
+        hipDeviceSynchronize();
+        return 0;
+    }
diff --git a/docs/index.md b/docs/index.md
@@ -37,6 +37,7 @@ The HIP documentation is organized into the following categories:
 * [HIP porting: driver API guide](./how-to/hip_porting_driver_api)
 * {doc}`./how-to/hip_rtc`
 * {doc}`./how-to/performance_guidelines`
+* {doc}`./how-to/call_stack`
 * [Debugging with HIP](./how-to/debugging)
 * {doc}`./how-to/logging`
 * [Unified memory](./how-to/unified_memory)

diff --git a/docs/sphinx/_toc.yml.in b/docs/sphinx/_toc.yml.in
@@ -52,6 +52,7 @@ subtrees:
   - file: how-to/hip_porting_driver_api
   - file: how-to/hip_rtc
   - file: how-to/performance_guidelines
+  - file: how-to/call_stack
   - file: how-to/debugging
   - file: how-to/logging
   - file: how-to/cooperative_groups