Commit 10d8954

Update

neon60 authored and matyas-streamhpc committed Nov 5, 2024
1 parent 8c3e0c6 commit 10d8954
Showing 4 changed files with 133 additions and 132 deletions.
130 changes: 0 additions & 130 deletions docs/how-to/call_stack.rst

This file was deleted.

132 changes: 132 additions & 0 deletions docs/how-to/hip_runtime_api/call_stack.rst
@@ -0,0 +1,132 @@
.. meta::
  :description: This page describes how the call stack works and how to manage it in HIP.
  :keywords: AMD, ROCm, HIP, call stack

********************************************************************************
Call stack
********************************************************************************

The call stack and the program counter work together to manage the flow of
control in a program, especially in the context of function calls, loops, and
branching.

The call stack is a stack data structure that stores information about the
active subroutines or functions of a program.
Each time a function is called, a new frame is added to the top of the stack,
containing information such as local variables, return addresses and function
parameters. When the function execution completes, the frame is removed from the
stack. This concept allows the program to return to the calling function and
continue executing from where it left off.
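
The following minimal host-side C++ sketch (illustrative only, not specific to
HIP) shows how a frame is created for each call and removed on return:

.. code-block:: cpp

    #include <iostream>

    // Each call to square() pushes a frame holding its parameter, its local
    // variable and the return address; the frame is popped when it returns.
    int square(int x)
    {
        int result = x * x; // local variable stored in the current frame
        return result;      // frame is removed, control returns to the caller
    }

    int main()
    {
        int y = square(3); // a new frame for square() is pushed here
        std::cout << y << std::endl; // prints 9 after the frame has been popped
        return 0;
    }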

The program counter keeps track of the current or next instruction to
be executed, while the call stack maintains the history of function calls,
return addresses, and local variables. Together, they allow a program to execute
complex control flow structures, handle function calls and return to the
appropriate location in the code after a function completes.

The call stack in GPU programming involves extra complexity due to parallel
execution, and NVIDIA and AMD GPUs take different approaches. NVIDIA GPUs have
supported independent thread scheduling since the Volta architecture, where
every thread has its own call stack and effective program counter. On AMD GPUs,
every warp/wavefront has its own call stack and program counter.

The call stack for each thread must track its function calls, local variables,
and return addresses.

If a thread or warp exceeds its stack size, a stack overflow occurs, causing
kernel failure. This can be detected and handled using debuggers or with the
:doc:`error handling functions <./error_handling>`.
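
As an illustrative sketch (the deeply recursive kernel ``deep_kernel`` is
hypothetical and its body is elided), a stack overflow inside a kernel can be
detected by checking the error returned by a subsequent runtime call:

.. code-block:: cpp

    #include <hip/hip_runtime.h>
    #include <iostream>

    // Hypothetical kernel that recurses deeply enough to exceed the
    // per-thread stack size; the recursion itself is elided here.
    __global__ void deep_kernel() { /* deep recursion elided */ }

    int main()
    {
        deep_kernel<<<1, 1>>>();

        // A stack overflow inside the kernel is usually reported by a later
        // runtime call rather than by the launch statement itself.
        hipError_t status = hipDeviceSynchronize();
        if (status != hipSuccess) {
            std::cerr << "Kernel failed: " << hipGetErrorString(status) << std::endl;
        }
        return 0;
    }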

Call stack management with HIP
================================================================================

Developers can adjust the call stack size for every thread, allowing fine-tuning
based on specific kernel requirements. This helps prevent stack overflow errors
by ensuring sufficient stack memory is allocated.

.. code-block:: cpp

    #include <hip/hip_runtime.h>
    #include <iostream>

    #define HIP_CHECK(expression)                  \
    {                                              \
        const hipError_t status = expression;      \
        if(status != hipSuccess){                  \
            std::cerr << "HIP error "              \
                      << status << ": "            \
                      << hipGetErrorString(status) \
                      << " at " << __FILE__ << ":" \
                      << __LINE__ << std::endl;    \
        }                                          \
    }

    int main()
    {
        size_t stackSize;
        HIP_CHECK(hipDeviceGetLimit(&stackSize, hipLimitStackSize));
        std::cout << "Default stack size: " << stackSize << " bytes" << std::endl;

        // Set a new stack size
        size_t newStackSize = 1024 * 8; // 8 KiB
        HIP_CHECK(hipDeviceSetLimit(hipLimitStackSize, newStackSize));
        std::cout << "New stack size set to: " << newStackSize << " bytes" << std::endl;

        HIP_CHECK(hipDeviceGetLimit(&stackSize, hipLimitStackSize));
        std::cout << "Updated stack size: " << stackSize << " bytes" << std::endl;

        return 0;
    }

Handling recursion and deep function calls
--------------------------------------------------------------------------------

Similar to CPU programming, recursive functions and deeply nested function calls
are supported. However, developers must ensure that these functions do not
exceed the available stack memory, considering the limited resources on GPUs.

.. code-block:: cpp

    #include <hip/hip_runtime.h>
    #include <iostream>

    #define HIP_CHECK(expression)                  \
    {                                              \
        const hipError_t status = expression;      \
        if(status != hipSuccess){                  \
            std::cerr << "HIP error "              \
                      << status << ": "            \
                      << hipGetErrorString(status) \
                      << " at " << __FILE__ << ":" \
                      << __LINE__ << std::endl;    \
        }                                          \
    }

    __device__ unsigned long long factorial(unsigned long long n)
    {
        if (n == 0 || n == 1) {
            return 1;
        }
        return n * factorial(n - 1);
    }

    __global__ void kernel(unsigned long long n)
    {
        unsigned long long result = factorial(n);
        const size_t x = threadIdx.x + blockDim.x * blockIdx.x;
        if (x == 0)
            printf("%llu! = %llu \n", n, result);
    }

    int main()
    {
        kernel<<<1, 1>>>(10);
        HIP_CHECK(hipDeviceSynchronize());

        // With the -O0 optimization option, the following launch hits the stack limit:
        // kernel<<<1, 256>>>(2048);
        // HIP_CHECK(hipDeviceSynchronize());

        return 0;
    }

1 change: 0 additions & 1 deletion docs/index.md
@@ -37,7 +37,6 @@ The HIP documentation is organized into the following categories:
* [HIP porting: driver API guide](./how-to/hip_porting_driver_api)
* {doc}`./how-to/hip_rtc`
* {doc}`./how-to/performance_guidelines`
* {doc}`./how-to/call_stack`
* [Debugging with HIP](./how-to/debugging)
* {doc}`./how-to/logging`
* [Unified memory](./how-to/unified_memory)
2 changes: 1 addition & 1 deletion docs/sphinx/_toc.yml.in
@@ -48,11 +48,11 @@ subtrees:
- file: how-to/hip_runtime_api/memory_management/stream_ordered_allocator
- file: how-to/hip_runtime_api/cooperative_groups
- file: how-to/hip_runtime_api/hipgraph
- file: how-to/hip_runtime_api/call_stack
- file: how-to/hip_porting_guide
- file: how-to/hip_porting_driver_api
- file: how-to/hip_rtc
- file: how-to/performance_guidelines
- file: how-to/call_stack
- file: how-to/debugging
- file: how-to/logging
- file: how-to/cooperative_groups
