WIP
matyas-streamhpc committed Nov 6, 2024
1 parent 814f61c commit 507ebf1
Showing 1 changed file with 11 additions and 15 deletions.
26 changes: 11 additions & 15 deletions docs/how-to/hip_runtime_api/call_stack.rst
@@ -6,10 +6,6 @@
Call stack
*******************************************************************************

The call stack and the program counter work together to manage the flow of
control in a program, especially in the context of function calls, loops, and
branching.

The call stack is a concept from computer science, referring to the stack data
structure used to manage function calls by saving the state of the current
function. Each time a function is called, a new call frame is added to
@@ -29,14 +25,15 @@ and return addresses. However, in GPU programming, the memory required to store
the call stack increases due to the parallelism inherent to GPUs. NVIDIA and
AMD GPUs use different approaches. NVIDIA GPUs have provided independent
thread scheduling since the Volta architecture, where every thread has its own
call stack and effective program counter. On AMD GPUs, threads are scheduled
in groups; therefore, every warp/wavefront has its own call stack and program
counter, but individual threads do not.

If a thread or warp exceeds its stack size, a stack overflow occurs, causing
kernel failure. This can be detected using debuggers.

Call stack management with HIP
===============================================================================

Developers can adjust the call stack size, allowing fine-tuning based on
specific kernel requirements. This helps prevent stack overflow errors by
@@ -68,7 +65,6 @@ ensuring sufficient stack memory is allocated.
// Set a new stack size
size_t newStackSize = 1024 * 8; // 8 KiB
HIP_CHECK(hipDeviceSetLimit(hipLimitStackSize, newStackSize));
std::cout << "New stack size set to: " << newStackSize << " bytes" << std::endl;
HIP_CHECK(hipDeviceGetLimit(&stackSize, hipLimitStackSize));
std::cout << "Updated stack size: " << stackSize << " bytes" << std::endl;
@@ -81,12 +77,12 @@ amount of memory. For instance, an MI300X with 304 computing units (CU) and up
to 2048 threads per CU would use 304 · 2048 · 8192 bytes = 4.75 GiB.
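
As a rough illustration only, this worst-case estimate can be reproduced at
runtime from the device properties. The sketch below assumes that every
thread that can be resident on every CU holds a full stack of the currently
configured size, which is the upper bound described above.

.. code-block:: cpp

   #include <hip/hip_runtime.h>

   #include <iostream>

   int main() {
       hipDeviceProp_t props;
       if (hipGetDeviceProperties(&props, 0) != hipSuccess) {
           return 1;
       }

       size_t stackSize = 0;
       if (hipDeviceGetLimit(&stackSize, hipLimitStackSize) != hipSuccess) {
           return 1;
       }

       // Upper bound: every resident thread on every CU keeps a full stack.
       const size_t maxThreads =
           static_cast<size_t>(props.multiProcessorCount) *
           static_cast<size_t>(props.maxThreadsPerMultiProcessor);
       const double gib = static_cast<double>(maxThreads) *
                          static_cast<double>(stackSize) /
                          (1024.0 * 1024.0 * 1024.0);

       std::cout << props.multiProcessorCount << " CUs x "
                 << props.maxThreadsPerMultiProcessor << " threads/CU x "
                 << stackSize << " bytes = " << gib << " GiB" << std::endl;
       return 0;
   }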

Handling recursion and deep function calls
-------------------------------------------------------------------------------

Similar to CPU programming, recursive functions and deeply nested function
calls are supported. However, developers must ensure that these functions do
not exceed the available stack memory, considering the large amount of memory
needed for the call stack due to the GPUs' inherent parallelism.

.. code-block:: cpp
@@ -128,8 +124,8 @@
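// A shallow recursion depth of 10 fits within the default stack size.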
kernel<<<1, 1>>>(10);
HIP_CHECK(hipDeviceSynchronize());
// With the -O0 optimization option, this hits the stack limit due to
// excessive recursion
// kernel<<<1, 256>>>(2048);
// HIP_CHECK(hipDeviceSynchronize());
return 0;
