WIP
matyas-streamhpc committed Nov 6, 2024
1 parent 814f61c commit 507ebf1
Showing 1 changed file with 11 additions and 15 deletions.
26 changes: 11 additions & 15 deletions docs/how-to/hip_runtime_api/call_stack.rst
@@ -6,10 +6,6 @@
Call stack
*******************************************************************************

The call stack and the program counter work together to manage the flow of
control in a program, especially in the context of function calls, loops, and
branching.

The call stack is a concept from computer science, referring to the stack data
structure used to manage function calls by saving the state of the current
function. Each time a function is called, a new call frame is added to
@@ -29,14 +25,15 @@ and return addresses. However, in GPU programming, the memory required to store
the call stack increases due to the parallelism inherent to GPUs. NVIDIA and
AMD GPUs use different approaches. NVIDIA GPUs have provided independent
thread scheduling since the Volta architecture, where every thread has its own
call stack and effective program counter. On AMD GPUs, threads are scheduled
in groups; therefore, every warp/wavefront has its own call stack and program
counter, but individual threads do not.

If a thread or warp exceeds its stack size, a stack overflow occurs, causing
kernel failure. This can be detected using debuggers.

Call stack management with HIP
===============================================================================

Developers can adjust the call stack size, allowing fine-tuning based on
specific kernel requirements. This helps prevent stack overflow errors by
@@ -68,7 +65,6 @@ ensuring sufficient stack memory is allocated.
// Set a new stack size
size_t newStackSize = 1024 * 8; // 8 KiB
HIP_CHECK(hipDeviceSetLimit(hipLimitStackSize, newStackSize));
std::cout << "New stack size set to: " << newStackSize << " bytes" << std::endl;
HIP_CHECK(hipDeviceGetLimit(&stackSize, hipLimitStackSize));
std::cout << "Updated stack size: " << stackSize << " bytes" << std::endl;
@@ -81,12 +77,12 @@ amount of memory. For instance, an MI300X with 304 computing units (CU) and up
to 2048 threads per CU would use 304 · 2048 · 8192 bytes = 4.75 GiB.
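
As a rough illustration only, this worst-case estimate can be reproduced at
runtime from the device properties. The sketch below assumes that every
thread that can be resident on every CU holds a full stack of the currently
configured size, which is the upper bound described above.

.. code-block:: cpp

   #include <hip/hip_runtime.h>

   #include <iostream>

   int main() {
       hipDeviceProp_t props;
       if (hipGetDeviceProperties(&props, 0) != hipSuccess) {
           return 1;
       }

       size_t stackSize = 0;
       if (hipDeviceGetLimit(&stackSize, hipLimitStackSize) != hipSuccess) {
           return 1;
       }

       // Upper bound: every resident thread on every CU keeps a full stack.
       const size_t maxThreads =
           static_cast<size_t>(props.multiProcessorCount) *
           static_cast<size_t>(props.maxThreadsPerMultiProcessor);
       const double gib = static_cast<double>(maxThreads) *
                          static_cast<double>(stackSize) /
                          (1024.0 * 1024.0 * 1024.0);

       std::cout << props.multiProcessorCount << " CUs x "
                 << props.maxThreadsPerMultiProcessor << " threads/CU x "
                 << stackSize << " bytes = " << gib << " GiB" << std::endl;
       return 0;
   }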

Handling recursion and deep function calls
-------------------------------------------------------------------------------

Similar to CPU programming, recursive functions and deeply nested function
calls are supported. However, developers must ensure that these functions do
not exceed the available stack memory, considering the large amount of memory
needed for the call stack due to the GPUs' inherent parallelism.

.. code-block:: cpp
@@ -128,8 +124,8 @@
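// A shallow recursion depth of 10 fits within the default stack size.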
kernel<<<1, 1>>>(10);
HIP_CHECK(hipDeviceSynchronize());
// With the -O0 optimization option, this hits the stack limit due to
// excessive recursion
// kernel<<<1, 256>>>(2048);
// HIP_CHECK(hipDeviceSynchronize());
return 0;
