Commit 10d8954

Update

neon60 authored and matyas-streamhpc committed Nov 5, 2024
1 parent 8c3e0c6 commit 10d8954
Showing 4 changed files with 133 additions and 132 deletions.
130 changes: 0 additions & 130 deletions docs/how-to/call_stack.rst

This file was deleted.

132 changes: 132 additions & 0 deletions docs/how-to/hip_runtime_api/call_stack.rst
@@ -0,0 +1,132 @@
.. meta::
  :description: This page describes how the call stack works and how to manage it in HIP.
  :keywords: AMD, ROCm, HIP, call stack

********************************************************************************
Call stack
********************************************************************************

The call stack and the program counter work together to manage the flow of
control in a program, especially in the context of function calls, loops, and
branching.

The call stack is a stack data structure that stores information about the
active subroutines or functions of a program.
Each time a function is called, a new frame is added to the top of the stack,
containing information such as local variables, return addresses and function
parameters. When the function execution completes, the frame is removed from the
stack. This concept allows the program to return to the calling function and
continue executing from where it left off.
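
The following minimal host-side C++ sketch (illustrative only, not specific to
HIP) shows how a frame is created for each call and removed on return:

.. code-block:: cpp

    #include <iostream>

    // Each call to square() pushes a frame holding its parameter, its local
    // variable and the return address; the frame is popped when it returns.
    int square(int x)
    {
        int result = x * x; // local variable stored in the current frame
        return result;      // frame is removed, control returns to the caller
    }

    int main()
    {
        int y = square(3); // a new frame for square() is pushed here
        std::cout << y << std::endl; // prints 9 after the frame has been popped
        return 0;
    }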

The program counter keeps track of the current or next instruction to
be executed, while the call stack maintains the history of function calls,
return addresses, and local variables. Together, they allow a program to execute
complex control flow structures, handle function calls and return to the
appropriate location in the code after a function completes.

The call stack in GPU programming involves extra complexity due to parallel
execution, and NVIDIA and AMD GPUs take different approaches. NVIDIA GPUs have
supported independent thread scheduling since the Volta architecture, where
every thread has its own call stack and effective program counter. On AMD GPUs,
every warp/wavefront has its own call stack and program counter.

The call stack for each thread must track its function calls, local variables,
and return addresses.

If a thread or warp exceeds its stack size, a stack overflow occurs, causing
kernel failure. This can be detected and handled using debuggers or with the
:doc:`error handling functions <./error_handling>`.
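
As an illustrative sketch (the deeply recursive kernel ``deep_kernel`` is
hypothetical and its body is elided), a stack overflow inside a kernel can be
detected by checking the error returned by a subsequent runtime call:

.. code-block:: cpp

    #include <hip/hip_runtime.h>
    #include <iostream>

    // Hypothetical kernel that recurses deeply enough to exceed the
    // per-thread stack size; the recursion itself is elided here.
    __global__ void deep_kernel() { /* deep recursion elided */ }

    int main()
    {
        deep_kernel<<<1, 1>>>();

        // A stack overflow inside the kernel is usually reported by a later
        // runtime call rather than by the launch statement itself.
        hipError_t status = hipDeviceSynchronize();
        if (status != hipSuccess) {
            std::cerr << "Kernel failed: " << hipGetErrorString(status) << std::endl;
        }
        return 0;
    }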

Call stack management with HIP
================================================================================

Developers can adjust the call stack size for every thread, allowing fine-tuning
based on specific kernel requirements. This helps prevent stack overflow errors
by ensuring sufficient stack memory is allocated.

.. code-block:: cpp

    #include <hip/hip_runtime.h>
    #include <iostream>

    #define HIP_CHECK(expression)                  \
    {                                              \
        const hipError_t status = expression;      \
        if(status != hipSuccess){                  \
            std::cerr << "HIP error "              \
                      << status << ": "            \
                      << hipGetErrorString(status) \
                      << " at " << __FILE__ << ":" \
                      << __LINE__ << std::endl;    \
        }                                          \
    }

    int main()
    {
        size_t stackSize;
        HIP_CHECK(hipDeviceGetLimit(&stackSize, hipLimitStackSize));
        std::cout << "Default stack size: " << stackSize << " bytes" << std::endl;

        // Set a new stack size
        size_t newStackSize = 1024 * 8; // 8 KiB
        HIP_CHECK(hipDeviceSetLimit(hipLimitStackSize, newStackSize));
        std::cout << "New stack size set to: " << newStackSize << " bytes" << std::endl;

        HIP_CHECK(hipDeviceGetLimit(&stackSize, hipLimitStackSize));
        std::cout << "Updated stack size: " << stackSize << " bytes" << std::endl;

        return 0;
    }

Handling recursion and deep function calls
--------------------------------------------------------------------------------

Similar to CPU programming, recursive functions and deeply nested function calls
are supported. However, developers must ensure that these functions do not
exceed the available stack memory, considering the limited resources on GPUs.

.. code-block:: cpp

    #include <hip/hip_runtime.h>
    #include <iostream>

    #define HIP_CHECK(expression)                  \
    {                                              \
        const hipError_t status = expression;      \
        if(status != hipSuccess){                  \
            std::cerr << "HIP error "              \
                      << status << ": "            \
                      << hipGetErrorString(status) \
                      << " at " << __FILE__ << ":" \
                      << __LINE__ << std::endl;    \
        }                                          \
    }

    __device__ unsigned long long factorial(unsigned long long n)
    {
        if (n == 0 || n == 1) {
            return 1;
        }
        return n * factorial(n - 1);
    }

    __global__ void kernel(unsigned long long n)
    {
        unsigned long long result = factorial(n);
        const size_t x = threadIdx.x + blockDim.x * blockIdx.x;
        if (x == 0)
            printf("%llu! = %llu \n", n, result);
    }

    int main()
    {
        kernel<<<1, 1>>>(10);
        HIP_CHECK(hipDeviceSynchronize());

        // With the -O0 optimization option, the following launch hits the stack limit:
        // kernel<<<1, 256>>>(2048);
        // HIP_CHECK(hipDeviceSynchronize());

        return 0;
    }

1 change: 0 additions & 1 deletion docs/index.md
@@ -37,7 +37,6 @@ The HIP documentation is organized into the following categories:
* [HIP porting: driver API guide](./how-to/hip_porting_driver_api)
* {doc}`./how-to/hip_rtc`
* {doc}`./how-to/performance_guidelines`
* {doc}`./how-to/call_stack`
* [Debugging with HIP](./how-to/debugging)
* {doc}`./how-to/logging`
* [Unified memory](./how-to/unified_memory)
2 changes: 1 addition & 1 deletion docs/sphinx/_toc.yml.in
@@ -48,11 +48,11 @@ subtrees:
- file: how-to/hip_runtime_api/memory_management/stream_ordered_allocator
- file: how-to/hip_runtime_api/cooperative_groups
- file: how-to/hip_runtime_api/hipgraph
- file: how-to/hip_runtime_api/call_stack
- file: how-to/hip_porting_guide
- file: how-to/hip_porting_driver_api
- file: how-to/hip_rtc
- file: how-to/performance_guidelines
- file: how-to/call_stack
- file: how-to/debugging
- file: how-to/logging
- file: how-to/cooperative_groups
