1 parent 8c3e0c6, commit 10d8954
Showing 4 changed files with 133 additions and 132 deletions.
@@ -0,0 +1,132 @@
.. meta::
  :description: This page describes the call stack concept in HIP.
  :keywords: AMD, ROCm, HIP, call stack

********************************************************************************
Call stack
********************************************************************************

The call stack and the program counter work together to manage the flow of
control in a program, especially in the context of function calls, loops, and
branching.

The call stack is a stack data structure that stores information about the
active subroutines or functions of a program. Each time a function is called, a
new frame is added to the top of the stack, containing information such as
local variables, return addresses, and function parameters. When the function
execution completes, the frame is removed from the stack. This allows the
program to return to the calling function and continue executing from where it
left off.

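The push and pop behavior described above can be made visible by managing the
frames by hand. The following is a minimal host-side sketch in plain C++, not
HIP code. ``Frame`` and ``factorialWithExplicitStack`` are hypothetical names
introduced for illustration, and a real frame also holds a return address and
saved registers, which this model leaves out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical frame layout: one entry per active call.
struct Frame
{
    std::uint64_t n; // the function parameter of this call
};

// Computes n! by pushing and popping frames by hand instead of recursing.
// maxDepth receives the largest number of frames that were live at once.
std::uint64_t factorialWithExplicitStack(std::uint64_t n, std::size_t& maxDepth)
{
    assert(n <= 20); // 20! is the largest factorial that fits in 64 bits

    std::vector<Frame> callStack;

    // "Call" phase: every nested call pushes a new frame.
    while (n > 1)
    {
        callStack.push_back(Frame{n});
        --n;
    }
    maxDepth = callStack.size();

    // "Return" phase: frames are popped in reverse order of their creation.
    std::uint64_t result = 1;
    while (!callStack.empty())
    {
        result *= callStack.back().n;
        callStack.pop_back();
    }
    return result;
}
```

Calling ``factorialWithExplicitStack(10, depth)`` returns 3628800 and sets
``depth`` to 9, one frame for each value from 10 down to 2. The stack depth
grows linearly with ``n``, which is why deep recursion is costly when stack
memory is limited.
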
The program counter keeps track of the current or next instruction to
be executed, while the call stack maintains the history of function calls,
return addresses, and local variables. Together, they allow a program to execute
complex control flow structures, handle function calls, and return to the
appropriate location in the code after a function completes.

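The interplay between the program counter and the call stack can be sketched
with a toy machine. The instruction set (``Op``, ``Instruction``) and the
``run`` function below are hypothetical and exist only for illustration; they
do not model any real GPU instruction set.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical instruction set of a toy machine.
enum class Op { Add, Call, Ret, Halt };

struct Instruction
{
    Op op;
    int operand; // Add: value to add, Call: target index, otherwise unused
};

// Executes the program and returns the accumulator value at Halt.
int run(const std::vector<Instruction>& program)
{
    std::size_t pc = 0;                 // program counter: index of the next instruction
    std::vector<std::size_t> callStack; // return addresses of the active calls
    int accumulator = 0;

    while (true)
    {
        assert(pc < program.size());
        const Instruction& inst = program[pc];
        switch (inst.op)
        {
        case Op::Add:
            accumulator += inst.operand;
            ++pc;
            break;
        case Op::Call:
            callStack.push_back(pc + 1); // remember where to continue after the call
            pc = static_cast<std::size_t>(inst.operand);
            break;
        case Op::Ret:
            assert(!callStack.empty());
            pc = callStack.back(); // resume right after the matching Call
            callStack.pop_back();
            break;
        case Op::Halt:
            return accumulator;
        }
    }
}
```

For example, a program that adds 1, calls a subroutine at index 4 that adds 100
and returns, then adds 10 and halts, finishes with 111 in the accumulator. The
``Call`` at index 1 pushes index 2 as the return address, and the ``Ret`` pops
it back into the program counter.
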
The call stack in GPU programming has extra complexity due to parallel
execution, and NVIDIA and AMD GPUs use different approaches. NVIDIA GPUs have
supported independent thread scheduling since the Volta architecture, where
every thread has its own call stack and effective program counter. On AMD GPUs,
every warp (wavefront) has its own call stack and program counter.

The call stack for each thread must track its function calls, local variables,
and return addresses.

If a thread or warp exceeds its stack size, a stack overflow occurs, causing
kernel failure. This can be detected and handled using debuggers or with the
:doc:`error handling functions <../how-to/hip_runtime_api/error_handling>`.

Call stack management with HIP
================================================================================

Developers can adjust the call stack size for every thread, allowing fine-tuning
based on specific kernel requirements. This helps prevent stack overflow errors
by ensuring sufficient stack memory is allocated.

.. code-block:: cpp

  #include <hip/hip_runtime.h>
  #include <iostream>

  // Report a failing HIP call together with its source location.
  #define HIP_CHECK(expression)              \
  {                                          \
    const hipError_t status = expression;    \
    if(status != hipSuccess){                \
      std::cerr << "HIP error "              \
                << status << ": "            \
                << hipGetErrorString(status) \
                << " at " << __FILE__ << ":" \
                << __LINE__ << std::endl;    \
    }                                        \
  }

  int main()
  {
    // Query the current per-thread stack size
    size_t stackSize;
    HIP_CHECK(hipDeviceGetLimit(&stackSize, hipLimitStackSize));
    std::cout << "Default stack size: " << stackSize << " bytes" << std::endl;

    // Set a new stack size
    size_t newStackSize = 1024 * 8; // 8 KiB
    HIP_CHECK(hipDeviceSetLimit(hipLimitStackSize, newStackSize));
    std::cout << "New stack size set to: " << newStackSize << " bytes" << std::endl;

    // Read the limit back to confirm the value in effect
    HIP_CHECK(hipDeviceGetLimit(&stackSize, hipLimitStackSize));
    std::cout << "Updated stack size: " << stackSize << " bytes" << std::endl;

    return 0;
  }

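Raising the limit is not free. Assuming the runtime reserves stack memory for
every thread that can be resident on the device at the same time (an
assumption, since the exact allocation strategy depends on the hardware and the
runtime version), the total reservation grows with the product of the two
values. The helper below is a hypothetical host-side sketch of that arithmetic,
and the thread count used in the note is an illustrative number, not a property
of any specific GPU.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper: upper-bound estimate of the memory needed when every
// resident thread gets its own stack of stackBytesPerThread bytes.
std::uint64_t estimateStackReservationBytes(std::uint64_t stackBytesPerThread,
                                            std::uint64_t residentThreads)
{
    // Guard against 64-bit overflow in the multiplication below.
    assert(residentThreads == 0
           || stackBytesPerThread
                  <= std::numeric_limits<std::uint64_t>::max() / residentThreads);
    return stackBytesPerThread * residentThreads;
}
```

With a stack size of 8192 bytes (``1024 * 8``, the value set in the example
above) and 65536 resident threads, the estimate is 8192 * 65536 = 536870912
bytes, or 512 MiB. Under this assumption, the stack size is best kept as small
as the deepest call chain of the kernel allows.
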
Handling recursion and deep function calls
--------------------------------------------------------------------------------

Similar to CPU programming, recursive functions and deeply nested function calls
are supported. However, developers must ensure that these functions do not
exceed the available stack memory, considering the limited resources on GPUs.

.. code-block:: cpp

  #include <hip/hip_runtime.h>
  #include <iostream>

  // Report a failing HIP call together with its source location.
  #define HIP_CHECK(expression)              \
  {                                          \
    const hipError_t status = expression;    \
    if(status != hipSuccess){                \
      std::cerr << "HIP error "              \
                << status << ": "            \
                << hipGetErrorString(status) \
                << " at " << __FILE__ << ":" \
                << __LINE__ << std::endl;    \
    }                                        \
  }

  // Recursive device function: every nested call adds a frame to the stack.
  __device__ unsigned long long factorial(unsigned long long n)
  {
    if (n == 0 || n == 1) {
      return 1;
    }
    return n * factorial(n - 1);
  }

  __global__ void kernel(unsigned long long n)
  {
    unsigned long long result = factorial(n);
    const size_t x = threadIdx.x + blockDim.x * blockIdx.x;

    // Only the first thread prints the result
    if (x == 0)
      printf("%llu! = %llu \n", n, result);
  }

  int main()
  {
    kernel<<<1, 1>>>(10);
    HIP_CHECK(hipDeviceSynchronize());

    // When compiled with -O0, the following launch hits the stack limit:
    // kernel<<<1, 256>>>(2048);
    // HIP_CHECK(hipDeviceSynchronize());

    return 0;
  }

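One common way to stay within the stack limit is to replace recursion with
iteration, which needs a single frame regardless of the input. The following is
a minimal sketch and not part of the example above: ``factorialIterative`` is a
hypothetical name, and the fallback definitions of ``__host__`` and
``__device__`` are an assumption made only so that the same source also builds
with a plain C++ compiler.

```cpp
#include <cassert>

#if defined(__HIPCC__) || defined(__HIP__)
#include <hip/hip_runtime.h>
#else
// Plain C++ build: the HIP function qualifiers do not exist, so make them no-ops.
#define __host__
#define __device__
#endif

// Iterative factorial: constant stack usage, no nested calls.
__host__ __device__ inline unsigned long long factorialIterative(unsigned long long n)
{
    assert(n <= 20); // 20! is the largest factorial that fits in 64 bits

    unsigned long long result = 1;
    for (unsigned long long i = 2; i <= n; ++i)
    {
        result *= i;
    }
    return result;
}
```

In a kernel, ``factorialIterative(n)`` could be called in place of
``factorial(n)``. The result is the same, but the stack usage no longer depends
on ``n``.
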