-
Notifications
You must be signed in to change notification settings - Fork 540
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
- Loading branch information
1 parent
96cf3a6
commit 8c3e0c6
Showing
3 changed files
with
132 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,130 @@ | ||
.. meta:: | ||
:description: This page lists graph safe ROCm libraries. | ||
:keywords: AMD, ROCm, HIP, hipGRAPH | ||
|
||
******************************************************************************** | ||
Call stack | ||
******************************************************************************** | ||
|
||
The call stack is a concept in computer science, representing the stack data structure that stores information about the active subroutines or functions. Each time a function is called, a new frame is added to the top of the stack, containing information such as local variables, return addresses and function parameters. When the function execution completes, the frame is removed from the stack. This concept allows the program to return to the calling function and continue executing from where it left off. | ||
|
||
GPU programming | ||
=============== | ||
|
||
The call stack in GPU programming has extra complexity due to parallel execution: | ||
|
||
Parallel Execution: Each thread in a GPU can invoke functions, leading to multiple call stacks being managed simultaneously. This requires efficient memory use due to the large number of concurrent calls. | ||
|
||
Memory Limitations: The call stack must manage limited memory. Each thread has a small stack space, and exceeding this can cause stack overflow errors. Effective management of stack memory is crucial, especially when code involves recursive functions or deep nesting. | ||
|
||
Kernel Execution: When a kernel launches, each thread independently executes the kernel function. The call stack for each thread must track its function calls, local variables, and return addresses. | ||
|
||
Stack Overflow: If a thread exceeds its stack size, a stack overflow occurs, causing kernel failure. This can be detected and handled using debuggers or runtime checks. | ||
|
||
Call stack management with AMD HIP | ||
================================== | ||
|
||
Query and set stack size | ||
------------------------ | ||
|
||
Developers can adjust the call stack size for their applications, allowing fine-tuning based on specific kernel requirements. This helps prevent stack overflow errors by ensuring sufficient stack memory is allocated. | ||
|
||
.. code-block:: cpp | ||
#include <hip/hip_runtime.h> | ||
#include <iostream> | ||
int main() | ||
{ | ||
size_t stackSize; | ||
hipDeviceGetLimit(&stackSize, hipLimitStackSize); | ||
std::cout << "Default stack size: " << stackSize << " bytes" << std::endl; | ||
// Set a new stack size | ||
size_t newStackSize = 1024 * 1024; // 1MB | ||
hipDeviceSetLimit(hipLimitStackSize, newStackSize); | ||
std::cout << "New stack size set to: " << newStackSize << " bytes" << std::endl; | ||
hipDeviceGetLimit(&stackSize, hipLimitStackSize); | ||
std::cout << "Updated stack size: " << stackSize << " bytes" << std::endl; | ||
return 0; | ||
} | ||
Handling recursion and deep function calls | ||
------------------------------------------ | ||
Similar to CPU programming, recursive functions and deeply nested function calls are supported. However, developers must ensure that these functions do not exceed the available stack memory, considering the limited resources on GPUs. | ||
|
||
.. code-block:: cpp | ||
#include <hip/hip_runtime.h> | ||
#include <iostream> | ||
__global__ void recursiveFunction(int n) | ||
{ | ||
if (n > 0) | ||
{ | ||
recursiveFunction<<<1, 1>>>(n - 1); | ||
} | ||
} | ||
int main() | ||
{ | ||
recursiveFunction<<<1, 1>>>(10); | ||
hipDeviceSynchronize(); | ||
return 0; | ||
} | ||
Error handling | ||
-------------- | ||
Mechanisms are provided to detect stack overflow errors. When a stack overflow occurs, the kernel call fails, and the error can be managed using debuggers or runtime checks, helping maintain the stability and reliability of applications. | ||
|
||
.. code-block:: cpp | ||
#include <hip/hip_runtime.h> | ||
#include <iostream> | ||
int main() | ||
{ | ||
size_t stackSize; | ||
hipDeviceGetLimit(&stackSize, hipLimitStackSize); | ||
std::cout << "Current stack size: " << stackSize << " bytes" << std::endl; | ||
// Simulate a stack overflow | ||
size_t newStackSize = stackSize / 2; // Reduce stack size | ||
hipDeviceSetLimit(hipLimitStackSize, newStackSize); | ||
// Kernel call to provoke stack overflow | ||
recursiveFunction<<<1, 1>>>(20); | ||
hipError_t err = hipDeviceSynchronize(); | ||
if (err != hipSuccess) | ||
{ | ||
std::cerr << "Kernel call failed with error: " << hipGetErrorString(err) << std::endl; | ||
} | ||
return 0; | ||
} | ||
Portability and performance | ||
--------------------------- | ||
Designed to provide a minimal abstraction layer, HIP minimizes performance overhead and maximizes portability. This allows writing high-performance, cross-platform GPU code without sacrificing efficiency. | ||
|
||
.. code-block:: cpp | ||
#include <hip/hip_runtime.h> | ||
#include <iostream> | ||
__global__ void simpleKernel() | ||
{ | ||
printf("Running on GPU\n"); | ||
} | ||
int main() | ||
{ | ||
simpleKernel<<<1, 1>>>(); | ||
hipDeviceSynchronize(); | ||
return 0; | ||
} |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters