Skip to content

Commit

Permalink
WIP
Browse files Browse the repository at this point in the history
  • Loading branch information
neon60 committed May 23, 2024
1 parent d5100c9 commit 0918fc1
Show file tree
Hide file tree
Showing 2 changed files with 21 additions and 11 deletions.
26 changes: 15 additions & 11 deletions docs/how-to/hip_rtc.md
Original file line number Diff line number Diff line change
Expand Up @@ -5,16 +5,15 @@ Kernels can be stored as a text string and can be passed to HIPRTC APIs alongsid

NOTE:

* This library can be used on systems without HIP installed nor AMD GPU driver installed at all (offline compilation). Therefore, it does not depend on any HIP runtime library.
* But it does depend on COMGr. You may try to statically link COMGr into HIPRTC to avoid any ambiguity.
* Developers can decide to bundle this library with their application.
* This library can be used on systems without HIP installed nor AMD GPU driver installed at all (offline compilation). Therefore, it does not depend on any HIP runtime library.
* But it does depend on COMGr. You may try to statically link COMGr into HIPRTC to avoid any ambiguity.
* Developers can decide to bundle this library with their application.

## Example

To use HIPRTC functionality, HIPRTC header needs to be included first.
```#include <hip/hiprtc.h>```


Kernels can be stored in a string:

```cpp
Expand Down Expand Up @@ -45,6 +44,7 @@ hiprtcCreateProgram API also allows you to add headers which can be included in
For online compilation, the compiler pre-defines HIP device API functions, HIP specific types and macros for device compilation, but does not include standard C/C++ headers by default. Users can only include header files provided to hiprtcCreateProgram.

After associating the kernel string with hiprtcProgram, you can now compile this program using:

```cpp
hiprtcCompileProgram(prog, // hiprtcProgram
0, // Number of options
Expand Down Expand Up @@ -227,10 +227,10 @@ int main() {
## HIPRTC specific options

HIPRTC provides a few HIPRTC specific flags
* ```--gpu-architecture``` : This flag can guide the code object generation for a specific gpu arch. Example: ```--gpu-architecture=gfx906:sramecc+:xnack-```, its equivalent to ```--offload-arch```.
* ```--gpu-architecture``` : This flag can guide the code object generation for a specific gpu arch. Example: ```--gpu-architecture=gfx906:sramecc+:xnack-```, its equivalent to ```--offload-arch```.
* This option is compulsory if compilation is done on a system without AMD GPUs supported by HIP runtime.
* Otherwise, HIPRTC will load the hip runtime and gather the current device and its architecture info and use it as option.
* ```-fgpu-rdc``` : This flag when provided during the hiprtcCompileProgram generates the bitcode (HIPRTC doesn't convert this bitcode into ISA and binary). This bitcode can later be fetched using hiprtcGetBitcode and hiprtcGetBitcodeSize APIs.
* ```-fgpu-rdc``` : This flag when provided during the hiprtcCompileProgram generates the bitcode (HIPRTC doesn't convert this bitcode into ISA and binary). This bitcode can later be fetched using hiprtcGetBitcode and hiprtcGetBitcodeSize APIs.

### Bitcode

Expand Down Expand Up @@ -320,11 +320,13 @@ hipModuleLoadData(&module, binary);
#### Note
* The compiled binary must be loaded before HIPRTC link instance is destroyed using the hiprtcLinkDestroy API.
* The compiled binary must be loaded before HIPRTC link instance is destroyed using the hiprtcLinkDestroy API.
```cpp
hiprtcLinkDestroy(rtc_link_state);
```
* The correct sequence of calls is : hiprtcLinkCreate, hiprtcLinkAddData or hiprtcLinkAddFile, hiprtcLinkComplete, hiprtcModuleLoadData, hiprtcLinkDestroy.

* The correct sequence of calls is : hiprtcLinkCreate, hiprtcLinkAddData or hiprtcLinkAddFile, hiprtcLinkComplete, hiprtcModuleLoadData, hiprtcLinkDestroy.

### Input Types

Expand Down Expand Up @@ -392,6 +394,7 @@ HIPRTC_ERROR_NO_LOWERED_NAMES_BEFORE_COMPILATION = 10,
HIPRTC_ERROR_NAME_EXPRESSION_NOT_VALID = 11,
HIPRTC_ERROR_INTERNAL_ERROR = 12
```

```cpp
hiprtcResult result;
result = hiprtcCompileProgram(prog, 1, opts);
Expand All @@ -416,9 +419,9 @@ The two APIs hiprtcAddNameExpression and hiprtcGetLoweredName provide this funct

### Note

* The identical name expression string must be provided on a subsequent call to hiprtcGetLoweredName to extract the lowered name.
* The correct sequence of calls is : hiprtcAddNameExpression, hiprtcCompileProgram, hiprtcGetLoweredName, hiprtcDestroyProgram.
* The lowered names must be fetched using hiprtcGetLoweredName only after the HIPRTC program has been compiled, and before it has been destroyed.
* The identical name expression string must be provided on a subsequent call to hiprtcGetLoweredName to extract the lowered name.
* The correct sequence of calls is : hiprtcAddNameExpression, hiprtcCompileProgram, hiprtcGetLoweredName, hiprtcDestroyProgram.
* The lowered names must be fetched using hiprtcGetLoweredName only after the HIPRTC program has been compiled, and before it has been destroyed.

### Example

Expand All @@ -439,6 +442,7 @@ template<typename T>
__global__ void f3(int *result) { *result = sizeof(T); }
)"};
```

hiprtcAddNameExpression is called with various name expressions referring to the address of ```__global__``` functions and ```__device__/__constant__``` variables.

```cpp
Expand Down
6 changes: 6 additions & 0 deletions docs/how-to/programming_manual.md
Original file line number Diff line number Diff line change
Expand Up @@ -41,6 +41,7 @@ In order to achieve this fine-grained coherence, many AMD GPUs use a limited cac

HIP provides the developer with controls to select which type of memory is used via allocation flags passed to hipHostMalloc and the HIP_HOST_COHERENT environment variable. By default, the environment variable HIP_HOST_COHERENT is set to 0 in HIP.
The control logic in the current version of HIP is as follows:

* No flags are passed in: the host memory allocation is coherent, the HIP_HOST_COHERENT environment variable is ignored.
* hipHostMallocCoherent=1: The host memory allocation will be coherent, the HIP_HOST_COHERENT environment variable is ignored.
* hipHostMallocMapped=1: The host memory allocation will be coherent, the HIP_HOST_COHERENT environment variable is ignored.
Expand Down Expand Up @@ -101,6 +102,7 @@ else {
. . .
}
```
Please note, the managed memory capability check may not be necessary, but if HMM is not supported, then managed malloc will fall back to using system memory and other managed memory API calls will have undefined behavior.
Note, managed memory management is implemented on Linux, not supported on Windows yet.
Expand Down Expand Up @@ -191,12 +193,16 @@ HIP-Clang supports generating two types of static libraries. The first type of s
In addition, the first type of library contains host objects with device code embedded as fat binaries. It is generated using the flag --emit-static-lib. The second type of library contains relocatable device objects and is generated using ar.
Here is an example to create and use static libraries:
* Type 1 using --emit-static-lib:
```cpp
hipcc hipOptLibrary.cpp --emit-static-lib -fPIC -o libHipOptLibrary.a
gcc test.cpp -L. -lhipOptLibrary -L/path/to/hip/lib -lamdhip64 -o test.out
```
* Type 2 using system ar:
```cpp
hipcc hipDevice.cpp -c -fgpu-rdc -o hipDevice.o
ar rcsD libHipDevice.a hipDevice.o
Expand Down

0 comments on commit 0918fc1

Please sign in to comment.