Commit

WIP
neon60 committed May 23, 2024
1 parent 4024f84 commit 41e36aa
Showing 5 changed files with 11 additions and 12 deletions.
7 changes: 3 additions & 4 deletions README.md
@@ -41,9 +41,9 @@ HIP releases are typically naming convention for each ROCM release to help diffe
* [HIP Porting Guide](docs/how-to/hip_porting_guide.md)
* [HIP Porting Driver Guide](docs/how-to/hip_porting_driver_api.md)
* [HIP Programming Guide](docs/how-to/programming_manual.md)
-* [HIP Logging ](docs/how-to/logging.rst)
+* [HIP Logging](docs/how-to/logging.rst)
* [Building HIP From Source](docs/install/build.rst)
-* [HIP Debugging ](docs/how-to/debugging.rst)
+* [HIP Debugging](docs/how-to/debugging.rst)
* [HIP RTC](docs/how-to/hip_rtc.md)
* [HIP Terminology](docs/reference/terms.md) (including Rosetta Stone of GPU computing terms across CUDA/HIP/OpenCL)
* [HIPIFY](https://github.com/ROCm/HIPIFY/blob/amd-staging/README.md)
@@ -91,7 +91,6 @@ atomics, and timer functions.
It also specifies additional defines and keywords for function types, address spaces, and optimization controls (See the [HIP Kernel Language](docs/reference/kernel_language.rst) for a full description).
Here's an example of defining a simple 'vector_square' kernel.
```cpp
template <typename T>
__global__ void
@@ -124,7 +123,7 @@ Thus HIP source code can be compiled to run on either platform. Platform-specif
provides source portability to either platform. HIP provides the _hipcc_ compiler driver which will call the appropriate toolchain depending on the desired platform.


-## Examples and Getting Started:
+## Examples and Getting Started

* A sample and [blog](https://github.com/ROCm/hip-tests/tree/develop/samples/0_Intro/square) that uses any of [HIPIFY](https://github.com/ROCm/HIPIFY/blob/amd-staging/README.md) tools to convert a simple app from CUDA to HIP:

2 changes: 1 addition & 1 deletion docs/how-to/hip_rtc.md
@@ -506,4 +506,4 @@ HIPRTC follows the below versioning.
## Deprecation notice

* Currently HIPRTC APIs are separated from HIP APIs and HIPRTC is available as a separate library libhiprtc.so/libhiprtc.dll. But on Linux, HIPRTC symbols are also present in libhipamd64.so in order to support the existing applications. Gradually, these symbols will be removed from HIP library and applications using HIPRTC will be required to explicitly link to HIPRTC library. However, on Windows hiprtc.dll must be used as the hipamd64.dll doesn't contain the HIPRTC symbols.
-* Data types such as uint32_t, uint64_t, int32_t, int64_t defined in std namespace in HIPRTC are deprecated earlier and are being removed from ROCm release 6.1 since these can conflict with the standard C++ data types. These data types are now prefixed with __hip__, e.g. __hip_uint32_t. Applications previously using std::uint32_t or similar types can use __hip_ prefixed types to avoid conflicts with standard std namespace or application can have their own definitions for these types. Also, type_traits templates previously defined in std namespace are moved to __hip_internal namespace as implementation details.
+* Data types such as uint32_t, uint64_t, int32_t, int64_t defined in std namespace in HIPRTC are deprecated earlier and are being removed from ROCm release 6.1 since these can conflict with the standard C++ data types. These data types are now prefixed with ```__hip__```, e.g. ```__hip_uint32_t```. Applications previously using std::uint32_t or similar types can use ```__hip_``` prefixed types to avoid conflicts with standard std namespace or application can have their own definitions for these types. Also, type_traits templates previously defined in std namespace are moved to ```__hip_internal``` namespace as implementation details.
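
Note on the deprecation text in this hunk: it says an application "can have their own definitions" for the fixed-width types instead of relying on `std::uint32_t` and friends inside HIPRTC-compiled source. A minimal sketch of that option follows. The alias names (`app_u32`, `app_u64`, `app_i32`, `app_i64`) and the helper `pack_u32_pair` are hypothetical and not part of HIP or HIPRTC. The snippet is plain C++, so it can be checked with a host compiler first; whether it can be embedded unchanged in a given kernel source is an assumption to verify against the HIPRTC version in use.

```cpp
// Hypothetical application-owned fixed-width aliases. The names are
// illustrative only and are not provided by HIP or HIPRTC. They avoid
// depending on std::uint32_t and similar types being available.
using app_u32 = unsigned int;
using app_u64 = unsigned long long;
using app_i32 = int;
using app_i64 = long long;

// Fail at compile time if the target's type sizes differ from the assumption
// (sizes are in bytes, assuming 8-bit bytes).
static_assert(sizeof(app_u32) == 4, "app_u32 must be 4 bytes");
static_assert(sizeof(app_u64) == 8, "app_u64 must be 8 bytes");
static_assert(sizeof(app_i32) == 4, "app_i32 must be 4 bytes");
static_assert(sizeof(app_i64) == 8, "app_i64 must be 8 bytes");

// Example use: combine two 32-bit halves into one 64-bit value.
inline app_u64 pack_u32_pair(app_u32 hi, app_u32 lo) {
    return (static_cast<app_u64>(hi) << 32) | static_cast<app_u64>(lo);
}
```

The `static_assert` checks keep the width assumptions explicit, so a mismatch shows up as a compile error rather than as silent truncation.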
10 changes: 5 additions & 5 deletions docs/how-to/programming_manual.md
@@ -74,7 +74,7 @@ A stronger system-level fence can be specified when the event is created with hi
* hipEventReleaseToSystem : Perform a system-scope release operation when the event is recorded. This will make both Coherent and Non-Coherent host memory visible to other agents in the system, but may involve heavyweight operations such as cache flushing. Coherent memory will typically use lighter-weight in-kernel synchronization mechanisms such as an atomic operation and thus does not need to use hipEventReleaseToSystem.
* hipEventDisableTiming: Events created with this flag will not record profiling data and provide the best performance if used for synchronization.

-### Summary and Recommendations:
+### Summary and Recommendations

* Coherent host memory is the default and is the easiest to use since the memory is visible to the CPU at typical synchronization points. This memory allows in-kernel synchronization commands such as threadfence_system to work transparently.
* HIP/ROCm also supports the ability to cache host memory in the GPU using the "Non-Coherent" host memory allocations. This can provide performance benefit, but care must be taken to use the correct synchronization.
@@ -166,9 +166,9 @@ And users can explicitly use "hipStreamPerThread" as per-thread default stream h
In HIP-Clang, long double type is 80-bit extended precision format for x86_64, which is not supported by AMDGPU. HIP-Clang treats long double type as IEEE double type for AMDGPU. Using long double type in HIP source code will not cause issue as long as data of long double type is not transferred between host and device. However, long double type should not be used as kernel argument type.
-## Use of _Float16 Type
+## Use of ``_Float16`` Type
-If a host function is to be used between clang (or hipcc) and gcc for x86_64, i.e. its definition is compiled by one compiler but the caller is compiled by a different compiler, _Float16 or aggregates containing _Float16 should not be used as function argument or return type. This is due to lack of stable ABI for _Float16 on x86_64. Passing _Float16 or aggregates containing _Float16 between clang and gcc could cause undefined behavior.
+If a host function is to be used between clang (or hipcc) and gcc for x86_64, i.e. its definition is compiled by one compiler but the caller is compiled by a different compiler, ``_Float16`` or aggregates containing ``_Float16`` should not be used as function argument or return type. This is due to lack of stable ABI for ``_Float16`` on x86_64. Passing ``_Float16`` or aggregates containing ``_Float16`` between clang and gcc could cause undefined behavior.
## FMA and contractions
@@ -195,14 +195,14 @@ In addition, the first type of library contains host objects with device code em
Here is an example to create and use static libraries:
* Type 1 using --emit-static-lib:
```cpp
hipcc hipOptLibrary.cpp --emit-static-lib -fPIC -o libHipOptLibrary.a
gcc test.cpp -L. -lhipOptLibrary -L/path/to/hip/lib -lamdhip64 -o test.out
```
* Type 2 using system ar:
```cpp
hipcc hipDevice.cpp -c -fgpu-rdc -o hipDevice.o
ar rcsD libHipDevice.a hipDevice.o
2 changes: 1 addition & 1 deletion docs/index.md
@@ -58,6 +58,6 @@ portable applications for AMD and NVIDIA GPUs from single source code.
Known issues are listed on the [HIP GitHub repository](https://github.com/ROCm/HIP/issues).

To contribute features or functions to the HIP project, refer to [Contributing to HIP](https://github.com/ROCm/HIP/blob/develop/CONTRIBUTING.md).
-To contribute to the documentation, refer to {doc}`Contributing to ROCm docs <rocm:contribute/contributing>` page.
+To contribute to the documentation, refer to {doc}`Contributing to ROCm docs <rocm:contribute/contributing>` page.

You can find licensing information on the [Licensing](https://rocm.docs.amd.com/en/latest/about/license.html) page.
2 changes: 1 addition & 1 deletion docs/reference/terms.md
@@ -34,5 +34,5 @@
|Vector|`float4`|`float4`|`float4`|

## Notes
The indexing functions (starting with `thread-index`) show the terminology for a 1D grid. Some APIs use reverse order of xyz / 012 indexing for 3D grids.

The indexing functions (starting with `thread-index`) show the terminology for a 1D grid. Some APIs use reverse order of xyz / 012 indexing for 3D grids.
