-
Notifications
You must be signed in to change notification settings - Fork 540
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Documentation: Add hardware capabilities page
Co-authored-by: Matthias Knorr <[email protected]>
- Loading branch information
Showing
4 changed files
with
257 additions
and
1 deletion.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,249 @@ | ||
.. meta:: | ||
:description: This chapter describes the hardware features of the different hardware architectures. | ||
:keywords: AMD, ROCm, HIP, hardware, hardware features, hardware architectures | ||
|
||
******************************************************************************* | ||
Hardware features | ||
******************************************************************************* | ||
|
||
This page gives an overview of the different hardware architectures and the | ||
features they implement. Hardware features do not imply performance, that | ||
depends on the specifications found in the :doc:`rocm:reference/gpu-arch-specs` | ||
page. | ||
|
||
.. list-table:: | ||
:header-rows: 1 | ||
:name: hardware-features-table | ||
|
||
* | ||
- Hardware feature support | ||
- RDNA1 | ||
- CDNA1 | ||
- RDNA2 | ||
- CDNA2 | ||
- RDNA3 | ||
- CDNA3 | ||
* | ||
- :ref:`atomic functions` on 32-bit integer values in global and shared memory | ||
- ✅ | ||
- ✅ | ||
- ✅ | ||
- ✅ | ||
- ✅ | ||
- ✅ | ||
* | ||
- Atomic functions on 64-bit integer values in global and shared memory | ||
- ✅ | ||
- ✅ | ||
- ✅ | ||
- ✅ | ||
- ✅ | ||
- ✅ | ||
* | ||
- Atomic addition on 32-bit floating point values in global and shared memory | ||
- ❌ | ||
- ❌ | ||
- ✅ | ||
- ✅ | ||
- ✅ | ||
- ✅ | ||
* | ||
- Atomic addition on 64-bit floating point values in global memory and shared memory | ||
- ❌ | ||
- ❌ | ||
- ✅ | ||
- ✅ | ||
- ✅ | ||
- ✅ | ||
* | ||
- :ref:`Warp vote functions <warp_vote_functions>` | ||
- ✅ | ||
- ✅ | ||
- ✅ | ||
- ✅ | ||
- ✅ | ||
- ✅ | ||
* | ||
- :ref:`Memory fence instructions <memory_fence_instructions>` | ||
- ✅ | ||
- ✅ | ||
- ✅ | ||
- ✅ | ||
- ✅ | ||
- ✅ | ||
* | ||
- :ref:`Synchronization functions <synchronization_functions>` | ||
- ✅ | ||
- ✅ | ||
- ✅ | ||
- ✅ | ||
- ✅ | ||
- ✅ | ||
* | ||
- :ref:`Surface functions <surface_object_reference>` | ||
- ✅ | ||
- ✅ | ||
- ✅ | ||
- ✅ | ||
- ✅ | ||
- ✅ | ||
* | ||
- :ref:`float16 half precision IEEE-conformant floating-point operations<rocm:precision_support_floating_point_types>` | ||
- ✅ | ||
- ✅ | ||
- ✅ | ||
- ✅ | ||
- ✅ | ||
- ✅ | ||
* | ||
- :ref:`bfloat16 16-bit floating-point operations<rocm:precision_support_floating_point_types>` | ||
- ❌ | ||
- ✅ | ||
- ❌ | ||
- ✅ | ||
- ✅ | ||
- ✅ | ||
* | ||
- Support for :ref:`8-bit floating-point types <rocm:precision_support_floating_point_types>` | ||
- ❌ | ||
- ❌ | ||
- ❌ | ||
- ❌ | ||
- ❌ | ||
- ✅ | ||
* | ||
- Support for :ref:`tensor float32 <rocm:precision_support_floating_point_types>` | ||
- ❌ | ||
- ❌ | ||
- ❌ | ||
- ❌ | ||
- ❌ | ||
- ✅ | ||
* | ||
- Packed math with 16-bit floating point values | ||
- ✅ | ||
- ✅ | ||
- ✅ | ||
- ✅ | ||
- ✅ | ||
- ✅ | ||
* | ||
- Packed math with 32-bit floating point values | ||
- ❌ | ||
- ❌ | ||
- ❌ | ||
- ✅ | ||
- ❌ | ||
- ✅ | ||
* | ||
- Matrix Cores | ||
- ❌ | ||
- ✅ | ||
- ❌ | ||
- ✅ | ||
- ❌ | ||
- ✅ | ||
* | ||
- On-Chip Error Correcting Code (ECC) | ||
- ✅ | ||
- ✅ | ||
- ✅ | ||
- ✅ | ||
- ✅ | ||
- ✅ | ||
* | ||
- Maximum dimensionality of grid | ||
- 3 | ||
- 3 | ||
- 3 | ||
- 3 | ||
- 3 | ||
- 3 | ||
* | ||
- Maximum x-, y- or z-dimension of a grid | ||
- :math:`2^{32} - 1` | ||
- :math:`2^{32} - 1` | ||
- :math:`2^{32} - 1` | ||
- :math:`2^{32} - 1` | ||
- :math:`2^{32} - 1` | ||
- :math:`2^{32} - 1` | ||
* | ||
- Maximum number of threads per grid | ||
- :math:`2^{32} - 1` | ||
- :math:`2^{32} - 1` | ||
- :math:`2^{32} - 1` | ||
- :math:`2^{32} - 1` | ||
- :math:`2^{32} - 1` | ||
- :math:`2^{32} - 1` | ||
* | ||
- Maximum x-, y- or z-dimension of a block | ||
- :math:`1024` | ||
- :math:`1024` | ||
- :math:`1024` | ||
- :math:`1024` | ||
- :math:`1024` | ||
- :math:`1024` | ||
* | ||
- Maximum number of threads per block | ||
- :math:`1024` | ||
- :math:`1024` | ||
- :math:`1024` | ||
- :math:`1024` | ||
- :math:`1024` | ||
- :math:`1024` | ||
* | ||
- Wavefront size | ||
- 32 [1]_ | ||
- 64 | ||
- 32 [1]_ | ||
- 64 | ||
- 32 [1]_ | ||
- 64 | ||
* | ||
- Maximum number of resident blocks per compute unit | ||
- 40 [1]_ | ||
- 32 | ||
- 32 [1]_ | ||
- 32 | ||
- 32 [1]_ | ||
- 32 | ||
* | ||
- Maximum number of resident wavefronts per compute unit | ||
- 40 [1]_ | ||
- 32 | ||
- 32 [1]_ | ||
- 32 | ||
- 32 [1]_ | ||
- 32 | ||
* | ||
- Maximum number of resident threads per compute unit | ||
- 1280 [2]_ | ||
- 2048 | ||
- 1024 [2]_ | ||
- 2048 | ||
- 1024 [2]_ | ||
- 2048 | ||
* | ||
- Maximum number of 32-bit vector registers per thread | ||
- 256 | ||
- 256 (vector) + 256 (matrix) | ||
- 256 | ||
- 256 (vector) + 256 (matrix) | ||
- 256 | ||
- 256 (vector) + 256 (matrix) | ||
* | ||
- Maximum number of 32-bit scalar accumulation registers per thread | ||
- 106 | ||
- 104 | ||
- 106 | ||
- 104 | ||
- 106 | ||
- 104 | ||
|
||
.. [1] RDNA architectures have a configurable wavefront size. The native | ||
wavefront size is 32, but they can run in "CU mode", which has an effective | ||
wavefront size of 64. This affects the number of resident wavefronts and | ||
blocks per compute Unit. | ||
.. [2] RDNA architectures expand the concept of the traditional compute unit | ||
with the so-called work group processor, which effectively includes two | ||
compute units, within which all threads can cooperate. |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters