Commit

WIP
neon60 committed May 28, 2024
1 parent 4b94e62 commit d919307
Showing 2 changed files with 6 additions and 4 deletions.
2 changes: 2 additions & 0 deletions .wordlist.txt
@@ -16,6 +16,7 @@ clr
cuBLASLt
cuCtx
cuDNN
+dataflow
deallocate
denormal
dll
@@ -74,6 +75,7 @@ Nsight
overindex
overindexing
oversubscription
+pragmas
preconditioners
prefetched
preprocessor
8 changes: 4 additions & 4 deletions docs/tutorials/reduction.rst
@@ -244,7 +244,7 @@ accesses".
A notable exception is when the shared read uniformly evaluates to the same
address across the entire warp/wavefront turning it into a broadcast. A
better change naive implementation is to have not only the activity of
-threads form continous ranges but their memory accesses too.
+threads form continuous ranges but their memory accesses too.

.. code-block:: diff
@@ -409,8 +409,8 @@ This compiles to the following binaries:
LLVM unrolls the the loop and compiles to a flat series of ``printf`` invocations
while GCC and MSVC both agree to keep the loop intact, visible from the compare
-(``cmp``) and the jump (``jne``, ``jl``) instructions. LLVM codegen is identical to
-us having written the unrolled loop manually:
+(``cmp``) and the jump (``jne``, ``jl``) instructions. LLVM code generation is
+identical to us having written the unrolled loop manually:

.. code-block:: C++

@@ -697,7 +697,7 @@ elements in shared as warps within out block. Much like we could only launch
kernels at block granularity to begin with, we can only warp reduce with
``WarpSize`` granularity (due to the collective nature of the cross-lane
built-ins), hence we introduce ``read_shared_safe`` to pad overindexing by
-reading ``zero_elem`` -ents. Reading from global remains unchanged.
+reading ``zero_elem`` -s. Reading from global remains unchanged.

.. code-block:: C++
