Thrust 1.16.0 #1616
Great pre-release, thank you! Is there a way to pass a stream to …
Summary
Thrust 1.16.0 provides a new “nosync” hint for the CUDA backend, as well as numerous bugfixes and stability improvements.
New `thrust::cuda::par_nosync` Execution Policy
Most of Thrust’s parallel algorithms are fully synchronous and will block the calling CPU thread until all work is completed. This design avoids many pitfalls associated with asynchronous GPU programming, resulting in simpler and less error-prone usage for new CUDA developers. Unfortunately, this improvement in user experience comes at a performance cost that often frustrates more experienced CUDA programmers.
Prior to this release, the only synchronous-to-asynchronous migration path for existing Thrust codebases involved significant refactoring: replacing calls to `thrust` algorithms with a limited set of `future`-based `thrust::async` algorithms or with lower-level CUB kernels. The new `thrust::cuda::par_nosync` execution policy provides a less invasive entry point for asynchronous computation.
`par_nosync` is a hint to the Thrust execution engine that any non-essential internal synchronizations should be skipped and that the caller will perform an explicit synchronization before accessing results.
While some Thrust algorithms require internal synchronization to safely compute their results, many do not. For example, multiple `thrust::for_each` invocations can be launched without waiting for earlier calls to complete.
Thanks to @fkallen for this contribution.
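The `thrust::for_each` launches described above can be sketched as follows. This is a minimal sketch, not the example from the original release notes: `add_one`, `three_passes_default_stream`, and `three_passes_user_stream` are hypothetical names, and the `.on(stream)` variant assumes `par_nosync` accepts a stream the same way `thrust::cuda::par` does (we believe this holds for 1.16.0, but check your Thrust headers). Each version launches three passes without waiting in between and synchronizes only once, before reading results.

```cuda
#include <thrust/device_vector.h>
#include <thrust/for_each.h>
#include <thrust/reduce.h>
#include <thrust/system/cuda/execution_policy.h>
#include <cuda_runtime.h>

// Hypothetical functor: increments each element in place.
struct add_one {
  __host__ __device__ void operator()(int& x) const { x += 1; }
};

// Three for_each passes with par_nosync on the default stream.
int three_passes_default_stream(int n) {
  thrust::device_vector<int> v(n, 0);
  // These calls may return before their kernels finish. They are queued
  // on the same stream, so each pass still sees the previous pass's writes.
  thrust::for_each(thrust::cuda::par_nosync, v.begin(), v.end(), add_one{});
  thrust::for_each(thrust::cuda::par_nosync, v.begin(), v.end(), add_one{});
  thrust::for_each(thrust::cuda::par_nosync, v.begin(), v.end(), add_one{});
  // The caller is responsible for synchronizing before accessing results.
  cudaDeviceSynchronize();
  // reduce with the default (synchronous) policy returns the sum to the host.
  return thrust::reduce(v.begin(), v.end());
}

// Same pattern on a user-created stream (assumes par_nosync.on(stream)).
int three_passes_user_stream(int n) {
  cudaStream_t stream;
  cudaStreamCreate(&stream);
  int result = 0;
  {
    thrust::device_vector<int> v(n, 0);
    auto policy = thrust::cuda::par_nosync.on(stream);
    thrust::for_each(policy, v.begin(), v.end(), add_one{});
    thrust::for_each(policy, v.begin(), v.end(), add_one{});
    thrust::for_each(policy, v.begin(), v.end(), add_one{});
    // Wait only for the work queued on this stream.
    cudaStreamSynchronize(stream);
    result = thrust::reduce(v.begin(), v.end());
  }  // v is freed here, before the stream is destroyed
  cudaStreamDestroy(stream);
  return result;
}
```

Kernels on the same stream still run in launch order, so `par_nosync` removes only the host-side waits between them; independent work would need separate streams to overlap on the GPU.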
Deprecation Notices
CUDA Dynamic Parallelism Support
A future version of Thrust will remove support for CUDA Dynamic Parallelism (CDP).
This will only affect calls to Thrust algorithms made from CUDA device-side code that currently launch a kernel; such calls will instead execute sequentially on the calling GPU thread rather than launching a device-wide kernel.
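To illustrate what executing "sequentially on the calling GPU thread" means, the sketch below calls a Thrust algorithm from device code with the `thrust::seq` execution policy, which already runs the algorithm on the calling thread and does not depend on CDP. The kernel `segment_sums` and the helper `run_segment_sums` are hypothetical names for illustration only; this is not a description of how Thrust implements the post-deprecation behavior.

```cuda
#include <thrust/device_vector.h>
#include <thrust/host_vector.h>
#include <thrust/reduce.h>
#include <thrust/sequence.h>
#include <thrust/execution_policy.h>
#include <cuda_runtime.h>

// Each GPU thread sums one fixed-length segment of `data`.
// thrust::seq runs the reduction on the calling thread, so no
// device-side kernel launch (and therefore no CDP) is involved.
__global__ void segment_sums(const int* data, int segment_len,
                             int num_segments, int* out) {
  int seg = blockIdx.x * blockDim.x + threadIdx.x;
  if (seg < num_segments) {
    const int* first = data + seg * segment_len;
    out[seg] = thrust::reduce(thrust::seq, first, first + segment_len);
  }
}

// Host helper: sums each segment of 0, 1, ..., segment_len * num_segments - 1.
thrust::host_vector<int> run_segment_sums(int segment_len, int num_segments) {
  thrust::device_vector<int> data(segment_len * num_segments);
  thrust::sequence(data.begin(), data.end());  // fills 0, 1, 2, ...
  thrust::device_vector<int> out(num_segments);
  segment_sums<<<1, num_segments>>>(thrust::raw_pointer_cast(data.data()),
                                    segment_len, num_segments,
                                    thrust::raw_pointer_cast(out.data()));
  cudaDeviceSynchronize();
  return thrust::host_vector<int>(out);  // copy results back to the host
}
```

For example, `run_segment_sums(8, 4)` sums the segments 0..7, 8..15, 16..23, and 24..31, giving 28, 92, 156, and 220.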
Breaking Changes
- … `cub` namespace to `thrust::cub`. This has caused issues with ambiguous namespaces for projects that declare `using namespace thrust;` from the global namespace. We recommend against this practice.
New Features
- #1568: Add `thrust::cuda::par_nosync` policy. Thanks to @fkallen for this contribution.
Enhancements
- … `DeviceMergeSort` API and remove Thrust’s internal implementation.
- … `thrust::shuffle`. Thanks to @djns99 for this contribution.
- … `CMAKE_INSTALL_INCLUDEDIR` values in Thrust’s CMake install rules. Thanks to @robertmaynard for this contribution.
Bug Fixes
- … `icc` builds.
- … `min`/`max` macros defined in `windows.h`.
- … `nvc++`.
- … `small` macro defined in `windows.h`.
This discussion was created from the release Thrust 1.16.0.