## NonuniformFFTs v0.6.0
### Added
- Add an alternative implementation of GPU transforms based on shared-memory arrays. This is disabled by default, and can be enabled by passing `gpu_method = :shared_memory` when creating a plan (the default is `:global_memory`); see the first sketch after this list.
- Add the possibility to switch between a fast approximation of kernel functions (previously the default and only choice) and direct evaluation (previously not implemented). These correspond to the new `kernel_evalmode` plan creation option. Possible values are `FastApproximation()` and `Direct()`. The default depends on the actual backend: currently, `FastApproximation()` is used on CPUs and `Direct()` on GPUs, where it is sometimes faster (see the second sketch after this list).
- The `AbstractNFFTs.plan_nfft` function is now implemented for full compatibility with the AbstractNFFTs.jl interface (see the third sketch after this list).
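
A minimal sketch of enabling the new shared-memory method, assuming a CUDA device and the usual `PlanNUFFT(T, dims; kwargs...)` constructor form; the grid size and backend here are illustrative:

```julia
using NonuniformFFTs
using CUDA  # assumed GPU backend; AMDGPU should work similarly

# Create a 2D plan that runs on the GPU using the new shared-memory
# implementation (the default remains :global_memory).
plan = PlanNUFFT(
    ComplexF64, (256, 256);
    backend = CUDABackend(),
    gpu_method = :shared_memory,
)
```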
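A sketch of overriding the kernel evaluation mode, assuming `FastApproximation` and `Direct` are exported by the package as the descriptions above suggest:

```julia
using NonuniformFFTs

# Explicitly request the fast polynomial approximation (the CPU default)...
plan_fast = PlanNUFFT(ComplexF64, (256, 256); kernel_evalmode = FastApproximation())

# ...or direct evaluation of the kernel function (the default on GPUs).
plan_direct = PlanNUFFT(ComplexF64, (256, 256); kernel_evalmode = Direct())
```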
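Finally, a sketch of the new AbstractNFFTs.jl entry point, following the usual AbstractNFFTs conventions (non-uniform points in $[-1/2, 1/2)$ stored as a `d × Np` matrix) and assuming no other NFFT backend is loaded, so that the generic `plan_nfft` call dispatches to this package:

```julia
using NonuniformFFTs
using AbstractNFFTs

# 2D non-uniform points in [-1/2, 1/2), as a d × Np matrix.
xp = rand(2, 1000) .- 0.5
p = AbstractNFFTs.plan_nfft(xp, (64, 64))  # usable like any AbstractNFFTs.jl plan
```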
### Changed
- **BREAKING**: Change the default precision of transforms. By default, transforms on `Float64` or `ComplexF64` data now have a relative precision of the order of $10^{-7}$. This corresponds to setting `m = HalfSupport(4)` and oversampling factor `σ = 2.0`. Previously, the default was `m = HalfSupport(8)` and `σ = 2.0`, corresponding to a relative precision of the order of $10^{-14}$. The previous defaults can be restored as shown in the first sketch after this list.
- **BREAKING**: The `PlanNUFFT` constructor can no longer be used to create plans compatible with AbstractNFFTs.jl / NFFT.jl. Instead, a separate (and unexported) `NonuniformFFTs.NFFTPlan` type is now defined which may be used for this purpose (see the second sketch after this list). Alternatively, one can now use the `AbstractNFFTs.plan_nfft` function.
- On GPUs, we now default to direct evaluation of kernel functions (e.g. Kaiser-Bessel) instead of polynomial approximations, as this seems to be faster and uses far fewer GPU registers.
- On CUDA and AMDGPU, the default kernel is now `KaiserBesselKernel` instead of `BackwardsKaiserBesselKernel`. Direct evaluation of the KB kernel (based on Bessel functions) seems to be slightly faster than the backwards KB kernel on both CUDA and AMDGPU. Accuracy doesn't change much, since both kernels have similar precision.
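
A sketch of restoring the pre-0.6 accuracy, using the `m` and `σ` keywords named above (the constructor form is assumed):

```julia
using NonuniformFFTs

# Restore the previous defaults: relative precision ~1e-14 instead of ~1e-7.
plan = PlanNUFFT(
    ComplexF64, (256, 256);
    m = HalfSupport(8),  # kernel half-support (new default is HalfSupport(4))
    σ = 2.0,             # oversampling factor (default unchanged)
)
```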
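And a sketch of migrating code that previously built AbstractNFFTs-compatible plans through `PlanNUFFT`, assuming the new `NonuniformFFTs.NFFTPlan` type follows the usual NFFT.jl `(points, size)` constructor convention (a hypothetical signature, not confirmed by the notes above):

```julia
using NonuniformFFTs

# Points in [-1/2, 1/2) as a d × Np matrix, following the NFFT.jl convention.
xp = rand(2, 1000) .- 0.5

# NFFTPlan is unexported, hence the fully qualified name.
p = NonuniformFFTs.NFFTPlan(xp, (64, 64))
```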
**Merged pull requests:**
- CompatHelper: bump compat for GPUArraysCore to 0.2, (keep existing compat) (#36) (@github-actions[bot])
- Add shared-memory GPU implementations of spreading and interpolation (#37) (@jipolanco)
- Change default accuracy of transforms (#38) (@jipolanco)
- Use direct evaluation of kernel functions on GPU (#39) (@jipolanco)
- Allow choosing the kernel evaluation method (#40) (@jipolanco)
- Automatically determine batch size in shared-memory GPU transforms (#41) (@jipolanco)
- Define `AbstractNFFTs.plan_nfft` and create separate plan type (#42) (@jipolanco)