## NonuniformFFTs v0.6.0
### Added
- Add an alternative implementation of GPU transforms based on shared-memory arrays. This is disabled by default, and can be enabled by passing `gpu_method = :shared_memory` when creating a plan (the default is `:global_memory`); see the first sketch after this list.
- Add the possibility to switch between a fast approximation of kernel functions (previously the default and only choice) and direct evaluation (previously not implemented). These correspond to the new `kernel_evalmode` plan creation option. Possible values are `FastApproximation()` and `Direct()`. The default depends on the actual backend: currently, `FastApproximation()` is used on CPUs and `Direct()` on GPUs, where it is sometimes faster (see the second sketch after this list).
- The `AbstractNFFTs.plan_nfft` function is now implemented for full compatibility with the AbstractNFFTs.jl interface (see the third sketch after this list).
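
A minimal sketch of enabling the new shared-memory method, assuming a CUDA device and the usual `PlanNUFFT(T, dims; kwargs...)` constructor form; the grid size and backend here are illustrative:

```julia
using NonuniformFFTs
using CUDA  # assumed GPU backend; AMDGPU should work similarly

# Create a 2D plan that runs on the GPU using the new shared-memory
# implementation (the default remains :global_memory).
plan = PlanNUFFT(
    ComplexF64, (256, 256);
    backend = CUDABackend(),
    gpu_method = :shared_memory,
)
```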
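A sketch of overriding the kernel evaluation mode, assuming `FastApproximation` and `Direct` are exported by the package as the descriptions above suggest:

```julia
using NonuniformFFTs

# Explicitly request the fast polynomial approximation (the CPU default)...
plan_fast = PlanNUFFT(ComplexF64, (256, 256); kernel_evalmode = FastApproximation())

# ...or direct evaluation of the kernel function (the default on GPUs).
plan_direct = PlanNUFFT(ComplexF64, (256, 256); kernel_evalmode = Direct())
```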
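Finally, a sketch of the new AbstractNFFTs.jl entry point, following the usual AbstractNFFTs conventions (non-uniform points in $[-1/2, 1/2)$ stored as a `d × Np` matrix) and assuming no other NFFT backend is loaded, so that the generic `plan_nfft` call dispatches to this package:

```julia
using NonuniformFFTs
using AbstractNFFTs

# 2D non-uniform points in [-1/2, 1/2), as a d × Np matrix.
xp = rand(2, 1000) .- 0.5
p = AbstractNFFTs.plan_nfft(xp, (64, 64))  # usable like any AbstractNFFTs.jl plan
```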
### Changed
- **BREAKING**: Change the default precision of transforms. By default, transforms on `Float64` or `ComplexF64` data now have a relative precision of the order of $10^{-7}$. This corresponds to setting `m = HalfSupport(4)` and oversampling factor `σ = 2.0`. Previously, the default was `m = HalfSupport(8)` and `σ = 2.0`, corresponding to a relative precision of the order of $10^{-14}$. The previous defaults can be restored as shown in the first sketch after this list.
- **BREAKING**: The `PlanNUFFT` constructor can no longer be used to create plans compatible with AbstractNFFTs.jl / NFFT.jl. Instead, a separate (and unexported) `NonuniformFFTs.NFFTPlan` type is now defined which may be used for this purpose (see the second sketch after this list). Alternatively, one can now use the `AbstractNFFTs.plan_nfft` function.
- On GPUs, we now default to direct evaluation of kernel functions (e.g. Kaiser-Bessel) instead of polynomial approximations, as this seems to be faster and uses far fewer GPU registers.
- On CUDA and AMDGPU, the default kernel is now `KaiserBesselKernel` instead of `BackwardsKaiserBesselKernel`. Direct evaluation of the KB kernel (based on Bessel functions) seems to be slightly faster than the backwards KB kernel on both CUDA and AMDGPU. Accuracy doesn't change much, since both kernels have similar precision.
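
A sketch of restoring the pre-0.6 accuracy, using the `m` and `σ` keywords named above (the constructor form is assumed):

```julia
using NonuniformFFTs

# Restore the previous defaults: relative precision ~1e-14 instead of ~1e-7.
plan = PlanNUFFT(
    ComplexF64, (256, 256);
    m = HalfSupport(8),  # kernel half-support (new default is HalfSupport(4))
    σ = 2.0,             # oversampling factor (default unchanged)
)
```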
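And a sketch of migrating code that previously built AbstractNFFTs-compatible plans through `PlanNUFFT`, assuming the new `NonuniformFFTs.NFFTPlan` type follows the usual NFFT.jl `(points, size)` constructor convention (a hypothetical signature, not confirmed by the notes above):

```julia
using NonuniformFFTs

# Points in [-1/2, 1/2) as a d × Np matrix, following the NFFT.jl convention.
xp = rand(2, 1000) .- 0.5

# NFFTPlan is unexported, hence the fully qualified name.
p = NonuniformFFTs.NFFTPlan(xp, (64, 64))
```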
**Merged pull requests:**
- CompatHelper: bump compat for GPUArraysCore to 0.2, (keep existing compat) (#36) (@github-actions[bot])
- Add shared-memory GPU implementations of spreading and interpolation (#37) (@jipolanco)
- Change default accuracy of transforms (#38) (@jipolanco)
- Use direct evaluation of kernel functions on GPU (#39) (@jipolanco)
- Allow choosing the kernel evaluation method (#40) (@jipolanco)
- Automatically determine batch size in shared-memory GPU transforms (#41) (@jipolanco)
- Define `AbstractNFFTs.plan_nfft` and create separate plan type (#42) (@jipolanco)