Skip to content

Commit

Permalink
Merge pull request #51 from jipolanco/docs
Browse files Browse the repository at this point in the history
This moves the examples previously in index.md to its own separate page.
  • Loading branch information
jipolanco authored Dec 11, 2024
2 parents 6d435c7 + 270e93f commit 5272ffe
Show file tree
Hide file tree
Showing 6 changed files with 257 additions and 281 deletions.
56 changes: 4 additions & 52 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,7 @@ Yet another package for computing multidimensional [non-uniform fast Fourier tra
Like other [existing packages](#differences-with-other-packages), computation of NUFFTs on CPU
are parallelised using threads.
Transforms can also be performed on GPUs.
In principle all kinds of GPU for which
In principle all GPU platforms for which
a [KernelAbstractions.jl](https://github.com/JuliaGPU/KernelAbstractions.jl)
backend exists are supported.

Expand Down Expand Up @@ -142,15 +142,15 @@ The only differences are:
- we copy input arrays to the GPU before calling any NUFFT-related functions (`set_points!`, `exec_type1!`, `exec_type2!`)

The example is for an Nvidia GPU (using [CUDA.jl](https://github.com/JuliaGPU/CUDA.jl)), but should also work with e.g. [AMDGPU.jl](https://github.com/JuliaGPU/AMDGPU.jl)
by simply choosing `backend = ROCBackend()`.
on an AMD GPU by simply choosing `backend = ROCBackend()`.

```julia
using NonuniformFFTs
using StaticArrays: SVector # for convenience
using CUDA
using Adapt: adapt # optional (see below)

backend = CUDABackend() # other options are CPU() or ROCBackend() (untested)
backend = CUDABackend() # other options are CPU() or ROCBackend()

Ns = (256, 256) # number of Fourier modes in each direction
Np = 1000 # number of non-uniform points
Expand Down Expand Up @@ -232,55 +232,7 @@ For real-to-complex transforms, the NonuniformFFTs.jl API demonstrated above sho

<br>

## Differences with other packages

This package roughly follows the same notation and conventions of the [FINUFFT library](https://finufft.readthedocs.io/en/latest/)
and its [Julia interface](https://github.com/ludvigak/FINUFFT.jl), with a few differences detailed below.

### Conventions used by this package

We try to preserve as much as possible the conventions used in FFTW3.
In particular, this means that:

- The FFT outputs are ordered starting from mode $k = 0$ to $k = N/2 - 1$ (for even $N$) and then from $-N/2$ to $-1$.
Wavenumbers can be obtained in this order by calling `AbstractFFTs.fftfreq(N, N)`.
Use `AbstractFFTs.fftshift` to get Fourier modes in increasing order $-N/2, …, -1, 0, 1, …, N/2 - 1$.
In FINUFFT, one should set [`modeord = 1`](https://finufft.readthedocs.io/en/latest/opts.html#data-handling-options) to get this order.

- The type-1 NUFFT (non-uniform to uniform) is defined with a minus sign in the exponential.
This is the same convention as the [forward DFT in FFTW3](http://fftw.org/fftw3_doc/The-1d-Discrete-Fourier-Transform-_0028DFT_0029.html).
In particular, this means that performing a type-1 NUFFT on uniform points gives the same output than performing a FFT using FFTW3.
In FINUFFT, this corresponds to setting [`iflag = -1`](https://ludvigak.github.io/FINUFFT.jl/latest/#FINUFFT.finufft_makeplan-Tuple{Integer,%20Union{Integer,%20Array{Int64}},%20Integer,%20Integer,%20Real}) in type-1 transforms.
Conversely, type-2 NUFFTs (uniform to non-uniform) are defined with a plus sign, equivalently to the backward DFT in FFTW3.

For compatibility with other packages such as [NFFT.jl](https://github.com/JuliaMath/NFFT.jl), these conventions are *not*
applied when the AbstractNFFTs.jl interface is used (see example above).
In this specific case, modes are assumed to be ordered in increasing order, and
the opposite sign convention is used for Fourier transforms.

### Differences with [NFFT.jl](https://github.com/JuliaMath/NFFT.jl)

- This package allows NUFFTs of purely real non-uniform data.

- Different convention is used: non-uniform points are expected to be in $[0, 2π]$.

### Differences with FINUFFT / FINUFFT.jl

- This package is written in "pure" Julia (besides the FFTs themselves which rely on the FFTW3 library, via their Julia interface).

- This package provides a generic and efficient GPU implementation thanks to
[KernelAbstractions.jl](https://github.com/JuliaGPU/KernelAbstractions.jl)
meaning that many kinds of GPUs are supported, including not only Nvidia GPUs but
also AMD ones and possibly more.

- This package allows NUFFTs of purely real non-uniform data.
Moreover, transforms can be performed on arbitrary dimensions.

- A different smoothing kernel function is used (backwards Kaiser–Bessel kernel by default on CPUs; Kaiser–Bessel kernel on GPUs).

- It is possible to use the same plan for type-1 and type-2 transforms, reducing memory requirements in cases where one wants to perform both.

## Performance benchmarks
## Performance

NonuniformFFTs.jl can be *fast*:

Expand Down
2 changes: 2 additions & 0 deletions docs/Project.toml
Original file line number Diff line number Diff line change
@@ -1,8 +1,10 @@
[deps]
AbstractFFTs = "621f4979-c628-5d54-868e-fcf4e3e8185c"
AbstractNFFTs = "7f219486-4aa7-41d6-80a7-e08ef20ceed7"
CairoMakie = "13f3f980-e62b-5c42-98c6-ff1f3baf88f0"
Documenter = "e30172f5-a6a5-5a46-863b-614d45cd2de4"
DocumenterCitations = "daee34ce-89f3-4625-b898-19384cb65244"
Downloads = "f43a241f-c20a-4ad4-852c-f6b1247861c6"
NonuniformFFTs = "cd96f58b-6017-4a02-bb9e-f4d81626177f"
Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"
StaticArrays = "90137ffa-7385-5640-81b9-e52037218182"
2 changes: 1 addition & 1 deletion docs/make.jl
Original file line number Diff line number Diff line change
@@ -1,5 +1,4 @@
using Documenter
using Documenter.HTMLWriter: HTMLAsset
using DocumenterCitations
using Downloads: Downloads
using NonuniformFFTs
Expand Down Expand Up @@ -44,6 +43,7 @@ makedocs(;
modules = [NonuniformFFTs],
pages = [
"index.md",
"examples.md",
"accuracy.md",
"benchmarks.md",
"API.md",
Expand Down
9 changes: 4 additions & 5 deletions docs/src/benchmarks.md
Original file line number Diff line number Diff line change
Expand Up @@ -44,9 +44,9 @@ The default one (**filled circles**) corresponds to setting
`gpu_method = :global_memory` in [`PlanNUFFT`](@ref).
This method is slightly faster than CuFINUFFT at low point densities, but
slightly slower at large ones.
However, at large densities it is actually faster to use the non-default
`gpu_method = :shared_memory` option (**open circles**, labelled "SM" in the figures).

In fact, at large densities it actually faster to use the non-default
`gpu_method = :shared_memory` option (**open circles**, labelled "SM" in the figures).
The `:shared_memory` method performs some operations on GPU [shared
memory](https://developer.nvidia.com/blog/using-shared-memory-cuda-cc/) (also called [local data share](https://rocm.docs.amd.com/projects/HIP/en/latest/understand/hardware_implementation.html#local-data-share)), which is small but much faster than the GPU's global memory.
During spreading (type-1 transforms), this approach allows to reduce the number
Expand All @@ -56,13 +56,12 @@ a few differences.
In particular, we completely avoid atomic operations on shared memory, which
seems to speed up things quite a bit and might explain the important gains
with respect to the CuFINUFFT implementation.[^1]
Another difference is that we also provide a shared-memory implementation of type-2 transforms
We also provide a shared-memory implementation of type-2 transforms
(interpolation).
As seen [below](@ref benchmarks-complex-type2), this can enable some minor gains
at large point densities.

[^1]: In fact the CuFINUFFT shared-memory implementation is surprisingly slow
for the considered problem. It might perform better for two-dimensional or low-accuracy problems.
[^1]: The CuFINUFFT shared-memory implementation might perform better (relative to the global-memory method) for two-dimensional or low-accuracy problems.

### Type-1 transforms

Expand Down
243 changes: 243 additions & 0 deletions docs/src/examples.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,243 @@
# Examples

## Basic usage

The first two examples illustrate how to perform type-1 and type-2 NUFFTs in one dimension.
In these examples the non-uniform data is real-valued, but it could be easily made
complex by replacing `Float64` with `ComplexF64`.

### Type-1 (or *adjoint*) NUFFT in one dimension

```@example
using NonuniformFFTs
using AbstractFFTs: rfftfreq # can be used to obtain the associated Fourier wavenumbers
N = 256 # number of Fourier modes
Np = 100 # number of non-uniform points
ks = rfftfreq(N, N) # Fourier wavenumbers
# Generate some non-uniform random data
T = Float64 # non-uniform data is real (can also be complex)
xp = rand(T, Np) .* 2π # non-uniform points in [0, 2π]
vp = randn(T, Np) # random values at points
# Create plan for data of type T
plan_nufft = PlanNUFFT(T, N; m = HalfSupport(4)) # larger support increases accuracy
# Set non-uniform points
set_points!(plan_nufft, xp)
# Perform type-1 NUFFT on preallocated output
ûs = Array{Complex{T}}(undef, size(plan_nufft))
exec_type1!(ûs, plan_nufft, vp)
nothing # hide
```

### Type-2 (or *direct*) NUFFT in one dimension

```@example
using NonuniformFFTs
N = 256 # number of Fourier modes
Np = 100 # number of non-uniform points
# Generate some uniform random data
T = Float64 # non-uniform data is real (can also be complex)
xp = rand(T, Np) .* 2π # non-uniform points in [0, 2π]
ûs = randn(Complex{T}, N ÷ 2 + 1) # random values at points (we need to store roughly half the Fourier modes for complex-to-real transform)
# Create plan for data of type T
plan_nufft = PlanNUFFT(T, N; m = HalfSupport(4))
# Set non-uniform points
set_points!(plan_nufft, xp)
# Perform type-2 NUFFT on preallocated output
vp = Array{T}(undef, Np)
exec_type2!(vp, plan_nufft, ûs)
nothing # hide
```

## Multidimensional transforms

To perform a ``d``-dimensional transform, one simply needs to pass a tuple of
dimensions `(Nx, Ny, …)` to `PlanNUFFT`.
Moreover, the vector of ``d``-dimensional positions can either be specified as a vector of tuples `(xᵢ, yᵢ, …)`
or, similarly, as a vector of `SVector`s:

```@example
using NonuniformFFTs
using StaticArrays: SVector # for convenience
Ns = (256, 256) # number of Fourier modes in each direction
Np = 1000 # number of non-uniform points
# Generate some non-uniform random data
T = Float64 # non-uniform data is real (can also be complex)
d = length(Ns) # number of dimensions (d = 2 here)
xp = [2π * rand(SVector{d, T}) for _ ∈ 1:Np] # non-uniform points in [0, 2π]ᵈ
vp = randn(T, Np) # random values at points
# Create plan for data of type T
plan_nufft = PlanNUFFT(T, Ns; m = HalfSupport(4))
# Set non-uniform points
set_points!(plan_nufft, xp)
# Perform type-1 NUFFT on preallocated output
ûs = Array{Complex{T}}(undef, size(plan_nufft))
exec_type1!(ûs, plan_nufft, vp)
# Perform type-2 NUFFT on preallocated output
exec_type2!(vp, plan_nufft, ûs)
nothing # hide
```

## Multiple transforms on the same non-uniform points

One may want to perform multiple transforms with the same non-uniform points.
For example, this can be useful for dealing with vector quantities (as opposed to scalar ones).
To do this, one should pass `ntransforms = Val(Nt)` where `Nt` is the number of transforms to perform.
Moreover, the input and output data should be tuples of arrays of length `Nt`, as shown below.

```@example
using NonuniformFFTs
N = 256 # number of Fourier modes
Np = 100 # number of non-uniform points
ntrans = Val(3) # number of simultaneous transforms
# Generate some non-uniform random data
T = Float64 # non-uniform data is real (can also be complex)
xp = rand(T, Np) .* 2π # non-uniform points in [0, 2π]
vp = ntuple(_ -> randn(T, Np), ntrans) # random values at points (this is a tuple of 3 arrays)
# Create plan for data of type T
plan_nufft = PlanNUFFT(T, N; ntransforms = ntrans)
# Set non-uniform points
set_points!(plan_nufft, xp)
# Perform type-1 NUFFT on preallocated output (one array per transformed quantity)
ûs = ntuple(_ -> Array{Complex{T}}(undef, size(plan_nufft)), ntrans) # this is a tuple of 3 arrays
exec_type1!(ûs, plan_nufft, vp)
# Perform type-2 NUFFT on preallocated output (one vector per transformed quantity)
vp_interp = map(similar, vp) # this is a tuple of 3 arrays
exec_type2!(vp, plan_nufft, ûs)
nothing # hide
```

## GPU usage

Below is a GPU version of the multidimensional transform example above.
The only differences are:

- we import [CUDA.jl](https://github.com/JuliaGPU/CUDA.jl)
- we import [Adapt.jl](https://github.com/JuliaGPU/Adapt.jl) (optional but convenient)
- we pass `backend = CUDABackend()` to `PlanNUFFT` (`CUDABackend` is a [KernelAbstractions backend](https://juliagpu.github.io/KernelAbstractions.jl/stable/#Supported-backends) and is exported by CUDA.jl).
The default is `backend = CPU()`.
- we copy input arrays to the GPU before calling any NUFFT-related functions ([`set_points!`](@ref), [`exec_type1!`](@ref), [`exec_type2!`](@ref))

The example is for an Nvidia GPU (using [CUDA.jl](https://github.com/JuliaGPU/CUDA.jl)), but should also work with e.g. [AMDGPU.jl](https://github.com/JuliaGPU/AMDGPU.jl)
on an AMD GPU by simply choosing `backend = ROCBackend()`.

```julia
using NonuniformFFTs
using StaticArrays: SVector # for convenience
using CUDA
using Adapt: adapt # optional (see below)

backend = CUDABackend() # other options are CPU() or ROCBackend()

Ns = (256, 256) # number of Fourier modes in each direction
Np = 1000 # number of non-uniform points

# Generate some non-uniform random data
T = Float64 # non-uniform data is real (can also be complex)
d = length(Ns) # number of dimensions (d = 2 here)
xp_cpu = [T(2π) * rand(SVector{d, T}) for _ 1:Np] # non-uniform points in [0, 2π]ᵈ
vp_cpu = randn(T, Np) # random values at points

# Copy data to the GPU (using Adapt is optional but it makes code more generic).
# Note that all data needs to be on the GPU before setting points or executing transforms.
# We could have also generated the data directly on the GPU.
xp = adapt(backend, xp_cpu) # returns a CuArray if backend = CUDABackend
vp = adapt(backend, vp_cpu)

# Create plan for data of type T
plan_nufft = PlanNUFFT(T, Ns; m = HalfSupport(4), backend)

# Set non-uniform points
set_points!(plan_nufft, xp)

# Perform type-1 NUFFT on preallocated output
ûs = similar(vp, Complex{T}, size(plan_nufft)) # initialises a GPU array for the output
exec_type1!(ûs, plan_nufft, vp)

# Perform type-2 NUFFT on preallocated output
exec_type2!(vp, plan_nufft, ûs)
```

## [AbstractNFFTs.jl interface](@id AbstractNFFTs-interface)

This package also implements the [AbstractNFFTs.jl](https://juliamath.github.io/NFFT.jl/stable/abstract/)
interface as an alternative API for constructing plans and evaluating transforms.
This can be useful for comparing with similar packages such as [NFFT.jl](https://github.com/JuliaMath/NFFT.jl).

For this, a [`NFFTPlan`](@ref NonuniformFFTs.NFFTPlan) constructor (alternative to [`PlanNUFFT`](@ref)) is provided which
supports most of the parameters supported by [NFFT.jl](https://github.com/JuliaMath/NFFT.jl).
Alternatively, once NonuniformFFTs.jl has been loaded, the [`plan_nfft`](https://juliamath.github.io/NFFT.jl/v0.13.5/api/#AbstractNFFTs.plan_nfft-Union{Tuple{D},%20Tuple{T},%20Tuple{Type{%3C:Array},%20Matrix{T},%20Tuple{Vararg{Int64,%20D}},%20Vararg{Any}}}%20where%20{T,%20D})
function from AbstractNFFTs.jl also generates a `NFFTPlan`.
For compatibility with NFFT.jl, the plan generated via this interface **does not
follow the same [conventions](@ref nufft-conventions)** followed by `PlanNUFFT` plans.

The main differences are:

- points are assumed to be in ``[-1/2, 1/2)`` instead of ``[0, 2π)``;
- the opposite Fourier sign convention is used (e.g. ``e^{-i k x_j}`` becomes ``e^{+2π i k x_j}``);
- uniform data is in increasing order, with frequencies ``k = -N/2, …, -1, 0,
1, …, N/2-1``, as opposed to preserving the order used by FFTW (which starts at ``k = 0``);
- points locations must be specified as a matrix of dimensions `(d, Np)`.

### Example usage

```@example
using NonuniformFFTs
using AbstractNFFTs: AbstractNFFTs, plan_nfft
using LinearAlgebra: mul!
Ns = (256, 256) # number of Fourier modes in each direction
Np = 1000 # number of non-uniform points
# Generate some non-uniform random data
T = Float64 # must be a real data type (Float32, Float64)
d = length(Ns) # number of dimensions (d = 2 here)
xp = rand(T, (d, Np)) .- T(0.5) # non-uniform points in [-1/2, 1/2)ᵈ; must be given as a (d, Np) matrix
vp = randn(Complex{T}, Np) # random values at points (must be complex)
# Create plan for data of type Complex{T}. Note that we pass the points `xp` as
# a first argument, which calls an AbstractNFFTs-compatible constructor.
p = NonuniformFFTs.NFFTPlan(xp, Ns)
# p = plan_nfft(xp, Ns) # this is also possible
# Getting the expected dimensions of input and output data.
AbstractNFFTs.size_in(p) # (256, 256)
AbstractNFFTs.size_out(p) # (1000,)
# Perform adjoint NFFT, a.k.a. type-1 NUFFT (non-uniform to uniform)
us = adjoint(p) * vp # allocates output array `us`
mul!(us, adjoint(p), vp) # uses preallocated output array `us`
# Perform forward NFFT, a.k.a. type-2 NUFFT (uniform to non-uniform)
wp = p * us
mul!(wp, p, us)
# Setting a different set of non-uniform points
AbstractNFFTs.nodes!(p, xp)
nothing # hide
```

Note: the AbstractNFFTs.jl interface currently only supports complex-valued non-uniform data.
For real-to-complex transforms, the standard NonuniformFFTs.jl API demonstrated [above](#Basic-usage) (based on [`PlanNUFFT`](@ref)) should be used instead.

Loading

0 comments on commit 5272ffe

Please sign in to comment.