diff --git a/README.md b/README.md
index 7822b66..6a95786 100644
--- a/README.md
+++ b/README.md
@@ -10,7 +10,7 @@ Yet another package for computing multidimensional [non-uniform fast Fourier tra
 Like other [existing packages](#differences-with-other-packages), computation of NUFFTs on CPU
 are parallelised using threads.
 Transforms can also be performed on GPUs.
-In principle all kinds of GPU for which
+In principle all GPU platforms for which
 a [KernelAbstractions.jl](https://github.com/JuliaGPU/KernelAbstractions.jl)
 backend exists are supported.
 
@@ -142,7 +142,7 @@ The only differences are:
 - we copy input arrays to the GPU before calling any NUFFT-related functions (`set_points!`, `exec_type1!`, `exec_type2!`)
 
 The example is for an Nvidia GPU (using [CUDA.jl](https://github.com/JuliaGPU/CUDA.jl)), but should also work with e.g. [AMDGPU.jl](https://github.com/JuliaGPU/AMDGPU.jl)
-by simply choosing `backend = ROCBackend()`.
+on an AMD GPU by simply choosing `backend = ROCBackend()`.
 
 ```julia
 using NonuniformFFTs
@@ -150,7 +150,7 @@ using StaticArrays: SVector  # for convenience
 using CUDA
 using Adapt: adapt  # optional (see below)
 
-backend = CUDABackend()  # other options are CPU() or ROCBackend() (untested)
+backend = CUDABackend()  # other options are CPU() or ROCBackend()
 
 Ns = (256, 256)  # number of Fourier modes in each direction
 Np = 1000        # number of non-uniform points
@@ -232,55 +232,7 @@ For real-to-complex transforms, the NonuniformFFTs.jl API demonstrated above sho
 
 <br>
 
-## Differences with other packages
-
-This package roughly follows the same notation and conventions of the [FINUFFT library](https://finufft.readthedocs.io/en/latest/)
-and its [Julia interface](https://github.com/ludvigak/FINUFFT.jl), with a few differences detailed below.
-
-### Conventions used by this package
-
-We try to preserve as much as possible the conventions used in FFTW3.
-In particular, this means that:
-
-- The FFT outputs are ordered starting from mode $k = 0$ to $k = N/2 - 1$ (for even $N$) and then from $-N/2$ to $-1$.
-  Wavenumbers can be obtained in this order by calling `AbstractFFTs.fftfreq(N, N)`.
-  Use `AbstractFFTs.fftshift` to get Fourier modes in increasing order $-N/2, …, -1, 0, 1, …, N/2 - 1$.
-  In FINUFFT, one should set [`modeord = 1`](https://finufft.readthedocs.io/en/latest/opts.html#data-handling-options) to get this order.
-
-- The type-1 NUFFT (non-uniform to uniform) is defined with a minus sign in the exponential.
-  This is the same convention as the [forward DFT in FFTW3](http://fftw.org/fftw3_doc/The-1d-Discrete-Fourier-Transform-_0028DFT_0029.html).
-  In particular, this means that performing a type-1 NUFFT on uniform points gives the same output than performing a FFT using FFTW3.
-  In FINUFFT, this corresponds to setting [`iflag = -1`](https://ludvigak.github.io/FINUFFT.jl/latest/#FINUFFT.finufft_makeplan-Tuple{Integer,%20Union{Integer,%20Array{Int64}},%20Integer,%20Integer,%20Real}) in type-1 transforms.
-  Conversely, type-2 NUFFTs (uniform to non-uniform) are defined with a plus sign, equivalently to the backward DFT in FFTW3.
-
-For compatibility with other packages such as [NFFT.jl](https://github.com/JuliaMath/NFFT.jl), these conventions are *not*
-applied when the AbstractNFFTs.jl interface is used (see example above).
-In this specific case, modes are assumed to be ordered in increasing order, and
-the opposite sign convention is used for Fourier transforms.
-
-### Differences with [NFFT.jl](https://github.com/JuliaMath/NFFT.jl)
-
-- This package allows NUFFTs of purely real non-uniform data.
-
-- Different convention is used: non-uniform points are expected to be in $[0, 2π]$.
-
-### Differences with FINUFFT / FINUFFT.jl
-
-- This package is written in "pure" Julia (besides the FFTs themselves which rely on the FFTW3 library, via their Julia interface).
-
-- This package provides a generic and efficient GPU implementation thanks to
-  [KernelAbstractions.jl](https://github.com/JuliaGPU/KernelAbstractions.jl)
-  meaning that many kinds of GPUs are supported, including not only Nvidia GPUs but
-  also AMD ones and possibly more.
-
-- This package allows NUFFTs of purely real non-uniform data.
-  Moreover, transforms can be performed on arbitrary dimensions.
-
-- A different smoothing kernel function is used (backwards Kaiser–Bessel kernel by default on CPUs; Kaiser–Bessel kernel on GPUs).
-
-- It is possible to use the same plan for type-1 and type-2 transforms, reducing memory requirements in cases where one wants to perform both.
-
-## Performance benchmarks
+## Performance
 
 NonuniformFFTs.jl can be *fast*:
 
diff --git a/docs/Project.toml b/docs/Project.toml
index 6545b99..ce0f09c 100644
--- a/docs/Project.toml
+++ b/docs/Project.toml
@@ -1,8 +1,10 @@
 [deps]
 AbstractFFTs = "621f4979-c628-5d54-868e-fcf4e3e8185c"
+AbstractNFFTs = "7f219486-4aa7-41d6-80a7-e08ef20ceed7"
 CairoMakie = "13f3f980-e62b-5c42-98c6-ff1f3baf88f0"
 Documenter = "e30172f5-a6a5-5a46-863b-614d45cd2de4"
 DocumenterCitations = "daee34ce-89f3-4625-b898-19384cb65244"
 Downloads = "f43a241f-c20a-4ad4-852c-f6b1247861c6"
 NonuniformFFTs = "cd96f58b-6017-4a02-bb9e-f4d81626177f"
 Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"
+StaticArrays = "90137ffa-7385-5640-81b9-e52037218182"
diff --git a/docs/make.jl b/docs/make.jl
index f8a6691..ee06d1d 100644
--- a/docs/make.jl
+++ b/docs/make.jl
@@ -1,5 +1,4 @@
 using Documenter
-using Documenter.HTMLWriter: HTMLAsset
 using DocumenterCitations
 using Downloads: Downloads
 using NonuniformFFTs
@@ -44,6 +43,7 @@ makedocs(;
     modules = [NonuniformFFTs],
     pages = [
         "index.md",
+        "examples.md",
         "accuracy.md",
         "benchmarks.md",
         "API.md",
diff --git a/docs/src/benchmarks.md b/docs/src/benchmarks.md
index d9c59c2..9a98584 100644
--- a/docs/src/benchmarks.md
+++ b/docs/src/benchmarks.md
@@ -44,9 +44,9 @@ The default one (**filled circles**) corresponds to setting
 `gpu_method = :global_memory` in [`PlanNUFFT`](@ref).
 This method is slightly faster than CuFINUFFT at low point densities, but
 slightly slower at large ones.
-However, at large densities it is actually faster to use the non-default
-`gpu_method = :shared_memory` option (**open circles**, labelled "SM" in the figures).
 
+In fact, at large densities it actually faster to use the non-default
+`gpu_method = :shared_memory` option (**open circles**, labelled "SM" in the figures).
 The `:shared_memory` method performs some operations on GPU [shared
 memory](https://developer.nvidia.com/blog/using-shared-memory-cuda-cc/) (also called [local data share](https://rocm.docs.amd.com/projects/HIP/en/latest/understand/hardware_implementation.html#local-data-share)), which is small but much faster than the GPU's global memory.
 During spreading (type-1 transforms), this approach allows to reduce the number
@@ -56,13 +56,12 @@ a few differences.
 In particular, we completely avoid atomic operations on shared memory, which
 seems to speed up things quite a bit and might explain the important gains
 with respect to the CuFINUFFT implementation.[^1]
-Another difference is that we also provide a shared-memory implementation of type-2 transforms
+We also provide a shared-memory implementation of type-2 transforms
 (interpolation).
 As seen [below](@ref benchmarks-complex-type2), this can enable some minor gains
 at large point densities.
 
-[^1]: In fact the CuFINUFFT shared-memory implementation is surprisingly slow
-    for the considered problem. It might perform better for two-dimensional or low-accuracy problems.
+[^1]: The CuFINUFFT shared-memory implementation might perform better (relative to the global-memory method) for two-dimensional or low-accuracy problems.
 
 ### Type-1 transforms
 
diff --git a/docs/src/examples.md b/docs/src/examples.md
new file mode 100644
index 0000000..fa04562
--- /dev/null
+++ b/docs/src/examples.md
@@ -0,0 +1,243 @@
+# Examples
+
+## Basic usage
+
+The first two examples illustrate how to perform type-1 and type-2 NUFFTs in one dimension.
+In these examples the non-uniform data is real-valued, but it could be easily made
+complex by replacing `Float64` with `ComplexF64`.
+
+### Type-1 (or *adjoint*) NUFFT in one dimension
+
+```@example
+using NonuniformFFTs
+using AbstractFFTs: rfftfreq  # can be used to obtain the associated Fourier wavenumbers
+
+N = 256   # number of Fourier modes
+Np = 100  # number of non-uniform points
+ks = rfftfreq(N, N)  # Fourier wavenumbers
+
+# Generate some non-uniform random data
+T = Float64             # non-uniform data is real (can also be complex)
+xp = rand(T, Np) .* 2π  # non-uniform points in [0, 2π]
+vp = randn(T, Np)       # random values at points
+
+# Create plan for data of type T
+plan_nufft = PlanNUFFT(T, N; m = HalfSupport(4))  # larger support increases accuracy
+
+# Set non-uniform points
+set_points!(plan_nufft, xp)
+
+# Perform type-1 NUFFT on preallocated output
+ûs = Array{Complex{T}}(undef, size(plan_nufft))
+exec_type1!(ûs, plan_nufft, vp)
+nothing  # hide
+```
+
+### Type-2 (or *direct*) NUFFT in one dimension
+
+```@example
+using NonuniformFFTs
+
+N = 256   # number of Fourier modes
+Np = 100  # number of non-uniform points
+
+# Generate some uniform random data
+T = Float64                        # non-uniform data is real (can also be complex)
+xp = rand(T, Np) .* 2π             # non-uniform points in [0, 2π]
+ûs = randn(Complex{T}, N ÷ 2 + 1)  # random values at points (we need to store roughly half the Fourier modes for complex-to-real transform)
+
+# Create plan for data of type T
+plan_nufft = PlanNUFFT(T, N; m = HalfSupport(4))
+
+# Set non-uniform points
+set_points!(plan_nufft, xp)
+
+# Perform type-2 NUFFT on preallocated output
+vp = Array{T}(undef, Np)
+exec_type2!(vp, plan_nufft, ûs)
+nothing  # hide
+```
+
+## Multidimensional transforms
+
+To perform a ``d``-dimensional transform, one simply needs to pass a tuple of
+dimensions `(Nx, Ny, …)` to `PlanNUFFT`.
+Moreover, the vector of ``d``-dimensional positions can either be specified as a vector of tuples `(xᵢ, yᵢ, …)`
+or, similarly, as a vector of `SVector`s:
+
+```@example
+using NonuniformFFTs
+using StaticArrays: SVector  # for convenience
+
+Ns = (256, 256)  # number of Fourier modes in each direction
+Np = 1000        # number of non-uniform points
+
+# Generate some non-uniform random data
+T = Float64                                   # non-uniform data is real (can also be complex)
+d = length(Ns)                                # number of dimensions (d = 2 here)
+xp = [2π * rand(SVector{d, T}) for _ ∈ 1:Np]  # non-uniform points in [0, 2π]ᵈ
+vp = randn(T, Np)                             # random values at points
+
+# Create plan for data of type T
+plan_nufft = PlanNUFFT(T, Ns; m = HalfSupport(4))
+
+# Set non-uniform points
+set_points!(plan_nufft, xp)
+
+# Perform type-1 NUFFT on preallocated output
+ûs = Array{Complex{T}}(undef, size(plan_nufft))
+exec_type1!(ûs, plan_nufft, vp)
+
+# Perform type-2 NUFFT on preallocated output
+exec_type2!(vp, plan_nufft, ûs)
+nothing  # hide
+```
+
+## Multiple transforms on the same non-uniform points
+
+One may want to perform multiple transforms with the same non-uniform points.
+For example, this can be useful for dealing with vector quantities (as opposed to scalar ones).
+To do this, one should pass `ntransforms = Val(Nt)` where `Nt` is the number of transforms to perform.
+Moreover, the input and output data should be tuples of arrays of length `Nt`, as shown below.
+
+```@example
+using NonuniformFFTs
+
+N = 256   # number of Fourier modes
+Np = 100  # number of non-uniform points
+ntrans = Val(3)  # number of simultaneous transforms
+
+# Generate some non-uniform random data
+T = Float64             # non-uniform data is real (can also be complex)
+xp = rand(T, Np) .* 2π  # non-uniform points in [0, 2π]
+vp = ntuple(_ -> randn(T, Np), ntrans)  # random values at points (this is a tuple of 3 arrays)
+
+# Create plan for data of type T
+plan_nufft = PlanNUFFT(T, N; ntransforms = ntrans)
+
+# Set non-uniform points
+set_points!(plan_nufft, xp)
+
+# Perform type-1 NUFFT on preallocated output (one array per transformed quantity)
+ûs = ntuple(_ -> Array{Complex{T}}(undef, size(plan_nufft)), ntrans)  # this is a tuple of 3 arrays
+exec_type1!(ûs, plan_nufft, vp)
+
+# Perform type-2 NUFFT on preallocated output (one vector per transformed quantity)
+vp_interp = map(similar, vp)  # this is a tuple of 3 arrays
+exec_type2!(vp, plan_nufft, ûs)
+nothing  # hide
+```
+
+## GPU usage
+
+Below is a GPU version of the multidimensional transform example above.
+The only differences are:
+
+- we import [CUDA.jl](https://github.com/JuliaGPU/CUDA.jl)
+- we import [Adapt.jl](https://github.com/JuliaGPU/Adapt.jl) (optional but convenient)
+- we pass `backend = CUDABackend()` to `PlanNUFFT` (`CUDABackend` is a [KernelAbstractions backend](https://juliagpu.github.io/KernelAbstractions.jl/stable/#Supported-backends) and is exported by CUDA.jl).
+  The default is `backend = CPU()`.
+- we copy input arrays to the GPU before calling any NUFFT-related functions ([`set_points!`](@ref), [`exec_type1!`](@ref), [`exec_type2!`](@ref))
+
+The example is for an Nvidia GPU (using [CUDA.jl](https://github.com/JuliaGPU/CUDA.jl)), but should also work with e.g. [AMDGPU.jl](https://github.com/JuliaGPU/AMDGPU.jl)
+on an AMD GPU by simply choosing `backend = ROCBackend()`.
+
+```julia
+using NonuniformFFTs
+using StaticArrays: SVector  # for convenience
+using CUDA
+using Adapt: adapt  # optional (see below)
+
+backend = CUDABackend()  # other options are CPU() or ROCBackend()
+
+Ns = (256, 256)  # number of Fourier modes in each direction
+Np = 1000        # number of non-uniform points
+
+# Generate some non-uniform random data
+T = Float64                                          # non-uniform data is real (can also be complex)
+d = length(Ns)                                       # number of dimensions (d = 2 here)
+xp_cpu = [T(2π) * rand(SVector{d, T}) for _ ∈ 1:Np]  # non-uniform points in [0, 2π]ᵈ
+vp_cpu = randn(T, Np)                                # random values at points
+
+# Copy data to the GPU (using Adapt is optional but it makes code more generic).
+# Note that all data needs to be on the GPU before setting points or executing transforms.
+# We could have also generated the data directly on the GPU.
+xp = adapt(backend, xp_cpu)  # returns a CuArray if backend = CUDABackend
+vp = adapt(backend, vp_cpu)
+
+# Create plan for data of type T
+plan_nufft = PlanNUFFT(T, Ns; m = HalfSupport(4), backend)
+
+# Set non-uniform points
+set_points!(plan_nufft, xp)
+
+# Perform type-1 NUFFT on preallocated output
+ûs = similar(vp, Complex{T}, size(plan_nufft))  # initialises a GPU array for the output
+exec_type1!(ûs, plan_nufft, vp)
+
+# Perform type-2 NUFFT on preallocated output
+exec_type2!(vp, plan_nufft, ûs)
+```
+
+## [AbstractNFFTs.jl interface](@id AbstractNFFTs-interface)
+
+This package also implements the [AbstractNFFTs.jl](https://juliamath.github.io/NFFT.jl/stable/abstract/)
+interface as an alternative API for constructing plans and evaluating transforms.
+This can be useful for comparing with similar packages such as [NFFT.jl](https://github.com/JuliaMath/NFFT.jl).
+
+For this, a [`NFFTPlan`](@ref NonuniformFFTs.NFFTPlan) constructor (alternative to [`PlanNUFFT`](@ref)) is provided which
+supports most of the parameters supported by [NFFT.jl](https://github.com/JuliaMath/NFFT.jl).
+Alternatively, once NonuniformFFTs.jl has been loaded, the [`plan_nfft`](https://juliamath.github.io/NFFT.jl/v0.13.5/api/#AbstractNFFTs.plan_nfft-Union{Tuple{D},%20Tuple{T},%20Tuple{Type{%3C:Array},%20Matrix{T},%20Tuple{Vararg{Int64,%20D}},%20Vararg{Any}}}%20where%20{T,%20D})
+function from AbstractNFFTs.jl also generates a `NFFTPlan`.
+For compatibility with NFFT.jl, the plan generated via this interface **does not
+follow the same [conventions](@ref nufft-conventions)** followed by `PlanNUFFT` plans.
+
+The main differences are:
+
+- points are assumed to be in ``[-1/2, 1/2)`` instead of ``[0, 2π)``;
+- the opposite Fourier sign convention is used (e.g. ``e^{-i k x_j}`` becomes ``e^{+2π i k x_j}``);
+- uniform data is in increasing order, with frequencies ``k = -N/2, …, -1, 0,
+  1, …, N/2-1``, as opposed to preserving the order used by FFTW (which starts at ``k = 0``);
+- points locations must be specified as a matrix of dimensions `(d, Np)`.
+
+### Example usage
+
+```@example
+using NonuniformFFTs
+using AbstractNFFTs: AbstractNFFTs, plan_nfft
+using LinearAlgebra: mul!
+
+Ns = (256, 256)  # number of Fourier modes in each direction
+Np = 1000        # number of non-uniform points
+
+# Generate some non-uniform random data
+T = Float64                      # must be a real data type (Float32, Float64)
+d = length(Ns)                   # number of dimensions (d = 2 here)
+xp = rand(T, (d, Np)) .- T(0.5)  # non-uniform points in [-1/2, 1/2)ᵈ; must be given as a (d, Np) matrix
+vp = randn(Complex{T}, Np)       # random values at points (must be complex)
+
+# Create plan for data of type Complex{T}. Note that we pass the points `xp` as
+# a first argument, which calls an AbstractNFFTs-compatible constructor.
+p = NonuniformFFTs.NFFTPlan(xp, Ns)
+# p = plan_nfft(xp, Ns)  # this is also possible
+
+# Getting the expected dimensions of input and output data.
+AbstractNFFTs.size_in(p)   # (256, 256)
+AbstractNFFTs.size_out(p)  # (1000,)
+
+# Perform adjoint NFFT, a.k.a. type-1 NUFFT (non-uniform to uniform)
+us = adjoint(p) * vp      # allocates output array `us`
+mul!(us, adjoint(p), vp)  # uses preallocated output array `us`
+
+# Perform forward NFFT, a.k.a. type-2 NUFFT (uniform to non-uniform)
+wp = p * us
+mul!(wp, p, us)
+
+# Setting a different set of non-uniform points
+AbstractNFFTs.nodes!(p, xp)
+nothing  # hide
+```
+
+Note: the AbstractNFFTs.jl interface currently only supports complex-valued non-uniform data.
+For real-to-complex transforms, the standard NonuniformFFTs.jl API demonstrated [above](#Basic-usage) (based on [`PlanNUFFT`](@ref)) should be used instead.
+
diff --git a/docs/src/index.md b/docs/src/index.md
index 4b3a8df..8e931e1 100644
--- a/docs/src/index.md
+++ b/docs/src/index.md
@@ -69,234 +69,14 @@ function from the AbstractFFTs package to conveniently obtain the Fourier
 frequencies in the right order.
 For real data transforms, [`rfftfreq`](https://juliamath.github.io/AbstractFFTs.jl/stable/api/#AbstractFFTs.rfftfreq) should be used instead.
 
-Alternatively, for complex non-uniform data, one can use [`fftshift`](https://juliamath.github.io/AbstractFFTs.jl/stable/api/#AbstractFFTs.fftshift) and
+For complex non-uniform data, one can use [`fftshift`](https://juliamath.github.io/AbstractFFTs.jl/stable/api/#AbstractFFTs.fftshift) and
 [`ifftshift`](https://juliamath.github.io/AbstractFFTs.jl/stable/api/#AbstractFFTs.ifftshift)
 from the same package to switch between this convention and the more
 "natural" convention of storing frequencies in increasing order
 ``(k = -N/2, …, N/2 - 1)``.
 
-## Basic usage
-
-### Type-1 (or *adjoint*) NUFFT in one dimension
-
-```julia
-using NonuniformFFTs
-using AbstractFFTs: rfftfreq  # can be used to obtain the associated Fourier wavenumbers
-
-N = 256   # number of Fourier modes
-Np = 100  # number of non-uniform points
-ks = rfftfreq(N, N)  # Fourier wavenumbers
-
-# Generate some non-uniform random data
-T = Float64             # non-uniform data is real (can also be complex)
-xp = rand(T, Np) .* 2π  # non-uniform points in [0, 2π]
-vp = randn(T, Np)       # random values at points
-
-# Create plan for data of type T
-plan_nufft = PlanNUFFT(T, N; m = HalfSupport(4))  # larger support increases accuracy
-
-# Set non-uniform points
-set_points!(plan_nufft, xp)
-
-# Perform type-1 NUFFT on preallocated output
-ûs = Array{Complex{T}}(undef, size(plan_nufft))
-exec_type1!(ûs, plan_nufft, vp)
-@. ûs = ûs / N  # normalise transform
-```
-
-### Type-2 (or *direct*) NUFFT in one dimension
-
-```julia
-using NonuniformFFTs
-
-N = 256   # number of Fourier modes
-Np = 100  # number of non-uniform points
-
-# Generate some uniform random data
-T = Float64                        # non-uniform data is real (can also be complex)
-xp = rand(T, Np) .* 2π             # non-uniform points in [0, 2π]
-ûs = randn(Complex{T}, N ÷ 2 + 1)  # random values at points (we need to store roughly half the Fourier modes for complex-to-real transform)
-
-# Create plan for data of type T
-plan_nufft = PlanNUFFT(T, N; m = HalfSupport(4))
-
-# Set non-uniform points
-set_points!(plan_nufft, xp)
-
-# Perform type-2 NUFFT on preallocated output
-vp = Array{T}(undef, Np)
-exec_type2!(vp, plan_nufft, ûs)
-```
-
-### Multidimensional transforms
-
-```julia
-using NonuniformFFTs
-using StaticArrays: SVector  # for convenience
-
-Ns = (256, 256)  # number of Fourier modes in each direction
-Np = 1000        # number of non-uniform points
-
-# Generate some non-uniform random data
-T = Float64                                   # non-uniform data is real (can also be complex)
-d = length(Ns)                                # number of dimensions (d = 2 here)
-xp = [2π * rand(SVector{d, T}) for _ ∈ 1:Np]  # non-uniform points in [0, 2π]ᵈ
-vp = randn(T, Np)                             # random values at points
-
-# Create plan for data of type T
-plan_nufft = PlanNUFFT(T, Ns; m = HalfSupport(4))
-
-# Set non-uniform points
-set_points!(plan_nufft, xp)
-
-# Perform type-1 NUFFT on preallocated output
-ûs = Array{Complex{T}}(undef, size(plan_nufft))
-exec_type1!(ûs, plan_nufft, vp)
-ûs ./= prod(Ns)  # normalise transform
-
-# Perform type-2 NUFFT on preallocated output
-exec_type2!(vp, plan_nufft, ûs)
-```
-
-### Multiple transforms on the same non-uniform points
-
-```julia
-using NonuniformFFTs
-
-N = 256   # number of Fourier modes
-Np = 100  # number of non-uniform points
-ntrans = Val(3)  # number of simultaneous transforms
-
-# Generate some non-uniform random data
-T = Float64             # non-uniform data is real (can also be complex)
-xp = rand(T, Np) .* 2π  # non-uniform points in [0, 2π]
-vp = ntuple(_ -> randn(T, Np), ntrans)  # random values at points (one vector per transformed quantity)
-
-# Create plan for data of type T
-plan_nufft = PlanNUFFT(T, N; ntransforms = ntrans)
-
-# Set non-uniform points
-set_points!(plan_nufft, xp)
-
-# Perform type-1 NUFFT on preallocated output (one array per transformed quantity)
-ûs = ntuple(_ -> Array{Complex{T}}(undef, size(plan_nufft)), ntrans)
-exec_type1!(ûs, plan_nufft, vp)
-@. ûs = ûs / N  # normalise transform
-
-# Perform type-2 NUFFT on preallocated output (one vector per transformed quantity)
-vp_interp = map(similar, vp)
-exec_type2!(vp, plan_nufft, ûs)
-```
-
-## GPU usage
-
-Below is a GPU version of the multidimensional transform example above.
-The only differences are:
-
-- we import CUDA.jl and Adapt.jl (optional)
-- we pass `backend = CUDABackend()` to `PlanNUFFT` (`CUDABackend` is a [KernelAbstractions backend](https://juliagpu.github.io/KernelAbstractions.jl/stable/#Supported-backends) and is exported by CUDA.jl).
-  The default is `backend = CPU()`.
-- we copy input arrays to the GPU before calling any NUFFT-related functions (`set_points!`, `exec_type1!`, `exec_type2!`)
-
-The example is for an Nvidia GPU (using [CUDA.jl](https://github.com/JuliaGPU/CUDA.jl)), but should also work with e.g. [AMDGPU.jl](https://github.com/JuliaGPU/AMDGPU.jl)
-by simply choosing `backend = ROCBackend()`.
-
-```julia
-using NonuniformFFTs
-using StaticArrays: SVector  # for convenience
-using CUDA
-using Adapt: adapt  # optional (see below)
-
-backend = CUDABackend()  # other options are CPU() or ROCBackend() (untested)
-
-Ns = (256, 256)  # number of Fourier modes in each direction
-Np = 1000        # number of non-uniform points
-
-# Generate some non-uniform random data
-T = Float64                                          # non-uniform data is real (can also be complex)
-d = length(Ns)                                       # number of dimensions (d = 2 here)
-xp_cpu = [T(2π) * rand(SVector{d, T}) for _ ∈ 1:Np]  # non-uniform points in [0, 2π]ᵈ
-vp_cpu = randn(T, Np)                                # random values at points
-
-# Copy data to the GPU (using Adapt is optional but it makes code more generic).
-# Note that all data needs to be on the GPU before setting points or executing transforms.
-# We could have also generated the data directly on the GPU.
-xp = adapt(backend, xp_cpu)  # returns a CuArray if backend = CUDABackend
-vp = adapt(backend, vp_cpu)
-
-# Create plan for data of type T
-plan_nufft = PlanNUFFT(T, Ns; m = HalfSupport(4), backend)
-
-# Set non-uniform points
-set_points!(plan_nufft, xp)
-
-# Perform type-1 NUFFT on preallocated output
-ûs = similar(vp, Complex{T}, size(plan_nufft))  # initialises a GPU array for the output
-exec_type1!(ûs, plan_nufft, vp)
-
-# Perform type-2 NUFFT on preallocated output
-exec_type2!(vp, plan_nufft, ûs)
-```
-
-## [AbstractNFFTs.jl interface](@id AbstractNFFTs-interface)
-
-This package also implements the [AbstractNFFTs.jl](https://juliamath.github.io/NFFT.jl/stable/abstract/)
-interface as an alternative API for constructing plans and evaluating transforms.
-This can be useful for comparing with similar packages such as [NFFT.jl](https://github.com/JuliaMath/NFFT.jl).
-
-In particular, a specific [`NFFTPlan`](@ref NonuniformFFTs.NFFTPlan) constructor is provided which
-supports most of the parameters supported by [NFFT.jl](https://github.com/JuliaMath/NFFT.jl).
-Alternatively, once NonuniformFFTs.jl has been loaded, the [`plan_nfft`](https://juliamath.github.io/NFFT.jl/v0.13.5/api/#AbstractNFFTs.plan_nfft-Union{Tuple{D},%20Tuple{T},%20Tuple{Type{%3C:Array},%20Matrix{T},%20Tuple{Vararg{Int64,%20D}},%20Vararg{Any}}}%20where%20{T,%20D})
-function from AbstractNFFTs.jl generates the same type type of plan.
-For compatibility with NFFT.jl, the plan generated via this interface **does not
-follow the same conventions**  described [above](@ref nufft-conventions).
-
-The differences are:
-
-- points are assumed to be in ``[-1/2, 1/2)`` instead of ``[0, 2π)``;
-- the opposite Fourier sign convention is used (e.g. ``e^{-i k x_j}`` becomes ``e^{+2π i k x_j}``);
-- uniform data is in increasing order, with frequencies ``k = -N/2, …, -1, 0,
-  1, …, N/2-1``, as opposed to preserving the order used by FFTW (which starts at ``k = 0``).
-
-### Example usage
-
-```julia
-using NonuniformFFTs
-using AbstractNFFTs: AbstractNFFTs, plan_nfft
-using LinearAlgebra: mul!
-
-Ns = (256, 256)  # number of Fourier modes in each direction
-Np = 1000        # number of non-uniform points
-
-# Generate some non-uniform random data
-T = Float64                      # must be a real data type (Float32, Float64)
-d = length(Ns)                   # number of dimensions (d = 2 here)
-xp = rand(T, (d, Np)) .- T(0.5)  # non-uniform points in [-1/2, 1/2)ᵈ; must be given as a (d, Np) matrix
-vp = randn(Complex{T}, Np)       # random values at points (must be complex)
-
-# Create plan for data of type Complex{T}. Note that we pass the points `xp` as
-# a first argument, which calls an AbstractNFFTs-compatible constructor.
-p = NonuniformFFTs.NFFTPlan(xp, Ns)
-# p = plan_nfft(xp, Ns)  # this is also possible
-
-# Getting the expected dimensions of input and output data.
-AbstractNFFTs.size_in(p)   # (256, 256)
-AbstractNFFTs.size_out(p)  # (1000,)
-
-# Perform adjoint NFFT, a.k.a. type-1 NUFFT (non-uniform to uniform)
-us = adjoint(p) * vp      # allocates output array `us`
-mul!(us, adjoint(p), vp)  # uses preallocated output array `us`
-
-# Perform forward NFFT, a.k.a. type-2 NUFFT (uniform to non-uniform)
-wp = p * us
-mul!(wp, p, us)
-
-# Setting a different set of non-uniform points
-AbstractNFFTs.nodes!(p, xp)
-```
-
-Note: the AbstractNFFTs.jl interface currently only supports complex-valued non-uniform data.
-For real-to-complex transforms, the NonuniformFFTs.jl API demonstrated above should be used instead.
+Alternatively, one can pass `fftshift = true` to the [`PlanNUFFT`](@ref)
+constructor to reorder Fourier modes in increasing order of frequencies ("natural" order).
 
 ## [Differences with other packages](@id similar-packages)