Commit
Showing 1 changed file with 4 additions and 56 deletions.
@@ -1,59 +1,7 @@
# Benchmarks

The benchmarks consist of type-1 and type-2 NUFFTs on a uniform 3D grid of
fixed dimensions $M^3 = 256^3$ (excluding oversampling). We vary the number of
non-uniform points $N$, so that the point density $ρ = N / M^3$ takes values
between $10^{-4}$ (very few points) and $10^1$ (very dense).
Points are randomly located in $[0, 2π)^3$ using a uniform distribution.
The relative tolerance is fixed at $10^{-6}$.
In NonuniformFFTs.jl, this can be achieved with the parameters `σ = 1.5`
(oversampling factor) and `m = HalfSupport(4)` (see [Accuracy](@ref accuracy)).
All tests are run in double precision (`Float64` or `ComplexF64` non-uniform data).
This directory contains scripts for executing benchmarks (`run_benchmarks.jl`)
and for plotting the generated results (`plots/plot_benchmarks.jl`).
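
As a rough illustration of the parameter sweep described above, the following sketch converts point densities into point counts and draws uniformly distributed points. The one-value-per-decade sampling of $ρ$, the `random_points` helper and the fixed seed are assumptions made for this example; the values actually used are set in `run_benchmarks.jl`.

```julia
using Random

M = 256                              # grid size per dimension (M^3 grid points)
densities = 10.0 .^ (-4:1)           # assumed sampling: ρ = 1e-4, 1e-3, ..., 1e1
Ns = round.(Int, densities .* M^3)   # number of non-uniform points for each ρ

# Hypothetical helper: draw N points uniformly in [0, 2π)^3,
# returned as one coordinate vector per dimension.
random_points(rng, N) = ntuple(_ -> 2π .* rand(rng, N), 3)

rng = MersenneTwister(42)            # fixed seed, only for reproducibility here
xs, ys, zs = random_points(rng, first(Ns))  # points for the lowest density

for (ρ, N) in zip(densities, Ns)
    println("ρ = ", ρ, "  →  N = ", N)
end
```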

The tests were run on a cluster with an AMD EPYC 7302 CPU (32 threads) and an
NVIDIA A100 GPU.

The benchmarks compare NonuniformFFTs.jl v0.6.7 (26/11/2024) and FINUFFT v2.3.1.

Each reported time includes (1) the time spent processing non-uniform points
(`set_points!` / `(cu)finufft_setpts!`) and (2) the time spent on the actual transform (`exec_type{1,2}!` / `(cu)finufft_exec!`).
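
To make the timed region concrete, here is a minimal sketch of a CPU type-1 measurement with NonuniformFFTs.jl that covers both steps. The small grid, the number of points, the `time_type1!` helper, the warm-up run and the use of `@elapsed` are assumptions made for this example and are not taken from `run_benchmarks.jl`; the call signatures follow the package documentation as I understand it and should be checked against the installed version.

```julia
using NonuniformFFTs
using StaticArrays: SVector

M = 32                      # small grid so the sketch runs quickly (benchmarks use 256)
dims = (M, M, M)
Np = 1000                   # hypothetical number of non-uniform points

xp = [2π * rand(SVector{3, Float64}) for _ in 1:Np]  # points in [0, 2π)^3
vp = randn(ComplexF64, Np)                           # values at the points

# Parameters quoted above for a relative tolerance of about 1e-6.
plan = PlanNUFFT(ComplexF64, dims; m = HalfSupport(4), σ = 1.5)
us = Array{ComplexF64}(undef, size(plan))            # output on the uniform grid

# Hypothetical helper: time point processing plus the transform itself.
function time_type1!(us, plan, xp, vp)
    @elapsed begin
        set_points!(plan, xp)       # (1) process non-uniform points
        exec_type1!(us, plan, vp)   # (2) actual type-1 transform
    end
end

time_type1!(us, plan, xp, vp)       # warm-up run, includes compilation time
t = time_type1!(us, plan, xp, vp)
println("type-1 time: ", t, " s")
```

A type-2 measurement would follow the same pattern with `exec_type2!(vp, plan, us)` in place of the type-1 call.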

## FINUFFT set-up

We used FINUFFT via its Julia wrapper [FINUFFT.jl](https://github.com/ludvigak/FINUFFT.jl) v3.3.0. For
performance reasons, the (Cu)FINUFFT libraries were compiled locally and the
FINUFFT.jl sources were modified accordingly, as described in its
[advanced installation instructions](https://github.com/ludvigak/FINUFFT.jl?tab=readme-ov-file#advanced-installation-and-locally-compiling-binaries).
FINUFFT was compiled with GCC 10.2.0 using CMake with its default flags in `Release` mode, which include `-fPIC -funroll-loops -O3 -march=native`.
Moreover, we set `CMAKE_CUDA_ARCHITECTURES=80` (for an NVIDIA A100) and used the `nvcc` compiler included in CUDA 12.3.

All FINUFFT benchmarks were run with relative tolerance `1e-6`.
Moreover, the following options were used:

- `modeord = 1` (use FFTW ordering, for consistency with NonuniformFFTs)
- `spread_sort = 1` (enable point sorting in CPU plans)
- `spread_kerevalmeth = 1` (use the recommended piecewise polynomial evaluation)
- `fftw = FFTW.ESTIMATE` (CPU plans)

and for GPU plans:

- `gpu_sort = 1` (enable point sorting)
- `gpu_kerevalmeth = 1` (use piecewise polynomial evaluation)
- `gpu_method = 1` (global memory method, non-uniform point driven)

We also tried `gpu_method = 2` (based on shared memory) but found it to be
considerably slower in almost all cases (in three dimensions, at the requested tolerance).
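
The options above could be passed to a CPU plan through FINUFFT.jl roughly as follows. This is a sketch under several assumptions: the grid size, number of points and `iflag = +1` are illustrative choices, the `cpu_opts` and `gpu_opts` names are hypothetical, and the `finufft_makeplan` argument order reflects my reading of the FINUFFT.jl guru interface, so it should be checked against the wrapper's documentation.

```julia
using FINUFFT
using FFTW   # only needed for the FFTW.ESTIMATE flag

M = 32       # small grid for a quick run (benchmarks use 256)
Np = 1000    # hypothetical number of non-uniform points
tol = 1e-6   # relative tolerance used in the benchmarks

# CPU plan options listed above.
cpu_opts = (
    modeord = 1,              # FFTW mode ordering
    spread_sort = 1,          # sort points before spreading
    spread_kerevalmeth = 1,   # piecewise polynomial kernel evaluation
    fftw = FFTW.ESTIMATE,     # FFTW planner flag
)

# GPU plan options listed above, shown for reference only (unused in this CPU sketch).
gpu_opts = (gpu_sort = 1, gpu_kerevalmeth = 1, gpu_method = 1)

x, y, z = ntuple(_ -> 2π .* rand(Np), 3)     # non-uniform points
c = randn(ComplexF64, Np)                    # values at the points
fk = Array{ComplexF64}(undef, M, M, M)       # output Fourier coefficients

# Type-1 plan, iflag = +1 (assumed sign convention), one transform per execution.
plan = finufft_makeplan(1, [M, M, M], +1, 1, tol; dtype = Float64, cpu_opts...)
finufft_setpts!(plan, x, y, z)   # process non-uniform points
finufft_exec!(plan, c, fk)       # run the transform
finufft_destroy!(plan)           # release the plan's resources
```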

## Results

### Complex data

![](plots/benchmark_ComplexF64_type1.svg)

![](plots/benchmark_ComplexF64_type2.svg)

### Real data

![](plots/benchmark_Float64_type1.svg)

![](plots/benchmark_Float64_type2.svg)

This directory also contains raw benchmark results (in `results`) and their associated
plots (`plots/*.svg`), which are discussed in the Benchmarks section of the docs.