From 3634e7e176dfee42fd1dfa67a6ae48f0efe0075f Mon Sep 17 00:00:00 2001
From: Juan Ignacio Polanco <juan-ignacio.polanco@cnrs.fr>
Date: Fri, 29 Nov 2024 16:34:59 +0100
Subject: [PATCH] Update benchmarks/README.md

---
 benchmarks/README.md | 60 +++-----------------------------------------
 1 file changed, 4 insertions(+), 56 deletions(-)

diff --git a/benchmarks/README.md b/benchmarks/README.md
index 469bc73..fc4a0bc 100644
--- a/benchmarks/README.md
+++ b/benchmarks/README.md
@@ -1,59 +1,7 @@
 # Benchmarks
 
-The benchmarks consist in type-1 and type-2 NUFFTs on a uniform 3D grid of
-fixed dimensions $M^3 = 256^3$ (excluding oversampling). We vary the number of
-non-uniform points $N$, so that the point density $ρ = N / M^3$ takes values
-between $10^{-4}$ (very few points) and $10^1$ (very dense).
-Points are randomly located in $[0, 2π)^3$ using a uniform distribution.
-The relative tolerance is fixed to $10^{-6}$.
-In NonuniformFFTs.jl, this can be achieved with the parameters `σ = 1.5`
-(oversampling factor) and $m = HalfSupport(4)$ (see [Accuracy](@ref accuracy)).
-All tests are run in double precision (`Float64` or `ComplexF64` non-uniform data).
+This directory contains scripts for running the benchmarks (`run_benchmarks.jl`)
+and for plotting the generated results (`plots/plot_benchmarks.jl`).
 
-The tests were run on a cluster with an AMD EPYC 7302 CPU (32 threads) and an
-NVIDIA A100 GPU.
-
-The benchmarks compare NonuniformFFTs.jl v0.6.7 (26/11/2024) and FINUFFT v2.3.1.
-
-Each reported time includes (1) the time spent processing non-uniform points
-(`set_points!` / `(cu)finufft_setpts!`) and (2) the time spent on the actual transform (`exec_type{1,2}!` / `(cu)finufft_exec!`).
-
-## FINUFFT set-up
-
-We used FINUFFT via its Julia wrapper [FINUFFT.jl](https://github.com/ludvigak/FINUFFT.jl) v3.3.0. For
-performance reasons, the (Cu)FINUFFT libraries were compiled locally and the
-FINUFFT.jl sources were modified accordingly as described
-[here](https://github.com/ludvigak/FINUFFT.jl?tab=readme-ov-file#advanced-installation-and-locally-compiling-binaries).
-FINUFFT was compiled with GCC 10.2.0 using CMake with its default flags in `Release` mode, which include `-fPIC -funroll-loops -O3 -march=native`.
-Moreover, we set `CMAKE_CUDA_ARCHITECTURES=80` (for an NVIDIA A100) and used the `nvcc` compiler included in CUDA 12.3.
-
-All FINUFFT benchmarks were run with relative tolerance `1e-6`.
-Moreover, the following options were used:
-
-- `modeord = 1` (use FFTW ordering, for consistency with NonuniformFFTs)
-- `spread_sort = 1` (enable point sorting in CPU plans)
-- `spread_kerevalmeth = 1` (use the recommended piecewise polynomial evaluation)
-- `fftw = FFTW.ESTIMATE` (CPU plans)
-
-and for GPU plans:
-
-- `gpu_sort = 1` (enable point sorting)
-- `gpu_kerevalmeth = 1` (use piecewise polynomial evaluation)
-- `gpu_method = 1` (global memory method, non-uniform point driven)
-
-We also tried `gpu_method = 2` (based on shared memory) but found it to be
-considerably slower in almost all cases (in three dimensions, at the requested tolerance).
-
-## Results
-
-### Complex data
-
-![](plots/benchmark_ComplexF64_type1.svg)
-
-![](plots/benchmark_ComplexF64_type2.svg)
-
-### Real data
-
-![](plots/benchmark_Float64_type1.svg)
-
-![](plots/benchmark_Float64_type2.svg)
+It also contains the raw benchmark results (in `results`) and their associated
+plots (`plots/*.svg`), which are discussed in the Benchmarks section of the docs.