Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Use direct evaluation of kernel functions on GPU (#39)
* Start work on shared-memory implementation
* WIP
* Interpolation now works (in 3D, ntransforms = 1)
* Minor changes
* Optimisations
* Further optimisations
* Generalise interpolation to all dimensions
* Use get_inds_vals_gpu in spreading
* Spreading with shmem works but is slow
  Atomic add on shared memory is very slow. Is it Atomix's fault?
* SM kernels now work with KA CPU backends
* Test SM kernels on CPU
* Fix kernel compilation on CUDA
* Reorganise some code
* [WIP] avoid atomics in shared-memory arrays
* Avoid atomic operations on shared memory
* Minor improvement
* Make window_vals a matrix
* Implement hybrid parallelisation in SM spreading
  Much faster!
* More optimisations
* Try to fix tests (on CPU)
* Shared memory array can be complex
* Update interpolation based on spreading changes
* Simplify setting workgroupsize
* Minor changes
* Fix CPU tests
* Remove unused functions
* Add tests for multiple transforms
* Simplify atomic adds with complex data
  Doesn't change performance.
* point_to_cell now also returns x/Δx
* Remove direct evaluation functions (for now)
* Update CHANGELOG [skip ci]
* Add documentation
* Update docs and comments
* Define direct evaluation of KB kernels
* Fix direct evaluation
* Minor optimisations
* Use direct evaluation in GPU kernels
  It is the same or faster than polynomial approximation (and more accurate!). Especially for shared-memory interpolation, it seems to improve performance by a lot.
* Avoid division by zero in BKB kernel
* Simplify window evaluation
  Doesn't really affect performance, on GPU at least.
* Add comment on besseli0
* Add empty CUDA extension
* KB kernel: call CUDA version of besseli0
* Update comment
* Define direct evaluation for GaussianKernel
* Define direct evaluation for BSplineKernel
* Allow different default kernel per backend
* Update comments
* Add tests
* Update CHANGELOG
* Remove old comment
* Update tests
Showing 18 changed files with 172 additions and 23 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,21 @@ | ||
module NonuniformFFTsCUDAExt | ||
|
||
using NonuniformFFTs | ||
using NonuniformFFTs.Kernels: Kernels | ||
using CUDA | ||
using CUDA: @device_override | ||
|
||
# This is currently not wrapped in CUDA.jl, probably because besseli0 is not defined by | ||
# SpecialFunctions.jl either (the more general besseli is defined though). | ||
# See: | ||
# - https://docs.nvidia.com/cuda/cuda-math-api/index.html for available functions | ||
# - https://github.com/JuliaGPU/CUDA.jl/blob/master/ext/SpecialFunctionsExt.jl for functions wrapped in CUDA.jl | ||
@device_override Kernels._besseli0(x::Float64) = ccall("extern __nv_cyl_bessel_i0", llvmcall, Cdouble, (Cdouble,), x) | ||
@device_override Kernels._besseli0(x::Float32) = ccall("extern __nv_cyl_bessel_i0f", llvmcall, Cfloat, (Cfloat,), x) | ||
|
||
# Set KaiserBesselKernel as the default kernel on the CUDA backend. | ||
# It's slightly faster than BackwardsKaiserBesselKernel when using Bessel functions from CUDA (wrapped above). | ||
# The difference is not huge though. | ||
NonuniformFFTs.default_kernel(::CUDABackend) = KaiserBesselKernel() | ||
|
||
end |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,35 @@ | ||
# Test direct ("exact") and approximate evaluation of window functions. | ||
# This is particularly useful for testing approximations based on piecewise polynomials. | ||
|
||
using NonuniformFFTs | ||
using NonuniformFFTs.Kernels | ||
using StaticArrays | ||
using Test | ||
|
||
function test_kernel(kernel; σ = 1.5, m = HalfSupport(4), N = 256) | ||
backend = CPU() | ||
Δx = 2π / N | ||
g = Kernels.optimal_kernel(kernel, m, Δx, σ; backend) | ||
xs = range(0.8, 2.2; length = 1000) .* Δx | ||
|
||
for x ∈ xs | ||
a = Kernels.evaluate_kernel(g, x) | ||
b = Kernels.evaluate_kernel_direct(g, x) | ||
@test a.i == b.i # same bin | ||
@test SVector(a.values) ≈ SVector(b.values) rtol=1e-7 # TODO: rtol should depend on (M, σ, kernel) | ||
end | ||
|
||
nothing | ||
end | ||
|
||
@testset "Kernel approximations" begin | ||
kernels = ( | ||
BSplineKernel(), | ||
GaussianKernel(), | ||
KaiserBesselKernel(), | ||
BackwardsKaiserBesselKernel(), | ||
) | ||
@testset "$(nameof(typeof(kernel)))" for kernel ∈ kernels | ||
test_kernel(kernel) | ||
end | ||
end |
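
The test above compares the default kernel evaluation against `evaluate_kernel_direct`. As a rough illustration of what "direct" evaluation means for a Kaiser-Bessel window, here is a minimal self-contained sketch. It is not the implementation in NonuniformFFTs.jl: the names `besseli0_series` and `kb_direct` are hypothetical, the normalisation `I₀(β √(1 - (x/w)²)) / I₀(β)` is one common convention assumed here, and `I₀` is computed from its power series rather than with the library routines used by the package (on CUDA, the commit calls `__nv_cyl_bessel_i0` instead).

```julia
# Hypothetical helper (not part of NonuniformFFTs.jl): modified Bessel function
# of the first kind and order zero, from its power series
#   I₀(x) = Σₖ ((x/2)^k / k!)^2,  k = 0, 1, 2, ...
function besseli0_series(x::T; maxterms = 200) where {T <: AbstractFloat}
    q = (x / 2)^2
    term = one(T)  # k = 0 term
    s = term
    for k in 1:maxterms
        term *= q / T(k)^2          # ratio of consecutive terms is q / k²
        s += term
        term < eps(T) * s && break  # stop once the term no longer contributes
    end
    s
end

# Hypothetical direct evaluation of a Kaiser-Bessel window with half-width w
# and shape parameter β (assumed normalisation: value 1 at x = 0).
# The window is zero outside of (-w, w).
function kb_direct(x::T, w::T, β::T) where {T <: AbstractFloat}
    abs(x) < w || return zero(T)
    besseli0_series(β * sqrt(1 - (x / w)^2)) / besseli0_series(β)
end
```

For example, `kb_direct(0.25, 1.0, 10.0)` evaluates the window at a quarter of its half-width. A piecewise-polynomial approximation would replace the `sqrt` and Bessel evaluations with a few multiply-adds per point, at the cost of an approximation error that depends on the polynomial degree; the `rtol = 1e-7` in the test above bounds that error for the tested parameters.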