I've been investigating the Folds ecosystem while trying to find an efficient way to get the values and indices of the two largest elements in a GPU array. Base.foldl works like a charm on the CPU:
function findmax2(prev, curr)
    i, v = curr
    v > prev.first.v && return (first=(; i, v), second=prev.first)
    v > prev.second.v && return (first=prev.first, second=(; i, v))
    return prev
end
julia> A = zeros(Float32, 512, 512);
julia> A[1] = 1.0f0; A[end] = 0.5f0;
julia> foldl(findmax2, enumerate(A), init=(first=(i=0, v=0.0f0), second=(i=0, v=0.0f0)))
(first = (i = 1, v = 1.0f0), second = (i = 262144, v = 0.5f0))
It's a bit trickier with FLoops.jl, but I came up with a working single-threaded CPU implementation while trying to get a GPU version. I expect the same thing is causing it to fail for both multithreaded CPU and GPU.
using CUDA
using Transducers, FoldsCUDA, FLoops

function folds_findmax2(xs, ex = xs isa CuArray ? CUDAEx() : ThreadedEx())
    xtypemin = typemin(eltype(xs))
    @floop ex for (i, x) in zip(eachindex(xs), vec(xs))
        @reduce() do (xmax1=xtypemin; x), (imax1=-1; i)
            if isless(xmax1, x)
                imax2, xmax2 = imax1, xmax1
                imax1, xmax1 = i, x
            end
        end
        i == imax1 && continue
        @reduce() do (xmax2=xtypemin; x), (imax2=-1; i)
            if isless(xmax2, x)
                imax2, xmax2 = i, x
            end
        end
    end
    return ((imax1, xmax1), (imax2, xmax2))
end
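
For comparison, here is a minimal Base-only sketch of what a parallel-friendly version of the reduction might look like. It has not been checked against FLoops or FoldsCUDA; it only illustrates the idea that each element is first lifted to a full "top two" state and that two such states are then merged symmetrically, so partial results from separate chunks can be combined. The names `top2_single` and `top2_merge` are hypothetical helpers introduced for this sketch.

```julia
# Lift one (index, value) pair to a "top two" state.
# The runner-up slot starts as a sentinel: index 0 and the smallest value of the type.
top2_single(i, v) = (first=(i=i, v=v), second=(i=0, v=typemin(v)))

# Merge two "top two" states. Both arguments have the same shape,
# unlike the (state, element) signature that foldl uses.
function top2_merge(a, b)
    # Pick the state holding the overall maximum (ties keep the left argument).
    hi, lo = isless(a.first.v, b.first.v) ? (b, a) : (a, b)
    # The runner-up is either the winner's own second or the loser's first.
    second = isless(hi.second.v, lo.first.v) ? lo.first : hi.second
    return (first=hi.first, second=second)
end

A = zeros(Float32, 512, 512)
A[1] = 1.0f0; A[end] = 0.5f0

# Sequential reduction over (linear index, value) pairs.
pairs = [(i, A[i]) for i in 1:length(A)]
result = mapreduce(p -> top2_single(p...), top2_merge, pairs)

# Chunked reduction: reduce each chunk independently, then merge the partial states.
partials = [mapreduce(p -> top2_single(p...), top2_merge, chunk)
            for chunk in Iterators.partition(pairs, 1000)]
chunked = reduce(top2_merge, partials)

println(result)
println(chunked == result)
```

Because `top2_merge` takes two states of the same shape, the order in which chunks are combined should not change the two values found, which is the property a threaded or GPU executor would rely on.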
This is on Julia 1.5.2 with Transducers, FLoops, and FoldsCUDA on latest master.