-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Algorithm for combining FFTs and tridiagonal solves #44
Comments
I think that steps 1-5 can be achieved with the current interface for in-place transforms. The main barrier is that I cannot specify using MPI
using PencilFFTs
MPI.Init()
comm = MPI.COMM_WORLD
transforms = (
PencilFFTs.Transforms.NoTransform(),
PencilFFTs.Transforms.FFT!(),
PencilFFTs.Transforms.FFT!()
)
plan = PencilFFTs.PencilFFTPlan((16, 32, 64), transforms, (2, 2), comm)
input = PencilFFTs.allocate_input(plan) throws
This might be a trivial fix. I guess if that is fixed, then we can also create a "plan" for the tranpositions in steps 7-8, 10-11 with |
Maybe I'm answering my own question: it looks like there's a function |
I think Well, almost... There's a |
Exactly, you want to use
If I understood correctly, for 7-8 you would do: transpose!(input[2], input[3]) # from local-x to local-y
transpose!(input[1], input[2]) # from local-y to local-z
You might want to work with the array
transpose!(input[2], input[1]) # from local-z to local-y
transpose!(input[3], input[2]) # from local-y to local-x
|
But perhaps you can avoid the extra transpositions (which can be quite expensive), and work with the following order of transforms: transforms = (
PencilFFTs.Transforms.FFT!(),
PencilFFTs.Transforms.FFT!(),
PencilFFTs.Transforms.NoTransform!(),
) If I understood correctly, in this case you wouldn't even need to permute data from (x, y, z) to (z, y, x) before doing the transforms. After such a transform, your z direction would be local, and you could directly perform the tridiagonal solves and integrals over local (and contiguous) data. |
Ah yes! We can use that algorithm if 1) we distribute only in It's a topic for another time, but I don't think we've found evidence that the contiguity of the z dimension yields performance benefit (despite the theoretical expectation). @christophernhill might have more to say? EDIT: a third possibility is that we support a configuration that explicitly disallows any component of the algorithm that requires z to be local (nitty gritty: vertical integrals for hydrostatic pressure or tridiagonal solves for vertically implicit time integration are disallowed). This would require some development to support but is certainly possible (though we're still investigating whether eliminating all z-integrals from other parts of the algorithm has undesirable numerical properties, see CliMA/Oceananigans.jl#1910). That "z-distributed" mode would then support a faster solver algorithm without extra transposes. |
💡 oh my |
Ahh of course, your data is distributed in
Interesting! |
good tip! |
@jipolanco - we have implemented the tridiagonal solve option for the final dimension. even though the computational operation count is lower we haven't always found that the execution time on a GPU is better. The tridiagonal solve has more data dependencies that seem to need stalling/bubbles etc... on GPU. @glwagner the tridiagonal option is the only option I think we have in Oceananigans currently for FFT based eigenmode solver non-uniform vertical grid? I did do some 1d tests where I somewhat convinced myself there is a way to treat the FFT problem that allows a stretched grid in a single dimension. However, I haven't tried that in the Oceananigans code to prove it works. Maybe now is a good time to resurrect that? |
We should definitely look at that in another issue! That would be cool... @christophernhill the performance thing I was referring to is the fact that construct our arrays so that |
I understand that distributing over Very roughly, you would have something like: transforms = (
PencilFFTs.Transforms.BFFT!(),
PencilFFTs.Transforms.BFFT!(),
PencilFFTs.Transforms.NoTransform!(),
)
plan = PencilFFTPlan(dims, transforms, ...)
input = PencilFFTs.allocate_input(plan) # this can stay the same for in-place plans
input[3] = ... # (**) copy data to transform output
plan \ input # perform the inverse of a backward FFT (=> forward FFT!)
# transformed data is in input[1] (distributed in y, z)
I still need to think this through, but I hope it makes sense. Just note that, in the |
@kburns pointed out that this:
can be done with fewer transposes. Recall the configuration is (x, y, z), so we can swap x, z:
So that reduces the number of required transposes by 2 (to 6). But it does require an extra temporary array. |
That's right, that sounds like a better alternative. Right now I'm not sure if this is possible due to (artificial) limitations in PencilArrays:
This means that, starting from the configuration |
No rush for sure! You could always add a fifth temporary and perform a local transpose before doing the distributed transpose (into the "fourth" temporary) to work around that, probably at minimal cost... |
It just needed some very minor changes :) I'll merge this and tag PencilArrays v0.17 once tests pass. Basically, as shown in the example in that PR, you will need to create a new |
Cool! |
I'd like to implement an algorithm that combines distributed in-place FFTs in (x, y) or dimensions (1, 2) with a tridiagonal solve in z (dimension 3) in Oceananigans.
Due to the presence of integrals (in addition to the tridiagonal solve) in our algorithm, the 3rd dimension must be local.
For 3D FFTs, we thus create PencilArrays which have the permutation (z, y, x) with respect to Oceananigans data structures (we do this perturbation manually / externally from PencilFFTs): CliMA/Oceananigans.jl#2513. This seems to work, allowing us to decompose our data in (x, y) and use the algorithm 1) compute the RHS of our Poisson system, simultaneously permuting the data to (3, 2, 1) and filling the PencilArray; 2) solve the Poisson equation with FFTs (so easy thanks to PencilFFTs!); 3) "unpermute" the data and store in the Oceananigans variable. Decomposing in (x, y) is crucial because our problems (like the ocean) are often very flat.
For an algorithm that combines 2D in-place FFTs with tridiagonal solves it seems we need a slightly different approach. Talking with @simone-silvestri I think the algorithm is:
Now the forward transform is complete, and the data has permutation (x, y, z) --- as usual, the reverse permutation that we started with for in-place transforms. But note that we need z to be local for the tridiagonal solve, so the next steps are
I think the pieces to write this algorithm are all here in PencilFFTs, I'm just not sure how to achieve it, so opening this issue! I'll list a few things I tried below.
The text was updated successfully, but these errors were encountered: