Slab decomposition with 3/2-rule dealiasing fails for number of processors = number of points in the x-direction #1
Comments
I am not that familiar with the issue, but MPI FFTW wouldn't fail in this case, would it? I thought FFTW would just leave some processes idle if an optimal domain decomposition is not found.
Actually, I have no idea how FFTW would solve this problem, or whether it does. I know that if the first dimension of the original data cannot be shared evenly amongst processors, then FFTW will give more data to the lowest ranks. Whether or not this works out of the box for zero-padding I do not know. My first thought on how to solve this is to pad with a factor of 2 instead of 3/2 whenever the number of processors equals the size of the first dimension. That would work easily, but would of course be more expensive. Trying to implement uneven load balancing seems to me like too much complexity for a suboptimal solution.
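The trade-off behind the factor-2 suggestion can be sketched in a few lines (sizes are illustrative; the point is only the divisibility and the extra work):

```python
# Illustrative comparison of 3/2-rule versus factor-2 zero-padding for a
# slab decomposition, when the number of processors P equals N0 (the
# number of points in the padded first dimension).
N0 = 64                     # points in x; also the number of processors here
pad32 = 3 * N0 // 2         # 96 planes: 96 % 64 != 0, cannot be shared evenly
pad2 = 2 * N0               # 128 planes: exactly 2 per rank, shares evenly

print(pad32 % N0)           # 32 -> uneven remainder with the 3/2 rule
print(pad2 % N0)            # 0  -> factor 2 divides evenly
print(pad2 / pad32)         # ~1.33x more padded planes to transform
```

So factor-2 padding restores even load balance at P = N0, at the cost of roughly a third more transform work than the 3/2 rule.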
Looks like we are talking about zero-padded FFTs? I am guessing I didn't quite understand the context... Have you considered doing a 1-d pencil decomposition? (c.f. https://github.com/mpip/pfft) It would surely help load balancing. About dealiasing: I recently ran into this. Do you have a reference for this 3/2 dealiasing scheme? I don't think it is explained in the FFTW manual.
Zero-padded FFTs it is :-) What's a 1-d pencil decomposition? The 3/2 dealiasing scheme is very straightforward, but I have no other reference than the generic descriptions given in, e.g.,
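For reference, the 3/2 rule for a quadratic nonlinearity is: zero-pad the N-mode spectra to M = 3N/2 modes, transform to physical space on the finer grid, multiply there, transform back, and truncate to N modes. A minimal 1D numpy sketch (not the project's implementation, and ignoring the real-to-complex transforms the solver actually uses):

```python
import numpy as np

def dealiased_product(u_hat, v_hat):
    """Spectrum of u*v, dealiased with the 3/2 rule.

    u_hat, v_hat: full complex spectra of length N (from np.fft.fft),
    with N even. Returns the N-mode spectrum of the product, free of
    aliasing errors from the quadratic nonlinearity.
    """
    N = len(u_hat)
    M = 3 * N // 2

    def pad(a_hat):
        # Copy the N retained modes into an M-length spectrum of zeros.
        b = np.zeros(M, dtype=complex)
        b[:N // 2] = a_hat[:N // 2]
        b[-(N // 2):] = a_hat[-(N // 2):]
        return b

    # To physical space on the fine grid; rescale for ifft's 1/M factor.
    u = np.fft.ifft(pad(u_hat)) * (M / N)
    v = np.fft.ifft(pad(v_hat)) * (M / N)

    # Pointwise product, back to spectral space, undo the length rescaling.
    w_hat_pad = np.fft.fft(u * v) * (N / M)

    # Truncate back to N modes; the aliased energy sat in the discarded band.
    w_hat = np.zeros(N, dtype=complex)
    w_hat[:N // 2] = w_hat_pad[:N // 2]
    w_hat[-(N // 2):] = w_hat_pad[-(N // 2):]
    return w_hat
```

For band-limited inputs like cos(x) and cos(2x) the result matches the exact pointwise product; for inputs with energy near the Nyquist mode it differs from the aliased direct product, which is the point of the scheme.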
Thanks for the references. A 1-d pencil decomposition divides the mesh into Nx x Ny x 1 processes. It helps load balancing when Np >= Nmesh. The math tricks are worked out in, e.g., the pfft code paper -- but pfft is written in C. Will it be much ...
Isn't that just a regular 2D pencil decomposition? Perhaps we just have different names for it? The pencil decomposition is already implemented.
Yes, I see. I am reading through pencil.py -- is the array distributed or shared? It is not very clear just from reading the code.
It's distributed. In real space the first two dimensions are distributed, whereas the complex (transformed) data are distributed in the x and z directions. There's also a version with the final complex data aligned in the x-direction, but it does not yet contain all features, like zero-padding.
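A purely illustrative sketch of what that layout implies for local array shapes (these names are my own, not attributes of pencil.py, and I'm assuming a cubic N^3 mesh on a P1 x P2 processor grid with N divisible by both):

```python
# Hypothetical local shapes for the distribution described above.
N, P1, P2 = 64, 4, 8

# Real space: x and y distributed, z local on every rank.
real_local = (N // P1, N // P2, N)

# After the forward transforms: x and z distributed, y local.
# A real-to-complex transform along z keeps N//2 + 1 modes; the split of
# those modes over P2 is shown as an even division for simplicity
# (in practice the last rank may hold a remainder).
Nf = N // 2 + 1
spectral_local = (N // P1, N, Nf // P2)

print(real_local)       # (16, 8, 64)
print(spectral_local)   # (16, 64, 4)
```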
Thanks. Is there a variable storing the offsets and strides of the local array w.r.t. the global array? Like the partition attribute and the x, r, k, w attributes in: https://github.com/rainwoodman/pmesh/blob/master/pmesh/particlemesh.py
As I understand particlemesh.py, those attributes should all be easily available, even though they're not really exposed at the moment. See
Yes, I see them now. Using numpy's broadcasting rules on vectors can save memory and computation: e.g., for Gaussian smoothing the number of calls to exp drops from N^3 to 3*N, and the memory from 3*N^3 to 3*N. I mainly do particle-mesh simulations, and a convenient, fast function to paint/read out particles to/from a periodic mesh would also be handy. Mine are ridiculously slow. Is this something you've thought about?
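The broadcasting trick works because the Gaussian kernel is separable: exp(-(a+b+c)) = exp(-a)*exp(-b)*exp(-c), so three 1D exp evaluations replace one per grid point. A small sketch (sigma and N are illustrative):

```python
import numpy as np

N, sigma = 32, 0.1
k = np.fft.fftfreq(N, d=1.0 / N)   # 1D wavenumbers

# Naive: one exp per grid point -> N**3 calls, plus three N**3 arrays
# of wavenumbers held in memory.
KX, KY, KZ = np.meshgrid(k, k, k, indexing="ij")
naive = np.exp(-0.5 * sigma**2 * (KX**2 + KY**2 + KZ**2))

# Broadcast: three 1D exp evaluations (3*N calls, 3*N stored values);
# the outer product is formed on the fly by numpy's broadcasting.
gx = np.exp(-0.5 * sigma**2 * k**2)
smooth = gx[:, None, None] * gx[None, :, None] * gx[None, None, :]

assert np.allclose(naive, smooth)
```

When the kernel multiplies a transformed field in place, even the final N^3 product never needs to be stored separately; only the three 1D vectors do.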
Are you doing Particle-in-Cell type simulations? I would be interested in that through this project: http://www.mn.uio.no/fysikk/english/research/projects/4dspace/. I'm using HDF5 to store results and it works very well in parallel for my spectralDNS solver. I know HDF5 has some routines for particles as well, but I have not tried them.
Slab decomposition with 3/2-rule dealiasing only works for up to N[0]/2 processors, where N[0] is the number of points in the x-direction. This is because it is not possible to share N[0]*1.5 points evenly between N[0] processors.
Not sure this is worth fixing, because any solution will be suboptimal no matter what.
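The constraint stated above, restated as a quick check (N0 is illustrative):

```python
# 3/2-rule padding gives 3*N0/2 planes in x; a slab decomposition needs
# that to divide evenly over the P ranks, which caps P at N0/2.
N0 = 64
padded = 3 * N0 // 2                # 96 planes after padding

assert padded % (N0 // 2) == 0      # P = N0/2 = 32: three planes per rank
assert padded % N0 != 0             # P = N0 = 64: 1.5 planes per rank, fails
```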