
Slab decomposition with 3/2-rule dealiasing fails for number of processors = number of points in the x-direction #1

Open
mikaem opened this issue Apr 19, 2016 · 12 comments


mikaem commented Apr 19, 2016

Slab decomposition with 3/2-rule dealiasing only works for up to N[0]/2 processors, where N[0] is the number of points in the x-direction. This is because N[0]*1.5 points cannot be shared evenly between N[0] processors.
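A quick arithmetic sketch of the problem (the helper below is hypothetical, just illustrating the divisibility argument):

```python
# Sketch: why 3/2-rule padding breaks the slab decomposition at P == N[0].
# An even slab split requires the padded size 3*N//2 to be divisible by the
# number of processors P; with P == N that would need 1.5 points per rank.

def slab_sizes(N, P):
    """Return the local slab size of the padded array, or None if uneven."""
    padded = 3 * N // 2          # 3/2-rule padded length (N assumed even)
    return padded // P if padded % P == 0 else None

print(slab_sizes(8, 4))   # 12 points over 4 procs -> 3 each
print(slab_sizes(8, 8))   # 12 points over 8 procs -> None (1.5 each)
```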

I'm not sure this is worth fixing, because any solution will be suboptimal no matter what.

mikaem self-assigned this Apr 19, 2016
@rainwoodman

I am not that familiar with the issue, but MPI FFTW won't fail in this case, will it? I thought FFTW just leaves some processes idle if an optimal domain decomposition cannot be found.


mikaem commented Apr 20, 2016

Actually, I have no idea how FFTW would solve this problem, or whether it does. I know that if the first dimension of the original data cannot be shared evenly among processors, then FFTW gives more data to the lowest ranks. Whether or not this works out of the box for zero-padding I do not know.

My first thought on how to solve this is to use a factor of 2 instead of 3/2 in the zero-padding when the number of processors equals the size of the first dimension. That would work easily, but would of course be more expensive. Trying to implement uneven load-balancing seems to me like too much complexity for a suboptimal solution.
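The factor-2 workaround restores an even split, as this small sketch (hypothetical helper, not from the codebase) shows:

```python
# Sketch: padding by a factor 2 instead of 3/2 when P == N restores an
# even slab split, at the cost of a larger transform.

def padded_slab(N, P, factor):
    """Points per rank for a padded first dimension; raises if uneven."""
    padded = int(N * factor)
    assert padded % P == 0, "uneven slab split"
    return padded // P

print(padded_slab(8, 8, 2))     # factor-2 padding: 16/8 = 2 points per rank
# padded_slab(8, 8, 1.5)        # would fail: 12 points over 8 ranks is uneven
```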

@rainwoodman

It looks like we are talking about zero-padded FFTs? I am guessing I didn't quite understand the context...

Have you considered doing a 1-d pencil decomposition? (cf. https://github.com/mpip/pfft) It will surely help load-balancing.

About dealiasing: I recently ran into this. Do you have a reference on this 3/2 dealiasing scheme? I don't think it is explained in the FFTW manual.


mikaem commented Apr 20, 2016

Zero-padded FFTs it is :-) What's a 1-d pencil decomposition?

The 3/2 dealiasing scheme is very straightforward, but I have no reference other than the generic descriptions given in e.g.
C. Canuto, M. Y. Hussaini, A. Quarteroni and T. A. Zang, Spectral Methods: Fundamentals in Single Domains, Springer, 2006
or
D. A. Kopriva, Implementing Spectral Methods for Partial Differential Equations, Springer, 2009 (recommended)
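For readers following along, a minimal 1-d sketch of the 3/2-rule (this is my own illustrative numpy code, not the project's implementation): zero-pad the spectrum to M = 3N/2 modes, multiply in physical space on the fine grid, then truncate back to N modes.

```python
import numpy as np

def dealiased_product(u_hat, v_hat):
    """Dealiased product via the 3/2-rule for 1-d full-spectrum FFT data."""
    N = len(u_hat)
    M = 3 * N // 2
    # Zero-pad both spectra to M modes (np.fft ordering: low modes at the
    # front, negative modes at the back).
    u_pad = np.zeros(M, dtype=complex)
    v_pad = np.zeros(M, dtype=complex)
    u_pad[:N//2] = u_hat[:N//2]; u_pad[-(N//2):] = u_hat[-(N//2):]
    v_pad[:N//2] = v_hat[:N//2]; v_pad[-(N//2):] = v_hat[-(N//2):]
    # Multiply on the fine grid, then truncate back to N modes.
    w_pad = np.fft.fft(np.fft.ifft(u_pad) * np.fft.ifft(v_pad))
    w_hat = np.zeros(N, dtype=complex)
    w_hat[:N//2] = w_pad[:N//2]; w_hat[-(N//2):] = w_pad[-(N//2):]
    return w_hat * M / N   # rescale for the change of grid size

# Check against a product with no representable aliasing: cos(x) * cos(2x).
x = np.linspace(0, 2 * np.pi, 16, endpoint=False)
u_hat, v_hat = np.fft.fft(np.cos(x)), np.fft.fft(np.cos(2 * x))
w = np.fft.ifft(dealiased_product(u_hat, v_hat)).real
assert np.allclose(w, np.cos(x) * np.cos(2 * x))
```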

@rainwoodman

Thanks for the references.

A 1-d pencil decomposition divides the mesh among Nx x Ny x 1 processes,
where Nx x Ny = Np.
The slab decomposition in FFTW divides it among Np x 1 x 1 processes, a
special case of the 1-d pencil.

It helps load balance when Np >= Nmesh.
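The scaling limit this implies can be sketched with a tiny helper (hypothetical, just restating the comment above in code):

```python
# Sketch: maximum usable rank count for an N^3 mesh under slab vs. pencil
# decomposition. Slabs split one mesh axis, pencils split two.

def max_ranks(N, decomposition):
    if decomposition == "slab":     # Np x 1 x 1: at most N ranks
        return N
    if decomposition == "pencil":   # Nx x Ny x 1: at most N * N ranks
        return N * N
    raise ValueError(decomposition)

print(max_ranks(64, "slab"))    # 64
print(max_ranks(64, "pencil"))  # 4096
```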

The math tricks are worked out in, e.g.,
https://www-user.tu-chemnitz.de/~potts/workgroup/pippig/paper/PFFT_SIAM_88588.pdf

This is the code paper of pfft -- but pfft is written in C. Will it be much
harder to implement than the slabs?



mikaem commented Apr 22, 2016

Isn't that just a regular 2D-pencil decomposition? Perhaps we just have different names for it? The pencil decomposition is already implemented.

@rainwoodman

Yes. I see.

I am reading through pencil.py -- is the array distributed or shared? It is not obvious from just reading the code.


mikaem commented Apr 23, 2016

It's distributed. In real space the first two dimensions are distributed, whereas the complex (transformed) data are distributed in the x and z directions. There's also a version with the final complex data aligned in the x-direction, but it does not yet contain all features, like zero-padding.

@rainwoodman

Thanks.

Is there a variable storing the offsets and strides of the local array w.r.t. the global array?

Like the partition attribute and the x,r,k,w attributes in:

https://github.com/rainwoodman/pmesh/blob/master/pmesh/particlemesh.py


mikaem commented Apr 25, 2016

As I understand particlemesh.py, those attributes should all be easily available, even though they are not exposed at the moment. See get_local_wavenumbermesh in pencil.py for the wavenumbers kx, ky and kz in the x, y and z-directions; get_local_mesh gives you the mesh in real space. Here it would be easy to create just three vectors instead of the complete local 3D mesh, but I have not had any use for that yet. The local slices into the global data structures are available through the functions real_local_slice and complex_local_slice. Sorry about the poor documentation.

@rainwoodman

Yes. I see them now.

Using numpy's broadcasting rules on vectors may save memory and computation: e.g. for Gaussian smoothing, the number of calls to exp drops from N^3 to 3N, and the memory for the wavenumber arrays from 3*N^3 to 3N.
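The broadcasting idea, sketched for a Gaussian smoothing kernel in spectral space (variable names are mine, for illustration only):

```python
import numpy as np

N, R = 32, 2.0                      # mesh size per axis, smoothing radius
k = np.fft.fftfreq(N) * N           # integer wavenumbers along one axis

# Full-mesh approach: three N^3 wavenumber arrays, N^3 exp evaluations.
KX, KY, KZ = np.meshgrid(k, k, k, indexing="ij")
full = np.exp(-0.5 * R**2 * (KX**2 + KY**2 + KZ**2))

# Broadcast approach: exp(-(kx^2+ky^2+kz^2)/2 R^2) factorizes per axis, so
# only three length-N vectors of exp values are needed; numpy's broadcasting
# assembles the outer product when the 1-d factors are multiplied.
g = np.exp(-0.5 * R**2 * k**2)
smooth = g[:, None, None] * g[None, :, None] * g[None, None, :]

assert np.allclose(full, smooth)
```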

I mainly do Particle Mesh simulations, and a convenient, fast function to paint particles to / read them out from a periodic mesh would also be handy. Mine is ridiculously slow. Is this something you've thought about?


mikaem commented Apr 25, 2016

Are you doing Particle in Cell type of simulations? I would be interested in that through this project: http://www.mn.uio.no/fysikk/english/research/projects/4dspace/.

I'm using HDF5 to store results and it works very well in parallel for my spectralDNS solver. I know they have some routines for particles in HDF5 as well, but have not tried them.
