Slab decomposition with 3/2-rule dealiasing fails for number of processors = number of points in the x-direction #1
Comments
I am not that familiar with the issue, but MPI FFTW wouldn't fail in this case, would it? I thought FFTW would just leave some processes idle if an optimal domain decomposition is not found.
Actually, I have no idea how FFTW would solve this problem, or whether it does. I know that if the first dimension of the original data cannot be shared evenly amongst processors, then FFTW will give more data to the lowest ranks. Whether or not this works out of the box for zero-padding I do not know. My first thought on how to solve this is to pad with a factor of 2 instead of 3/2 whenever the number of processors equals the size of the first dimension. That would work easily, but would of course be more expensive. Trying to implement uneven load balancing seems to me like too much complexity for a suboptimal solution.
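The trade-off behind the factor-2 suggestion can be sketched in a few lines (sizes are illustrative; the point is only the divisibility and the extra work):

```python
# Illustrative comparison of 3/2-rule versus factor-2 zero-padding for a
# slab decomposition, when the number of processors P equals N0 (the
# number of points in the padded first dimension).
N0 = 64                     # points in x; also the number of processors here
pad32 = 3 * N0 // 2         # 96 planes: 96 % 64 != 0, cannot be shared evenly
pad2 = 2 * N0               # 128 planes: exactly 2 per rank, shares evenly

print(pad32 % N0)           # 32 -> uneven remainder with the 3/2 rule
print(pad2 % N0)            # 0  -> factor 2 divides evenly
print(pad2 / pad32)         # ~1.33x more padded planes to transform
```

So factor-2 padding restores even load balance at P = N0, at the cost of roughly a third more transform work than the 3/2 rule.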
Looks like we are talking about zero-padded FFTs? I am guessing I didn't quite understand the context... Have you considered doing a 1-d pencil decomposition? (c.f. https://github.com/mpip/pfft) It would surely help load balancing. About dealiasing: I recently ran into this. Do you have a reference for this 3/2 dealiasing scheme? I don't think it is explained in the FFTW manual.
Zero-padded FFTs it is :-) What's a 1-d pencil decomposition? The 3/2 dealiasing scheme is very straightforward, but I have no other reference than the generic descriptions given in, e.g.,
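For reference, the 3/2 rule for a quadratic nonlinearity is: zero-pad the N-mode spectra to M = 3N/2 modes, transform to physical space on the finer grid, multiply there, transform back, and truncate to N modes. A minimal 1D numpy sketch (not the project's implementation, and ignoring the real-to-complex transforms the solver actually uses):

```python
import numpy as np

def dealiased_product(u_hat, v_hat):
    """Spectrum of u*v, dealiased with the 3/2 rule.

    u_hat, v_hat: full complex spectra of length N (from np.fft.fft),
    with N even. Returns the N-mode spectrum of the product, free of
    aliasing errors from the quadratic nonlinearity.
    """
    N = len(u_hat)
    M = 3 * N // 2

    def pad(a_hat):
        # Copy the N retained modes into an M-length spectrum of zeros.
        b = np.zeros(M, dtype=complex)
        b[:N // 2] = a_hat[:N // 2]
        b[-(N // 2):] = a_hat[-(N // 2):]
        return b

    # To physical space on the fine grid; rescale for ifft's 1/M factor.
    u = np.fft.ifft(pad(u_hat)) * (M / N)
    v = np.fft.ifft(pad(v_hat)) * (M / N)

    # Pointwise product, back to spectral space, undo the length rescaling.
    w_hat_pad = np.fft.fft(u * v) * (N / M)

    # Truncate back to N modes; the aliased energy sat in the discarded band.
    w_hat = np.zeros(N, dtype=complex)
    w_hat[:N // 2] = w_hat_pad[:N // 2]
    w_hat[-(N // 2):] = w_hat_pad[-(N // 2):]
    return w_hat
```

For band-limited inputs like cos(x) and cos(2x) the result matches the exact pointwise product; for inputs with energy near the Nyquist mode it differs from the aliased direct product, which is the point of the scheme.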
Thanks for the references. A 1-d pencil decomposition divides the mesh into Nx x Ny x 1 processes. It helps load balancing when Np >= Nmesh. The math tricks are worked out in, e.g., the pfft code paper -- but pfft is written in C. Will it be much ...
Isn't that just a regular 2D pencil decomposition? Perhaps we just have different names for it? The pencil decomposition is already implemented.
Yes, I see. I am reading through pencil.py -- is the array distributed or shared? It is not very clear just from reading the code.
It's distributed. In real space the first two dimensions are distributed, whereas the complex (transformed) data are distributed in the x and z directions. There's also a version with the final complex data aligned in the x-direction, but it does not yet contain all features, like zero-padding.
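A purely illustrative sketch of what that layout implies for local array shapes (these names are my own, not attributes of pencil.py, and I'm assuming a cubic N^3 mesh on a P1 x P2 processor grid with N divisible by both):

```python
# Hypothetical local shapes for the distribution described above.
N, P1, P2 = 64, 4, 8

# Real space: x and y distributed, z local on every rank.
real_local = (N // P1, N // P2, N)

# After the forward transforms: x and z distributed, y local.
# A real-to-complex transform along z keeps N//2 + 1 modes; the split of
# those modes over P2 is shown as an even division for simplicity
# (in practice the last rank may hold a remainder).
Nf = N // 2 + 1
spectral_local = (N // P1, N, Nf // P2)

print(real_local)       # (16, 8, 64)
print(spectral_local)   # (16, 64, 4)
```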
Thanks. Is there a variable storing the offsets and strides of the local array w.r.t. the global array? Like the partition attribute and the x, r, k, w attributes in: https://github.com/rainwoodman/pmesh/blob/master/pmesh/particlemesh.py
As I understand particlemesh.py, those attributes should all be easily available, even though they're not really exposed at the moment. See
Yes, I see them now. Using numpy's broadcasting rules on vectors can save memory and computation: e.g., for Gaussian smoothing the number of calls to exp drops from N^3 to 3*N, and the memory from 3*N^3 to 3*N. I mainly do particle-mesh simulations, and a convenient, fast function to paint/read out particles to/from a periodic mesh would also be handy. Mine are ridiculously slow. Is this something you've thought about?
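The broadcasting trick works because the Gaussian kernel is separable: exp(-(a+b+c)) = exp(-a)*exp(-b)*exp(-c), so three 1D exp evaluations replace one per grid point. A small sketch (sigma and N are illustrative):

```python
import numpy as np

N, sigma = 32, 0.1
k = np.fft.fftfreq(N, d=1.0 / N)   # 1D wavenumbers

# Naive: one exp per grid point -> N**3 calls, plus three N**3 arrays
# of wavenumbers held in memory.
KX, KY, KZ = np.meshgrid(k, k, k, indexing="ij")
naive = np.exp(-0.5 * sigma**2 * (KX**2 + KY**2 + KZ**2))

# Broadcast: three 1D exp evaluations (3*N calls, 3*N stored values);
# the outer product is formed on the fly by numpy's broadcasting.
gx = np.exp(-0.5 * sigma**2 * k**2)
smooth = gx[:, None, None] * gx[None, :, None] * gx[None, None, :]

assert np.allclose(naive, smooth)
```

When the kernel multiplies a transformed field in place, even the final N^3 product never needs to be stored separately; only the three 1D vectors do.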
Are you doing Particle-in-Cell type simulations? I would be interested in that through this project: http://www.mn.uio.no/fysikk/english/research/projects/4dspace/. I'm using HDF5 to store results and it works very well in parallel for my spectralDNS solver. I know HDF5 has some routines for particles as well, but I have not tried them.
Slab decomposition with 3/2-rule dealiasing only works for up to N[0]/2 processors, where N[0] is the number of points in the x-direction. This is because it is not possible to share N[0]*1.5 points evenly between N[0] processors.
Not sure this is worth fixing, because any solution will be suboptimal no matter what.
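The constraint stated above, restated as a quick check (N0 is illustrative):

```python
# 3/2-rule padding gives 3*N0/2 planes in x; a slab decomposition needs
# that to divide evenly over the P ranks, which caps P at N0/2.
N0 = 64
padded = 3 * N0 // 2                # 96 planes after padding

assert padded % (N0 // 2) == 0      # P = N0/2 = 32: three planes per rank
assert padded % N0 != 0             # P = N0 = 64: 1.5 planes per rank, fails
```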