Using serial Partitioner on multiple ranks leads to unexpected duplication of FunctionSpace #187

Open
fmahebert opened this issue Apr 17, 2024 · 3 comments

@fmahebert
Contributor

What happened?

When attempting to set up a FunctionSpace whose grid points are all on one particular MPI task, I find that the serial partitioner does not behave as I expect: the full grid appears to be created on every rank.

See snippets and outputs below for code specifics.

Is this reflective of user error in setting up the serial partitioner, or is there a bug?

What are the steps to reproduce the bug?

When I run this code on 6 MPI tasks...

const atlas::Grid grid("F8");
eckit::LocalConfiguration conf{};
conf.set("partition", 0);  // intended: place all points on partition 0
const atlas::grid::Partitioner part("serial", conf);
const atlas::functionspace::StructuredColumns cols1(grid, part);
std::cout << "on rank = " << eckit::mpi::comm().rank() << ", StructuredColumns 1 size = " << cols1.size() << std::endl;

I get the output...

on rank = 0, StructuredColumns 1 size = 512
on rank = 1, StructuredColumns 1 size = 512
on rank = 2, StructuredColumns 1 size = 512
on rank = 3, StructuredColumns 1 size = 512
on rank = 4, StructuredColumns 1 size = 512
on rank = 5, StructuredColumns 1 size = 512

Whereas with this code...

const atlas::Grid grid("F8");
std::vector<int> zeros(grid.size(), 0);  // assign every grid point to partition 0
const atlas::grid::Distribution dist(eckit::mpi::comm().size(), grid.size(), zeros.data());
const atlas::functionspace::StructuredColumns cols2(grid, dist);
std::cout << "on rank = " << eckit::mpi::comm().rank() << ", StructuredColumns 2 size = " << cols2.size() << std::endl;

I get the expected all-on-rank-0 distribution...

on rank = 0, StructuredColumns 2 size = 512
on rank = 1, StructuredColumns 2 size = 0
on rank = 2, StructuredColumns 2 size = 0
on rank = 3, StructuredColumns 2 size = 0
on rank = 4, StructuredColumns 2 size = 0
on rank = 5, StructuredColumns 2 size = 0
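
As an aside, the all-on-rank-0 workaround above generalizes to any target rank. The sketch below is built only from the Distribution constructor used in the snippet; the helper name makeSingleRankColumns, its target_rank parameter, and the include paths are assumptions for illustration, not part of the original report.

#include <vector>

#include "atlas/functionspace/StructuredColumns.h"
#include "atlas/grid.h"
#include "atlas/grid/Distribution.h"
#include "eckit/mpi/Comm.h"

// Build a StructuredColumns whose points are all assigned to target_rank,
// using the same Distribution constructor as the snippet above.
atlas::functionspace::StructuredColumns makeSingleRankColumns(const atlas::Grid& grid, int target_rank) {
    std::vector<int> parts(grid.size(), target_rank);  // one partition index per grid point
    const atlas::grid::Distribution dist(eckit::mpi::comm().size(), grid.size(), parts.data());
    return atlas::functionspace::StructuredColumns(grid, dist);
}

With such a helper, the expected layout shown above would correspond to makeSingleRankColumns(grid, 0).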

Version

0.36

Platform (OS and architecture)

Linux x86_64

Relevant log output

No response

Accompanying data

No response

Organisation

JCSDA

@twsearle
Contributor

Hi Francois, this is intentional behaviour (I added it a while ago). It allows the serial partitioner to be used for problems where every MPI task has a copy of the grid data. If you would like some other kind of single-processor partitioner, it would be easy enough to add one.
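
For illustration, a minimal sketch of that duplicated use case, assuming the usual atlas calls createField, option::name, and array::make_view; the field name and the include list are assumptions, not code from this thread.

#include "atlas/array.h"
#include "atlas/field.h"
#include "atlas/functionspace/StructuredColumns.h"
#include "atlas/grid.h"
#include "atlas/grid/Partitioner.h"
#include "atlas/library.h"
#include "atlas/option.h"

int main(int argc, char** argv) {
    atlas::initialize(argc, argv);
    {
        // With the serial partitioner every task builds the full F8 grid, so a field
        // on this function space holds a complete copy of the data on each rank.
        const atlas::Grid grid("F8");
        const atlas::grid::Partitioner part("serial");
        const atlas::functionspace::StructuredColumns cols(grid, part);
        atlas::Field field = cols.createField<double>(atlas::option::name("duplicated"));
        auto view = atlas::array::make_view<double, 1>(field);
        for (atlas::idx_t i = 0; i < cols.size(); ++i) {
            view(i) = 1.0;  // identical full-grid data on every MPI task
        }
    }  // function space and field handles released before finalize
    atlas::finalize();
    return 0;
}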

@fmahebert
Contributor Author

@twsearle That's fair enough. But for my understanding, can you explain a bit more about the intention and behavior of the partition config option? What does it control, if not the task that will own the points?

@twsearle
Contributor

> @twsearle That's fair enough. But for my understanding, can you explain a bit more about the intention and behavior of the partition config option? What does it control, if not the task that will own the points?

Sorry, I am not sure about the partition config option; it sounds like something I missed when I made my change. Anyway, I just dropped in to make sure the possibility of running a functionspace duplicated in this way is maintained. It's a feature, not a bug, from my point of view, although I don't mind how it's implemented.
