Improve MPI implementation of serialisation to handle unreliable machines #99

Open
grospelliergilles opened this issue Dec 14, 2021 · 0 comments
Labels
arccore Arccore component enhancement New feature or request

Comments

@grospelliergilles
Member

The current MPI implementation of serialized messages does the following:

  • if the message is small (the default limit is 5000 KB), it is sent in a single MPI call
  • otherwise, it is sent in two MPI calls: the first message contains the total size and the second is the full message.
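
The two cases above can be sketched as a small planning function. This is only an illustration, not the Arccore code: `PlannedSend` and `planSends` are hypothetical names, the comparison used for "small" (`<=`) is an assumption, and so is the choice of a 64-bit integer for the size header. No MPI call is made; the function only lists which sends would be issued.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical description of one MPI send (not an Arccore type).
struct PlannedSend
{
  enum class Kind { SizeHeader, Payload };
  Kind kind;
  std::size_t nb_bytes;
};

// Lists the sends for a serialized message of 'message_size' bytes.
// Assumption: a message is "small" when message_size <= threshold.
std::vector<PlannedSend> planSends(std::size_t message_size, std::size_t threshold)
{
  std::vector<PlannedSend> sends;
  if (message_size > threshold) {
    // Large message: a first send carries only the total size.
    // Assumption: the size is encoded as a 64-bit integer.
    sends.push_back(PlannedSend{ PlannedSend::Kind::SizeHeader, sizeof(std::uint64_t) });
  }
  // In both cases the full message is sent in one call.
  sends.push_back(PlannedSend{ PlannedSend::Kind::Payload, message_size });
  return sends;
}
```

With a threshold of 1000 bytes, `planSends(100, 1000)` yields one payload send, while `planSends(5000, 1000)` yields a size header followed by a single 5000-byte payload. That single large payload is the part the proposals below try to avoid.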

Some MPI implementations may have (possibly temporary) problems when there are too many messages or when messages are too large.

To solve this problem, we can try several fixes:

  • send the message as multiple packets of a fixed size
  • do not send the full message until the corresponding receive has been posted.
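
As a rough sketch of the first idea, the function below splits a message of a given size into fixed-size packets; each packet could then be sent with its own MPI call. The names `Packet` and `splitIntoPackets` are hypothetical, and the handling of the last, shorter packet is an assumption. The second idea (waiting until the receive is posted) needs a handshake between sender and receiver and is not shown here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical description of one packet: a byte range of the full message.
struct Packet
{
  std::size_t offset;
  std::size_t size;
};

// Splits a message of 'total_size' bytes into packets of at most
// 'packet_size' bytes. Assumption: the last packet may be shorter.
std::vector<Packet> splitIntoPackets(std::size_t total_size, std::size_t packet_size)
{
  assert(packet_size > 0);
  std::vector<Packet> packets;
  for (std::size_t offset = 0; offset < total_size; offset += packet_size) {
    // Never read past the end of the message.
    std::size_t size = std::min(packet_size, total_size - offset);
    packets.push_back(Packet{ offset, size });
  }
  return packets;
}
```

For example, a 10-byte message with 4-byte packets gives three packets covering the byte ranges [0,4), [4,8) and [8,10). The receiver can compute the same ranges from the total size, which the existing first message already carries.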