MPI_Win_allocate_shared throws error when size==0 #41

Open
csi-dweiner opened this issue Jun 1, 2020 · 1 comment


@csi-dweiner

Hello! I found a bug using MS-MPI 10.1: MPI_Win_allocate_shared fails whenever size==0. A size of 0 is explicitly supported according to the documentation:

"The size argument may be different at each process and size = 0 is valid."
Source: https://docs.microsoft.com/en-us/message-passing-interface/mpi-win-allocate-shared-function

Allocating shm window: size=1 stride=1...OK.
Allocating shm window: size=0 stride=1...
job aborted:
[ranks] message

[0] fatal error
Fatal error in MPI_Win_allocate_shared: Other MPI error, error stack:
MPI_Win_allocate_shared(size=-1015819392, disp_unit=0, info=0x1, comm=0x1c000000, baseptr=0x00007FF844000000, win=0x00000097526FFD00) failed
CreateFileMapping failed, error 87

I think the issue is that under the hood, MS-MPI calls MPID_Win_create_[non]contig, which in turn calls CreateFileMappingW, whose documented behavior is:

"An attempt to map a file with a length of 0 (zero) fails with an error code of ERROR_FILE_INVALID. Applications should test for files with a length of 0 (zero) and reject those files."
Source: https://docs.microsoft.com/en-us/windows/win32/api/memoryapi/nf-memoryapi-createfilemappingw
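One note on the error code: 87 is ERROR_INVALID_PARAMETER, not ERROR_FILE_INVALID. That would be consistent with a mapping that is not backed by a real file (hFile == INVALID_HANDLE_VALUE), where the CreateFileMappingW documentation requires the caller to specify a size in dwMaximumSizeHigh and dwMaximumSizeLow. Whether MS-MPI creates its mapping this way is an assumption on my part. The portable sketch below only shows how a 64-bit byte count splits into those two 32-bit halves and why a request of 0 leaves both at zero. The names mapping_size, split_mapping_size, and mapping_size_is_usable are made up for illustration and are not part of MS-MPI or the Windows API.

```c
#include <stdint.h>

/* Hypothetical stand-in for the dwMaximumSizeHigh / dwMaximumSizeLow pair
   that CreateFileMappingW takes. Not a Windows type. */
typedef struct {
    uint32_t high; /* upper 32 bits of the requested size */
    uint32_t low;  /* lower 32 bits of the requested size */
} mapping_size;

/* Split a 64-bit byte count into the two 32-bit halves. */
mapping_size split_mapping_size(uint64_t bytes)
{
    mapping_size s;
    s.high = (uint32_t)(bytes >> 32);
    s.low  = (uint32_t)(bytes & 0xFFFFFFFFu);
    return s;
}

/* Assumption: for a mapping without a backing file, a size whose halves are
   both zero is not usable. Returns 1 if usable, 0 otherwise. */
int mapping_size_is_usable(mapping_size s)
{
    return (s.high != 0u || s.low != 0u) ? 1 : 0;
}
```

With this helper, a request of 0 bytes yields high == 0 and low == 0, which is the case a caller would need to guard against before reaching the Windows call.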

Further complicating the investigation, there is also a bug in the error message above: all the parameters are printed in the wrong fields. size displays the value of baseptr, and every other parameter is off by one (disp_unit displays size==0, info displays disp_unit==1, and so on). I did find the code issue causing this display problem; baseptr should be moved to second-to-last in the parameter list here:
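As a rough illustration of this kind of mismatch (this is not the MS-MPI source itself), the sketch below formats the same arguments twice: once with baseptr passed first, and once with baseptr passed second-to-last so that every value lands under its own label. The struct win_args and both function names are hypothetical, and every field uses the same integer type so that the misordering is well-defined C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the arguments named in the error message.
   All fields share one integer type purely for the demonstration. */
typedef struct {
    long long size;
    long long disp_unit;
    long long info;
    long long comm;
    long long baseptr;
    long long win;
} win_args;

/* Misordered: baseptr is passed first, so each label up to "baseptr"
   shows the value of a different argument. */
int format_misordered(char *buf, size_t n, const win_args *a)
{
    assert(buf != NULL && a != NULL);
    return snprintf(buf, n,
        "size=%lld, disp_unit=%lld, info=%lld, comm=%lld, baseptr=%lld, win=%lld",
        a->baseptr, a->size, a->disp_unit, a->info, a->comm, a->win);
}

/* Corrected: baseptr is passed second-to-last, matching the label order. */
int format_corrected(char *buf, size_t n, const win_args *a)
{
    assert(buf != NULL && a != NULL);
    return snprintf(buf, n,
        "size=%lld, disp_unit=%lld, info=%lld, comm=%lld, baseptr=%lld, win=%lld",
        a->size, a->disp_unit, a->info, a->comm, a->baseptr, a->win);
}
```

In the misordered version, "size" shows the baseptr value and "disp_unit" shows the real size, which matches the pattern in the log where disp_unit=0 and info=0x1 appear for a call made with size=0 and a stride of 1.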

Thank you!

--Dan Weiner, HPC Research Engineer, Convergent Science (convergecfd.com)

@csi-dweiner
Author

Update: It appears that once a shm window of nonzero size has been created, ranks allocating an "additional" 0 bytes do not cause a problem.
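Given that observation, one possible stopgap on the application side is to never request a zero-byte segment. The helper below is a minimal sketch of that idea and has not been tested against MS-MPI. The name pad_window_size is hypothetical, and long long stands in for MPI_Aint so the snippet compiles without MPI headers.

```c
#include <assert.h>

/* Hypothetical helper: return the size to pass to MPI_Win_allocate_shared.
   A request of 0 bytes is padded to 1 byte so the underlying mapping is
   never zero-length. long long stands in for MPI_Aint here. */
long long pad_window_size(long long wanted)
{
    assert(wanted >= 0); /* negative sizes are not meaningful */
    return wanted == 0 ? 1 : wanted;
}
```

The result would be passed as the size argument in place of the raw value. The trade-off is that a padded rank owns one real byte, so the size reported for that rank by MPI_Win_shared_query would be 1 rather than 0, and code that relies on a size of 0 to mean "no local data" would need to track the original request separately.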
