Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add API for querying if OpenSHMEM is currently initialized #457

Closed
agrippa opened this issue Nov 4, 2020 · 10 comments
Closed

Add API for querying if OpenSHMEM is currently initialized #457

agrippa opened this issue Nov 4, 2020 · 10 comments
Assignees
Milestone

Comments

@agrippa
Copy link
Collaborator

agrippa commented Nov 4, 2020

Unless I'm missing it, there is currently no API for querying if an OpenSHMEM implementation has been initialized yet (i.e. whether we are in the part of a program between shmem_init and shmem_finalize). Theoretically, this could be something like:

bool shmem_is_initialized();

In general this seems like a useful API, and I've just run in to a concrete case where I "need" it. I'm writing a library in C++ that uses OpenSHMEM. Stack-allocated objects can collectively allocate and deallocate symmetric objects. If a stack-allocated object is created in main(), C++ will not call its destructor until we exit main() which is likely after the programmer has already called shmem_finalize() and cleaned up the symmetric heap. My library calling shmem_free in the destructor of stack-allocated objects after shmem_finalize then causes the app to crash. I can workaround this in a couple of ways: (1) asking users to artificially introduce a new scope inside of main using {} so that C++ objects go out of scope before shmem_finalize, or (2) adding initialize/finalize routines to my library that in turn call shmem_init/finalize and then track that state myself. I'm currently doing (1) and I'd rather avoid (2) as the user may want to use SHMEM before/after my library is in use.

Question to implementors: is this something difficult to add? Is there any reason why this API doesn't already exist? Is there an equivalent in the spec already that I'm missing?

@nspark
Copy link
Contributor

nspark commented Jun 1, 2021

This seems reasonable and straightforward, and not incompatible with #263 (multiple init/finalize). It seems like having the shmem_is_finalized would be worth adding for completeness. (IIRC, @jeffhammond suggested this a long time ago.)

@naveen-rn
Copy link
Contributor

I suppose, these routines are usable outside the shmem-parallel region (i.e before shmem_init and after shmem_finalize).

@nspark
Copy link
Contributor

nspark commented Jun 1, 2021

Yes, and I think they should be thread safe, independent of whether SHMEM is initialized at any particular thread safety level.

@naveen-rn
Copy link
Contributor

In general, as an independent programming model usage - I don't see any issues with this routine.

But, we might have issues in hybrid programming model usage - for example CAF or UPC using SHMEM as communication layer. These programming models, does an shmem_init() internally during the program startup and any further user created explicit shmem_init() call will be essentially a no-op. But, calling shmem_is_initialized() in this case would result in TRUE even before an explicit shmem_init() call.

@nspark
Copy link
Contributor

nspark commented Jun 1, 2021

Subsequent calls to shmem_init result in undefined behavior, so the case of hybrid SHMEM + CAF/UPC-over-SHMEM could be problematic without such an interface.

Yes, it could be confusing, but I'd argue such a hybrid program would be problematic because an explicit call to shmem_finalize would invalidate UPC operations after finalize but before the program exits.

@jeffhammond
Copy link

MPI did a bunch of work on this exact topic a while back. @jdinan is well-qualified to summarize.

@jdinan
Copy link
Collaborator

jdinan commented Jul 7, 2021

There is no harm in replicating the MPI query functions; I think any design effort in this space will arrive at the same API. That said, there are still issues with only having is_initialized and is_finalized because the SHMEM library doesn't know that it has multiple users of the API. For example, the first user to initialize the library can finalize it while another library is still using it. Reference counted init/finalize deals with this case and I'm not sure we would need is_initialized and is_finalized if we add reference counted init/finalize.

There is another case that reference counted init/finalize doesn't handle, which is two pieces of code that use SHMEM one after the other, resulting in init...finalize, , init...finalize. Historically, MPI does not support reinitialization after the library is finalized (a motivation to have query functions in addition to ref counting if we go this route). We could require SHMEM libraries to support reinitialization, but this is nontrivial to implement/test.

@jeffhammond
Copy link

We could require SHMEM libraries to support reinitialization, but this is nontrivial to implement/test.

Is it not the case that the reentrant initialization starts with a fetch-and-increment and the initialization is only performance if the fetched value is zero? The reentrant finalization does the same, for a fetched value of one.

I can't remember what I've written that does this but pthread_once may also be useful here.

I acknowledge that rigorously verifying correctness is, like essentially everything in software, very difficult, but I don't think we need to formally verify OpenSHMEM implementations the way we do computers associated with locomotion and robotics.

@jdinan
Copy link
Collaborator

jdinan commented Jul 9, 2021

The easy thing to do is to not finalize the OpenSHMEM library in the call to finalize and clean up only atexit. One of the issues with reinitialization is statically initialized variables in the library. These are often not reset in the call to finalize, which can lead to problems.

@jdinan jdinan assigned jdinan and unassigned nspark Dec 7, 2023
@jdinan
Copy link
Collaborator

jdinan commented May 30, 2024

Resolved by #512

@jdinan jdinan closed this as completed May 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants