
Interoperability with other programming models #243

Closed
minsii opened this issue Sep 4, 2018 · 30 comments

Comments

@minsii
Collaborator

minsii commented Sep 4, 2018

Goal: define a new interoperability section

Context: Appendix D has been removed because its contents are outdated. We want to create a new interoperability section that defines the interoperability of OpenSHMEM with other programming models, such as MPI, UPC, and CAF, to help users write portable hybrid OpenSHMEM + X programs.

The section is planned to cover:

  • Standardization of features / useful APIs for portable hybrid programs
  • Clarification of undefined behavior caused by the mixed use of different models
  • Advice to users about ways to guarantee program portability
  • Advice to runtime developers about ways to optimize performance

A PR has been created for MPI interoperability. See minsii#1

@minsii
Collaborator Author

minsii commented Sep 4, 2018

Updates from the 2018-09-04 Threads WG call:

Slides:
SHMEM_Interoperability_20180904.pptx

Summary of feedback (mainly about the progress property SHMEM_PROGRESS_UNIFIED|SEPARATE):

  • Do we also need to define the progress model for SHMEM itself?
  • Is the progress property used only for performance optimization?
  • The survey (see slides) covers only SHMEM + MPI; other programming models (e.g., UPC, CAF) may not be progressed in a consistent way. Maybe we should rename the property values to SHMEM_PROGRESSES_MPI|MPI_PROGRESSES_SHMEM.

@spophale
Contributor

spophale commented Sep 4, 2018

Notes from the WG meeting on Tue, Sep 4:

  • Atomicity guarantees between MPI and SHMEM operations will not be part of the spec; this is the user's responsibility.
  • Removed the section on data allocated by one programming model and used by another.
  • Open question: how to interpret SHMEM_PROGRESS_UNIFIED and SHMEM_PROGRESS_SEPARATE.
  • Current focus is OSH + MPI; other programming models will be added later when we have more information.

@bcernohous

bcernohous commented Sep 4, 2018

Let me try to explain myself after a bit more thought.

Open Question : How to interpret SHMEM_PROGRESS_UNIFIED and SHMEM_PROGRESS_SEPARATE

I think SHMEM queries SHMEM attributes, not MPI/UPC/Fortran.

I think there are 3 attributes (plus detail/sub-attributes if desired); a sketch of a possible query follows at the end of this comment.

  1. SHMEM makes progress on X:
  • SHMEM_PROGRESS_ALL

    or individually

    • SHMEM_PROGRESS_MPI
    • SHMEM_PROGRESS_UPC
      ... to whatever level of detail
  2. SHMEM needs other programming models, or the user application, to make progress:
  • SHMEM_REQUIRES_PROGRESS (or SHMEM_PROGRESS_MANUAL)
  3. SHMEM always makes progress (no dependency on other programming models or the application):
  • SHMEM_PROGRESS_AUTO

For 2), I don't think we can define whether MPI progress (MPI_Test) makes progress on SHMEM, only that SHMEM needs progress.

e.g.

Cray XC supports Cray MPI, Intel MPI, OpenMPI (and sometimes MVAPICH).

I don't see how Cray SHMEM defines 'unified progress' or anything about MPI, only that SHMEM itself needs (or doesn't need) progress from other programming models or the application.

(Edit) And I don't see SHMEM defining whether MPI makes automatic progress or requires manual calls to MPI_Test/MPI_Wait.
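
A hypothetical sketch (in C) of how an application might query the three attributes above; neither shmemx_query_progress() nor the SHMEMX_* constants exist in the spec or in any implementation, they only illustrate the kind of bitmask query described in this comment:

#include <stdio.h>
#include <shmem.h>

/* Hypothetical constants and query, named here only for illustration. */
#define SHMEMX_PROGRESS_AUTO   (1 << 0)  /* SHMEM progresses itself, no external help needed */
#define SHMEMX_PROGRESS_MANUAL (1 << 1)  /* SHMEM requires progress from another model or the app */
#define SHMEMX_PROGRESS_MPI    (1 << 2)  /* SHMEM calls also progress MPI */
#define SHMEMX_PROGRESS_UPC    (1 << 3)  /* SHMEM calls also progress UPC */

/* Placeholder for what a vendor-provided query might return. */
static int shmemx_query_progress(void) { return SHMEMX_PROGRESS_AUTO; }

int main(void) {
    shmem_init();
    int p = shmemx_query_progress();
    if (p & SHMEMX_PROGRESS_MANUAL)
        printf("SHMEM needs the application or another model to drive its progress\n");
    if (p & SHMEMX_PROGRESS_MPI)
        printf("SHMEM calls also progress outstanding MPI operations\n");
    shmem_finalize();
    return 0;
}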

@anshumang

From Section 4.1 [Progress of OpenSHMEM Operations] in the spec (v1.4), one interpretation is that progress without API calls is required. If so, SHMEM_REQUIRES_PROGRESS (or SHMEM_PROGRESS_MANUAL) is not allowed by the spec and SHMEM_PROGRESS_AUTO is guaranteed. If the interpretation of the spec allows relaxed progress, why not query that property outside the scope of interop?

Regarding SHMEM_PROGRESS_ALL (or SHMEM_PROGRESS_X) : If an application is written to rely on this property, isn't the programmer also assuming a specific OpenSHMEM implementation? If so, why not leave this as a property that can only be queried by a vendor-specific shmemx_ API?
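
To illustrate the "progress without API call" reading of Section 4.1, here is a minimal sketch (not from the spec or this thread): PE 1 never enters the OpenSHMEM library while it waits, so the put from PE 0 can only become visible if the implementation progresses it passively. Real code should prefer shmem_wait_until; the busy-wait is only there to isolate the progress question.

#include <shmem.h>

long flag = 0;  /* global, therefore symmetric */

int main(void) {
    shmem_init();                       /* run with at least 2 PEs */
    if (shmem_my_pe() == 0) {
        shmem_long_p(&flag, 1, 1);      /* write flag on PE 1 */
        shmem_quiet();                  /* ensure the put is complete */
    } else if (shmem_my_pe() == 1) {
        volatile long *f = &flag;
        while (*f == 0)                 /* spin without making any OpenSHMEM call */
            ;                           /* relies on passive (automatic) progress */
    }
    shmem_finalize();
    return 0;
}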

@jeffhammond

OSHMPI has unfinished support for MPMD and it is straightforward to extend that to DPM (dynamic process management). There isn't any serious impediment to doing it, but I had no motivation other than curiosity. If somebody in the OpenSHMEM community wants to know how this works, I might be able to implement it for testing.

@jeffhammond

@minsii Regarding progress, this is rather complicated and perhaps not amenable to a simple boolean descriptor. For example, OpenSHMEM might progress MPI RMA but not two-sided. How do we express that?

@minsii
Collaborator Author

minsii commented Sep 4, 2018

@bcernohous Thanks a lot for the detailed explanation. However, it might be inappropriate to consider the progress property only from the SHMEM side; this is information we want only for hybrid SHMEM + X programs. As @anshumang mentioned, the OpenSHMEM spec actually guarantees that the progress of SHMEM can only be SHMEM_PROGRESS_AUTO.

@anshumang Thanks for pointing out the progress spec. After reading the spec again, I agree with your interpretation. I.e., a SHMEM implementation is required to make progress even when the program does not make API calls. This is a stronger progress guarantee than MPI's. So the info we really need to query is "whether SHMEM makes progress for X".

Regarding the shmemx suggestion: I do not think the query API has to be vendor-specific; however, a vendor can define different values for the property (e.g., SHMEM_PROGRESS_MPI|SHMEM_PROGRESS_UPC, or none).

Another issue I am reading from Bob's comment is that, although a runtime implementation may support both MPI and SHMEM, the user may link against a different implementation of one of the models, or preload one at execution time. E.g., MVAPICH provides both MPI and SHMEM and supports SHMEM_PROGRESS_MPI, but the user may link (or preload) another MPI binary, which breaks SHMEM_PROGRESS_MPI. Thus, the properties cannot be set statically.

I guess the runtime may figure out some properties (e.g., progress, PE mapping) after shmem_init & MPI_Init, but the ordering of init calls has to be exposed in a different way.

@minsii
Collaborator Author

minsii commented Sep 4, 2018

@jeffhammond I do not think OpenSHMEM itself can support dynamic processes because shmem_init is required to be a collective call over all PEs (Section 9.1.1 in spec 1.4).

Example of an unsupported dynamic-process pattern:

P0                                      
shmem_init();
MPI_Comm_spawn(P1) or fork(P1); 

P1
shmem_init();

However, it might be allowed if some PEs are forked separately (e.g., through fork() or MPI_Comm_spawn) but all the PEs call shmem_init collectively. This is being discussed in #223.

Example that might be supported:

P0                                      
MPI_Comm_spawn(P1) or fork(P1); 
shmem_init();

P1
shmem_init();

@bcernohous

the OpenSHMEM spec actually guarantees that the progress of SHMEM can only be SHMEM_PROGRESS_AUTO.

That was my original understanding, but from today's call it seemed like that wasn't universal.

So I think the only question is whether SHMEM makes progress for X. And that's a hard thing to answer. As you've suggested, it's not static.

I'd guess it's very limited (my vendor's SHMEM + my vendor's MPI), so is a query really useful versus just knowing which libraries you linked with? Header versions?

@jeffhammond

Yes, @minsii, I was thinking of a case where an MPI application spawned islands, each of which would call shmem_init and act independently of one another. It is not possible to have symmetric variables across the original and spawned communicators with the current symmetric heap API, although one can of course do it in MPI-3 RMA.

@jeffhammond

@bcernohous Indeed, I agree that SHMEM_PROGRESS_AUTO is required by any high-quality implementation and that we shouldn't spend a lot of time worrying about other ones.

@minsii
Collaborator Author

minsii commented Sep 14, 2018

@bcernohous I think the runtime can figure out some properties after all hybrid models are initialized in the program. The supported properties would include:

  • progress: e.g., SHMEM_PROGRESS_MPI|SHMEM_PROGRESS_UPC, or none
  • pe-mapping: e.g., FORTRAN_IMAGES_ARE_PES|MPI_WORLD_RANKS_ARE_PES, MPI_WORLD_RANKS_ARE_PES, or none

The assumption is that only a unified runtime (e.g., one that implements both SHMEM and X) would return a non-none value. Once both SHMEM and the X model(s) are initialized, the runtime knows whether its internal X is being used. E.g., the vendor runtime developer can simply set a global variable is_vendor_X_initiated=1 when X_init() is called (see the sketch at the end of this comment). Thus, the user can query valid properties after all initialization calls. I think this is still useful.

The only property we cannot support is initialization ordering, because the user program needs this info before any initialization call, but the runtime cannot figure out the correct value until it is initialized.
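
A rough runtime-side sketch of that idea, assuming a vendor that ships MPI and SHMEM in one unified runtime; every name below is made up for illustration and is not part of any spec or product:

/* Inside the hypothetical unified runtime's sources. */
#define SHMEMX_PROGRESS_MPI (1 << 2)              /* illustrative constant, not in the spec */

static int is_vendor_mpi_initiated = 0;

extern int vendor_internal_mpi_init(int *argc, char ***argv);  /* hypothetical internal entry point */

int MPI_Init(int *argc, char ***argv) {           /* the vendor's own MPI_Init */
    is_vendor_mpi_initiated = 1;                  /* the internal MPI is in use */
    return vendor_internal_mpi_init(argc, argv);
}

int shmemx_query_progress(void) {                 /* hypothetical query API */
    /* Report MPI progress only when the vendor's own MPI was initialized;
       if the user linked or preloaded a different MPI, the flag stays 0. */
    return is_vendor_mpi_initiated ? SHMEMX_PROGRESS_MPI : 0;
}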

@minsii
Collaborator Author

minsii commented Sep 18, 2018

Slides used for threads WG meeting on September 18:
SHMEM_Interoperability_20180918.pptx

Comments from today's meeting:

  • Dynamically loaded external libraries: it is easier for the implementation to return correct query properties if the query API may be called only after both SHMEM and the X model(s) have been initialized. The runtime may choose to support queries before the initialization calls (e.g., return the initialization-ordering property) if it can detect a dynamically loaded external library in a different way.
  • Utilization of the query API: it is unclear how the query API helps user program portability. Maybe we can prepare code examples for it.
  • Dynamic process support: we could also extend SHMEM to support inter-group communication (e.g., across two MPI inter-communicators). This can be left as future work.

@minsii
Collaborator Author

minsii commented Oct 24, 2018

@naveen-rn and I discussed the query API offline. Here is a summary of the comments:

  • We want to introduce generic interoperability features in the OpenSHMEM spec. If a feature relies on internal details of a SHMEM implementation, it does not seem generic. E.g., initialization ordering and PE-rank mapping are implementation-specific details.
  • When SHMEM / MPI libraries are dynamically loaded, the above info can be incorrect unless the SHMEM implementation chooses to work around it. In the worst case, a SHMEM implementation may always return unsupported.

Alternative options:

  1. Instead of querying initialization ordering, we could introduce shmem_initialized and shmem_finalized, similar to MPI_Initialized and MPI_Finalized. We suggest that the hybrid program always query the runtime status before making an init/finalize call (see the sketch at the end of this comment).
  2. For PE-rank mapping, the user program should always query ranks/PEs using the standard APIs and maintain the mapping itself.

I especially like the idea of shmem_initialized|finalized(). It seems to be a more generic API than querying the ordering. It also resolves the initialization problem.

For PE-rank mapping, the query feature helps the user program only when the SHMEM implementation supports it, so a portable program still has to write the manual version. Besides, it saves cost only at program init time (e.g., no info exchange). The benefit of this feature seems limited, and it likely increases code complexity in the SHMEM implementation. Thus, I would suggest that we discard the PE-mapping proposal in the initial version and go with option 2.
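
For option 1, a minimal sketch of how a library in a hybrid program might use the proposed query; shmem_initialized is not in OpenSHMEM 1.4 (it is declared here only to make the sketch self-contained), while MPI_Initialized is standard MPI. This shows only the single-threaded pattern; the non-atomic test-and-init problem discussed later in this thread is not addressed.

#include <mpi.h>
#include <shmem.h>

extern int shmem_initialized(void);  /* proposed query, mirroring MPI_Initialized */

void mylib_init(void) {
    int mpi_up = 0;
    MPI_Initialized(&mpi_up);
    if (!mpi_up)
        MPI_Init(NULL, NULL);        /* initialize MPI only if the application has not */

    if (!shmem_initialized())        /* proposed query, not yet standardized */
        shmem_init();                /* initialize SHMEM only if the application has not */
}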

@minsii
Collaborator Author

minsii commented Oct 30, 2018

A pull request has been created to prepare the spec document draft: minsii#1

@minsii
Collaborator Author

minsii commented Oct 30, 2018

The slides used for 2018-10-30's WG call: SHMEM_Interoperability_20181030.pptx

PDF of spec draft (work in progress):
openshmem-spec-interoperability-20181030.pdf

@manjugv
Collaborator

manjugv commented Oct 30, 2018

Jeff S articulates the race condition very well in his slides:
https://blogs.cisco.com/performance/crazy-ideas-to-revamp-mpi_init-and-mpi_finalize

@minsii
Collaborator Author

minsii commented Oct 31, 2018

Thanks, @manjugv. I agree that, when we consider that SHMEM may be called by multiple libraries, shmem_initialized and shmem_finalized seem insufficient. The MPI Sessions proposal was made for this purpose.

To address the specific issue where two threads concurrently call shmem_init_thread(), rather than defining such a case as undefined behavior:

Option 1 is to make the entire test-and-init an atomic operation, but that might be hard if the two threads are maintained by different libraries:

if (!shmem_initialized())
    shmem_init_thread()

Option 2 is to require shmem_init_thread() to be a no-op if SHMEM has already been initialized. The shmem_init_thread() call can be implemented as an atomic operation to synchronize between multiple calling threads (a sketch follows below). This change would be similar to the deprecated start_pes (i.e., calling start_pes more than once has no subsequent effect).
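
A runtime-internal sketch of option 2 (my own, with pthread_once standing in for whatever synchronization an implementation would actually use): the first caller performs the real initialization, and every later call, from any thread or library, returns without re-initializing, much like the deprecated start_pes semantics.

#include <pthread.h>
#include <shmem.h>

static pthread_once_t init_once = PTHREAD_ONCE_INIT;

static void real_init(void) {
    /* allocate the symmetric heap, bring up transports, etc. (placeholder) */
}

int shmem_init_thread(int requested, int *provided) {
    (void)requested;                        /* simplification: thread level fixed in this sketch */
    pthread_once(&init_once, real_init);    /* first caller initializes; later calls are synchronized no-ops */
    *provided = SHMEM_THREAD_MULTIPLE;      /* placeholder thread level */
    return 0;
}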

@naveen-rn
Contributor

naveen-rn commented Oct 31, 2018 via email

@minsii
Collaborator Author

minsii commented Oct 31, 2018

Not yet; the current spec still says "If the call to shmem_init_thread is unsuccessful in allocating and initializing resources for the OpenSHMEM library, then the behavior of any subsequent call to the OpenSHMEM library is undefined."

If the second shmem_init_thread call were a no-op, calling shmem_init_thread multiple times would have no subsequent effect, similar to start_pes.

@minsii
Collaborator Author

minsii commented Dec 11, 2018

Here are some updates about the init/finalize questions.

  1. To independently use SHMEM in multiple libraries:

    • shmem_initialized and shmem_finalized are not sufficient (e.g., test-and-init is not atomic when libraries use different threads)
    • Alternative option: can we allow multiple calls to shmem_init, shmem_init_thread, and shmem_finalize?
  2. How should we describe SHMEM/MPI interoperability with respect to init/finalize?

    • Standalone MPI/SHMEM (e.g., SOS, OSSS-UCX) should support any order.
    • A unified MPI/SHMEM runtime (e.g., HPE, CrayMPI, OMPI) should ensure that MPI and SHMEM can be initialized and finalized in any order (e.g., using a reference counter; see the sketch at the end of this comment).
    • If the SHMEM library depends on MPI (e.g., OSHMPI), some init/finalize orderings might not work correctly (e.g., shmem_init - MPI_Init - MPI_Finalize - shmem_finalize). However, this is MPI's issue (i.e., MPI cannot be used independently by multiple libraries), so we should fix it in MPI, not in SHMEM.
    • If the MPI library depends on SHMEM (no existing implementation), we should support any ordering of init/finalize once question 1 is solved.

It would be great if we could discuss the above thoughts in tomorrow's threads WG call (see attached slides).
SHMEM_Interoperability_20181211.pptx
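
A highly simplified sketch of the reference-counting idea for the unified-runtime case above (hypothetical internals, no thread safety or error handling): the shared bootstrap comes up on the first of MPI_Init/shmem_init and goes down on the last of MPI_Finalize/shmem_finalize, so the two models may be initialized and finalized in either order.

/* Shared bootstrap layer of a hypothetical unified MPI+SHMEM runtime. */
static int runtime_refcount = 0;

static void transport_open(void)  { /* bring up network endpoints, memory registration, ... */ }
static void transport_close(void) { /* tear everything down */ }

static void runtime_up(void)   { if (runtime_refcount++ == 0) transport_open();  }
static void runtime_down(void) { if (--runtime_refcount == 0) transport_close(); }

/* Both models share the bootstrap, so init/finalize calls may interleave in any order. */
void shmem_init(void)     { runtime_up();   /* plus SHMEM-specific setup */ }
void shmem_finalize(void) { /* SHMEM-specific teardown */ runtime_down(); }
int  MPI_Init(int *argc, char ***argv) { (void)argc; (void)argv; runtime_up(); return 0; }
int  MPI_Finalize(void)                { runtime_down(); return 0; }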

@naveen-rn
Contributor

naveen-rn commented Dec 11, 2018

@minsii I think it would be useful to have a look at this old mail-chain - we had some extensive discussions on the different issues with multiple init and finalize. LINK

@minsii
Collaborator Author

minsii commented Dec 11, 2018

@naveen-rn Thanks for pointing out the previous discussion! Let me summarize that mail-chain:

  • Option 1: Init and finalize at first-call
    Issue: no SHMEM call can be made after first finalize. For example:
    FOO_init // init
    BAR_init // skip init; shmem_malloc
    FOO_finalize // fin
    BAR_finalize // shmem_free <--- error !!
  • Option 2: query init/finalize status by using shmem_initialized/shmem_finalized, similar to MPI
    Issue: test-and-init is not atomic (e.g., test-and-init by different threads in FOO and BAR)
  • Option 3: Init at first call, finalize at refcount == 0
    Issue: no one finalizes if #init > #finalize
    • Possible solution: return the current refcount value, and the user should make finalize calls until refcount == 0. Or the spec requires the program to make the same number of init and finalize calls.
    • Potential issue: are there applications that cannot do this? (@bcernohous mentioned some special cases)
  • Option 4: Init at first call, finalize at last call
    • Possible implementation: define a special shmem_last_finalize API for library use. shmem_last_finalize performs the finalization if it has not yet been performed and is a no-op otherwise. The user must ensure shmem_last_finalize is the last SHMEM call (e.g., by registering it with atexit). A SHMEM runtime may also implement shmem_finalize as a no-op and always implicitly call shmem_last_finalize at program exit. (See the sketch at the end of this comment.)

    • Pros:

      • User can call init/finalize at arbitrary times
      • Support init-finalize-init-finalize (equivalent to init-noop-noop-noop-implicit_finalize).

Option 4 seems the most useful, but it is essentially the same as the deprecated start_pes with an implicit finalize. Why did we deprecate it? Was it for tools (e.g., needing an explicit finalize call such as shmem_tool_sync, as @jeffhammond mentioned)?
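
A small sketch of how Option 4 could look from both sides; shmem_last_finalize is hypothetical, and plain shmem_finalize stands in here for whatever the real, final teardown would be:

#include <stdlib.h>
#include <shmem.h>

/* Hypothetical Option 4 API: real finalization on its first call, no-op afterwards. */
static int already_finalized = 0;
void shmem_last_finalize(void) {
    if (!already_finalized) {
        already_finalized = 1;
        shmem_finalize();            /* stand-in for the real, final teardown */
    }
}

/* A library that wants SHMEM available for its whole lifetime. */
void libfoo_init(void) {
    shmem_init();                    /* under Option 4: a no-op if SHMEM is already initialized */
    atexit(shmem_last_finalize);     /* defer the last finalize to program exit */
}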

@minsii
Collaborator Author

minsii commented Jan 29, 2019

Slides for 2019-01-29's thread WG meeting:
SHMEM_Interoperability_20190129.pptx
main_spec-interoperability-20190129.pdf

@jeffhammond

(Screenshot attached: screen shot 2019-01-31 at 9 55 47 am)

In theory, OSHMPI can intercept MPI init/finalize and make this work, but I prefer to just document that this usage model is not supported and tell people who try to do it that they are bad 😄

@minsii minsii self-assigned this Feb 11, 2019
@minsii
Collaborator Author

minsii commented Feb 11, 2019

@jeffhammond Agreed that we must list it as a limitation of OSHMPI. But eventually this can be fixed by fixing the MPI spec :-)

@minsii
Collaborator Author

minsii commented Feb 11, 2019

As discussed in the previous threads WG meeting (see threads-1-29-2019), the topic of supporting multiple init/finalize calls within a program has been split out into ticket #263.

@minsii
Collaborator Author

minsii commented Mar 19, 2019

Attached draft of spec change for 2019-03-19 thread WG meeting:
openshmem-spec-interoperability-20190319.pdf

Also see git diff at minsii#1

@jdinan
Collaborator

jdinan commented Jan 31, 2020

@minsii Is this issue resolved? If it is only partially resolved, it might be helpful to create one or more issues for additional work to be done on interoperability (and possibly assign to the OpenSHMEM 1.6 milestone).

@minsii
Collaborator Author

minsii commented Feb 4, 2020

@jdinan Yes, it is resolved. The remaining issue has been moved to a separate issue #263. I will assign #263 to the 1.6 milestone.

@minsii minsii closed this as completed Feb 4, 2020