Need for local completion and remote commit #468
Comments
static long target;
long base;
shmem_atomic_fetch_add_nbi(ctx, &base, &target, value, target_pe);
shmem_ctx_local_complete(ctx);
// The 'base' object has been updated on the calling PE.
shmem_ctx_remote_commit(ctx);
// The update to 'target' is now visible in memory on the target PE.
Some examples to clarify the local complete and remote commit semantics:

1. shmem_put_nbi
2. shmem_remote_commit // remote commit is a no-op here: local completion of the previous put is not provided

1. shmem_put_nbi
2. shmem_local_complete
3. shmem_remote_commit // remote commit guarantees global visibility of the target buffer from step (1)

1. shmem_put
2. shmem_remote_commit // remote commit guarantees global visibility of the target buffer from step (1),
// because implicit local completion is available as part of the blocking put operation

1. shmem_put_nbi
2. shmem_local_complete
3. shmem_put
4. shmem_remote_commit // target buffers from steps (1) and (3) are made globally visible,
// because implicit local completion for the blocking put in step (3) and explicit local
// completion in step (2) for the nbi put operation in step (1) are available

1. shmem_put_nbi
2. shmem_put
3. shmem_remote_commit // only the target buffer from step (2) is globally visible, not the one from step (1);
// the implicit local completion of a blocking put does not guarantee local completion
// of other operations

1. shmem_get_nbi
2. shmem_local_complete // guarantees the availability of the received value on return from local complete;
// local completion of the get operation guarantees the actual completion of the operation

// Nick's example
1. shmem_atomic_fetch_add
2. shmem_local_complete // the fetched value is made available on return from local complete,
// but global visibility of the target buffer from the AMO is not guaranteed
3. shmem_remote_commit // global visibility of the target buffer from the AMO is guaranteed
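The put cases above can be replayed in a toy, single-process model. This is only a sketch under stated assumptions: none of the `model_*` names exist in OpenSHMEM or in this proposal. They are hypothetical stand-ins that assume each operation moves through three states (posted, locally complete, globally visible) and that a remote commit only promotes operations that are already locally complete.

```c
#include <assert.h>

/* Toy model, NOT the OpenSHMEM API: every name here is hypothetical and
 * only mirrors the semantics discussed in the examples above. */
typedef enum { OP_POSTED, OP_LOCAL_DONE, OP_VISIBLE } op_state;

#define MAX_OPS 8

typedef struct {
    op_state ops[MAX_OPS]; /* state of each operation, in issue order */
    int count;             /* number of operations issued so far */
} model_ctx;

static void model_init(model_ctx *c) { c->count = 0; }

/* Nonblocking put: posted only, no local completion yet. Returns its index. */
static int model_put_nbi(model_ctx *c) {
    assert(c->count < MAX_OPS);
    c->ops[c->count] = OP_POSTED;
    return c->count++;
}

/* Blocking put: implicit local completion, for this operation only. */
static int model_put(model_ctx *c) {
    assert(c->count < MAX_OPS);
    c->ops[c->count] = OP_LOCAL_DONE;
    return c->count++;
}

/* Explicit local completion of every previously posted operation. */
static void model_local_complete(model_ctx *c) {
    for (int i = 0; i < c->count; i++)
        if (c->ops[i] == OP_POSTED)
            c->ops[i] = OP_LOCAL_DONE;
}

/* Remote commit: only locally completed operations become visible;
 * operations that are merely posted are left untouched. */
static void model_remote_commit(model_ctx *c) {
    for (int i = 0; i < c->count; i++)
        if (c->ops[i] == OP_LOCAL_DONE)
            c->ops[i] = OP_VISIBLE;
}

static int model_is_visible(const model_ctx *c, int op) {
    return c->ops[op] == OP_VISIBLE;
}
```

Calling `model_put_nbi` followed directly by `model_remote_commit` leaves the operation not visible, which matches the first case; mixing `model_put_nbi` and `model_put` before a commit makes only the blocking put visible, which matches the fifth case.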
FYI: from an implementation perspective, this requires remote completion, and it will incur the latency of remote completion.
@manjugv Does that mean every FAMO in your implementation provides global visibility guarantees? If so, aren't you providing more guarantees than OSM-1.5 expects? AFAIU, a local completion operation is not used to create delayed execution; that is for shmem_session to handle. It just provides a way to delay remote completion. That is, you can try to implement all NBI and blocking operations by maintaining a local staging buffer, but you would definitely need to post all of these operations from the local staging buffer to the NIC during local_complete and make sure each has reached a state in the NIC where it is safe from retransmission requests.
I was thinking about this proposal today; in particular, how it seems to give rise to a set of "equivalences:"
On one hand, I think that thinking about how existing OpenSHMEM operations can be translated into equivalent forms could be helpful. On the other hand, I think the |
Separately, I'm a little nervous that we're adding complexity here that may be hard to reconcile with any eventual memory model. I think we had a reasonably clear mapping of AMOs and fence/quiet to the C++ memory model. I feel less confident about the mapping in terms of |
RDMA flush proposal: https://tools.ietf.org/id/draft-talpey-rdma-commit-01.html#rfc.section.3.1.1
On today's call, it seemed like:
While I understand @naveen-rn's rationale for all three new APIs, I wonder whether this issue—in particular, the need for an efficient successor to It seems to me (perhaps naively) that this issue could really be two mostly independent features: |
Separately, there was a lot of discussion about completion semantics and how they're implemented. As an application user, I feel like libfabric has reasonably understandable language regarding completion semantics. (See "Completion Event Semantics" under
Likely someone can correct me, but it doesn't seem like libfabric has anything quite analogous to |
The status of this PR as of June 25, before the Spec Meeting:
Motivation
In general, implementing shmem_quiet-based memory ordering semantics is expensive. With the introduction of processors with weak memory models and support for multiple NICs per node, the cost of performing remote completion and committing any previously posted RMA and AMO events is becoming very high. This introduces the need for performing dummy read-like operations to commit any outstanding operations into the remote target's memory.

Solution
As part of this proposal, we would like to introduce explicit options to perform local completion in OpenSHMEM. To complete the API, we would also like to introduce an option to explicitly perform the remote commit operation. The existing shmem_quiet semantics can then be implemented as a combination of the local completion and remote commit operations.

Proposed API
The following new routines are proposed:
API Semantics
shmem_local_complete and shmem_ctx_local_complete

The shmem_local_complete routine ensures the local completion of all operations on symmetric data objects issued by the calling PE on a given context. Local completion here means that all previously posted operations on symmetric data objects have completed locally; it does not guarantee any visibility of those operations when shmem_local_complete returns. After local completion, the symmetric data objects from all previously posted operations are ready to be reused for other operations.

shmem_remote_commit and shmem_ctx_remote_commit
The shmem_remote_commit routine ensures the global visibility of all previously locally completed operations. Note that this routine ensures global visibility only for operations that were already locally completed. Local completion can be attained implicitly through OpenSHMEM routines (such as blocking put and AMO) or explicitly by calling the shmem_local_complete operation.

shmem_team_remote_commit
This is a collective variant of the shmem_remote_commit operation. This routine registers the arrival of a PE at a shmem_team_remote_commit operation and blocks the PE until all other PEs arrive at the same shmem_team_remote_commit operation. It also ensures that any locally completed operations on all PEs are made globally visible.
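How these routines relate, including the claim that the existing shmem_quiet semantics can be expressed as local completion followed by remote commit, can be sketched with a toy counter model. This is an illustration under assumptions, not the proposed implementation: the `pe_*` and `team_*` names are hypothetical, only nonblocking operations are modeled, and the blocking behavior of the collective is reduced to a loop over all PEs in a single thread.

```c
#include <assert.h>

/* Toy model, NOT the OpenSHMEM API: all names here are hypothetical. */
typedef struct {
    int posted;     /* operations issued by this PE */
    int local_done; /* operations that are locally complete */
    int visible;    /* operations globally visible at their targets */
} pe_model;

/* Issue one nonblocking operation: posted, but not yet locally complete. */
static void pe_post_nbi(pe_model *pe) { pe->posted++; }

/* Local completion: everything posted so far becomes locally complete. */
static void pe_local_complete(pe_model *pe) { pe->local_done = pe->posted; }

/* Remote commit: only locally completed operations become visible. */
static void pe_remote_commit(pe_model *pe) { pe->visible = pe->local_done; }

/* Quiet-like behavior expressed as the two separate steps. */
static void pe_quiet(pe_model *pe) {
    pe_local_complete(pe);
    pe_remote_commit(pe);
}

/* Collective variant: in this single-threaded model, visiting every PE
 * stands in for "all PEs have arrived"; each PE then commits whatever
 * it had already locally completed. */
static void team_remote_commit(pe_model *team, int npes) {
    for (int i = 0; i < npes; i++)
        pe_remote_commit(&team[i]);
}
```

In this model, a PE that posts operations but never calls `pe_local_complete` gains nothing from `team_remote_commit`, while `pe_quiet` always leaves `visible` equal to `posted`.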