diff --git a/content/execution_model.tex b/content/execution_model.tex index a1ea1a69..0faa54e4 100644 --- a/content/execution_model.tex +++ b/content/execution_model.tex @@ -32,13 +32,20 @@ \subsection{Progress of OpenSHMEM Operations}\label{subsec:progress} The \openshmem model assumes that computation and communication are naturally overlapped. \openshmem programs are expected to exhibit progression of -communication both with and without \openshmem calls. Consider a \ac{PE} that is +communication both with and without \openshmem calls. For point-to-point +operations, consider a \ac{PE} that is engaged in a computation with no \openshmem calls. Other \acp{PE} should be able to communicate (e.g., \OPR{put}, \OPR{get}, \OPR{atomic}, etc.) and complete communication operations with that computationally-bound \ac{PE} without that \ac{PE} issuing any explicit \openshmem calls. One-sided \openshmem communication calls involving that \ac{PE} should progress regardless of when -that \ac{PE} next engages in an \openshmem call. +that \ac{PE} next engages in an \openshmem call. Similarly, +for nonblocking collectives, consider the \acp{PE} that are part of a team +issuing a nonblocking collective and overlapping collective completion with +computation. Once a nonblocking collective operation is initiated by +all of the \acp{PE} in the team of the collective, any \ac{PE} in the team must +eventually observe completion through a call to \FUNC{shmem\_req\_test} or a +call to \FUNC{shmem\_req\_wait}. \parimpnotes{ An \openshmem implementation for hardware that does not provide diff --git a/content/library_constants.tex b/content/library_constants.tex index 0a0194de..66d71725 100644 --- a/content/library_constants.tex +++ b/content/library_constants.tex @@ -84,6 +84,15 @@ See Section~\ref{subsec:shmem_ctx_create} for more detail about its use. \tabularnewline \hline %% +\LibConstDecl{SHMEM\_REQ\_INVALID} & +A value corresponding to an invalid request handle.
+This value can be used to initialize or update request handles to indicate +that they do not reference a valid request. +When managed in this way, applications can use an equality comparison +to test whether a given request handle references a valid request. +See Section~\ref{subsec:nb_coll} for more detail about its use. +\tabularnewline \hline +%% \LibConstDecl{SHMEM\_SIGNAL\_SET} & An integer constant expression corresponding to the signal update set operation. See Section~\ref{subsec:shmem_put_signal} and diff --git a/content/nb_collectives_intro.tex b/content/nb_collectives_intro.tex new file mode 100644 index 00000000..6ff5db01 --- /dev/null +++ b/content/nb_collectives_intro.tex @@ -0,0 +1,29 @@ +An \openshmem nonblocking collective operation, like a blocking collective +operation, is a group communication operation among the +participants of the team. All \acp{PE} in the team must call the +collective operation, and collective operations must be initiated in the same +order on all \acp{PE}, although they may execute and complete in any order. + +\begin{enumerate} + +\item Invocation semantics: Upon invocation of a nonblocking collective routine, +the operation is initiated and the routine returns without ensuring completion. All \acp{PE} in the team +must call this routine with identical arguments. + +\item Collective types: Nonblocking variants are provided only for the alltoall +and broadcast collectives. All other collective operations (reductions, +collect, fcollect, barrier, barrier all, alltoalls, sync, and sync all) do not have nonblocking variants. + +\item Completion semantics: \openshmem programs can query the status of a collective operation +using the \FUNC{shmem\_req\_test} routine. The operation is complete once +a call to \FUNC{shmem\_req\_test} or a call to \FUNC{shmem\_req\_wait} on its request returns zero.
+ +\item Threads: When \CONST{SHMEM\_THREAD\_MULTIPLE} is in use, an \openshmem +program must not call collective operations on the same team concurrently +from different threads. + +\end{enumerate} + +Note: Like other nonblocking \openshmem operations, implementations are +expected to progress nonblocking collective operations asynchronously. Guidance on +asynchronous progress is provided in Section~\ref{subsec:progress}. diff --git a/content/shmem_alltoall_nb.tex b/content/shmem_alltoall_nb.tex new file mode 100644 index 00000000..7d772baf --- /dev/null +++ b/content/shmem_alltoall_nb.tex @@ -0,0 +1,123 @@ +\apisummary{ + Exchanges a fixed amount of contiguous data blocks between all pairs + of \acp{PE} participating in the collective routine. +} + +\begin{apidefinition} + +%% C11 +\begin{C11synopsis} +int @\FuncDecl{shmem\_alltoall\_nb}@(shmem_team_t team, TYPE *dest, const TYPE +*source, size_t nelems, shmem_req_h *request); +\end{C11synopsis} +where \TYPE{} is one of the standard \ac{RMA} types specified by Table \ref{stdrmatypes}. + +\begin{Csynopsis} +\end{Csynopsis} +\begin{CsynopsisCol} +int @\FuncDecl{shmem\_\FuncParam{TYPENAME}\_alltoall\_nb}@(shmem_team_t team, +TYPE *dest, const TYPE *source, size_t nelems, shmem_req_h *request); +\end{CsynopsisCol} +where \TYPE{} is one of the standard \ac{RMA} types and has a corresponding \TYPENAME{} specified by Table \ref{stdrmatypes}. + +\begin{CsynopsisCol} +int @\FuncDecl{shmem\_alltoallmem\_nb}@(shmem_team_t team, void *dest, const +void *source, size_t nelems, shmem_req_h *request); +\end{CsynopsisCol} + +\begin{apiarguments} + +\apiargument{IN}{team}{A valid \openshmem team handle.}% + +\apiargument{OUT}{dest}{Symmetric address of a data object large enough to receive + the combined total of \VAR{nelems} elements from each \ac{PE} in the + team.
+ The type of \dest{} should match that implied in the SYNOPSIS section.} +\apiargument{IN}{source}{Symmetric address of a data object that contains \VAR{nelems} + elements of data for each \ac{PE} in the team, ordered according to + destination \ac{PE}. + The type of \source{} should match that implied in the SYNOPSIS section.} +\apiargument{IN}{nelems}{ + The number of elements to exchange for each \ac{PE}. + For \FUNC{shmem\_alltoallmem\_nb} it represents bytes. +} +\apiargument{OUT}{request}{An opaque request handle identifying the collective +operation.} + +\end{apiarguments} + +\apidescription{ + The \FUNC{shmem\_alltoall\_nb} routines are collective routines. All + \acp{PE} in the provided team must participate in the collective. If + \VAR{team} compares equal to \LibConstRef{SHMEM\_TEAM\_INVALID} or is + otherwise invalid, the behavior is undefined. + + {\bf Invocation and completion}: A call to the nonblocking alltoall routine initiates the operation and returns + immediately without necessarily completing the operation. On success, + an opaque request handle is created and returned. The + operation is completed after a call to \FUNC{shmem\_req\_test} or + a call to \FUNC{shmem\_req\_wait}. When the operation is complete, the request handle + is deallocated and cannot be reused. + + Though nonblocking alltoall varies in invocation and completion semantics + when compared to blocking alltoall, the data exchange semantics are similar. + + {\bf Data exchange semantics}: + In this routine, each \ac{PE} + participating in the operation exchanges \VAR{nelems} data elements + with all other \acp{PE} participating in the operation. + The size of a data element is: + \begin{itemize} + \item 8 bits for \FUNC{shmem\_alltoallmem\_nb} + \item \FUNC{sizeof}(\TYPE{}) for alltoall routines taking typed \VAR{source} and \VAR{dest} + \end{itemize} + + The data being sent and received are + stored in a contiguous symmetric data object. 
The total size of each \ac{PE}'s + \VAR{source} object and \VAR{dest} object is \VAR{nelems} times the size of + an element + times \VAR{N}, where \VAR{N} equals the number of \acp{PE} participating + in the operation. + The \VAR{source} object contains \VAR{N} blocks of data + (where the size of each block is defined by \VAR{nelems}) and each block of data + is sent to a different \ac{PE}. + + The same \dest{} and \source{} + arrays, and the same value of \VAR{nelems}, + must be passed by all \acp{PE} that participate in the collective. + + Given a \ac{PE} \VAR{i} that is the \kth \ac{PE} + participating in the operation and a \ac{PE} + \VAR{j} that is the \lth \ac{PE} + participating in the operation, + + \ac{PE} \VAR{i} sends the \lth block of its \VAR{source} object to + the \kth block of + the \VAR{dest} object of \ac{PE} \VAR{j}. + + + Like the data exchange semantics, the entry and completion criteria of + the nonblocking alltoall are similar to those of the blocking alltoall. + + {\bf Entry criteria}: Before any \ac{PE} calls a \FUNC{shmem\_alltoall\_nb} routine, + the following condition must be ensured: + \begin{itemize} + \item The \VAR{dest} data object on all \acp{PE} in the team is + ready to accept the \FUNC{shmem\_alltoall\_nb} data. + \end{itemize} + Otherwise, the behavior is undefined. + + {\bf Completion criteria}: Upon completion, the following is true for + the local \ac{PE}: + \begin{itemize} + \item Its \VAR{dest} symmetric data object is completely updated and + the data has been copied out of the \VAR{source} data object. + \end{itemize} +} + +\apireturnvalues{ + Zero on success and nonzero otherwise. +} + +\end{apidefinition} + diff --git a/content/shmem_broadcast_nb.tex b/content/shmem_broadcast_nb.tex new file mode 100644 index 00000000..ce2bc2bb --- /dev/null +++ b/content/shmem_broadcast_nb.tex @@ -0,0 +1,103 @@ +\apisummary{ + Broadcasts a block of data from one \ac{PE} to one or more destination + \acp{PE}.
+} + +\begin{apidefinition} + +%% C11 +\begin{C11synopsis} +int @\FuncDecl{shmem\_broadcast\_nb}@(shmem_team_t team, TYPE *dest, const TYPE +*source, size_t nelems, int PE_root, shmem_req_h *request); +\end{C11synopsis} +where \TYPE{} is one of the standard \ac{RMA} types specified by Table \ref{stdrmatypes}. + +%% C/C++ +\begin{Csynopsis} +\end{Csynopsis} +\begin{CsynopsisCol} +int @\FuncDecl{shmem\_\FuncParam{TYPENAME}\_broadcast\_nb}@(shmem_team_t team, TYPE +*dest, const TYPE *source, size_t nelems, int PE_root, shmem_req_h *request); +\end{CsynopsisCol} +where \TYPE{} is one of the standard \ac{RMA} types and has a corresponding \TYPENAME{} specified by Table \ref{stdrmatypes}. + +\begin{CsynopsisCol} +int @\FuncDecl{shmem\_broadcastmem\_nb}@(shmem_team_t team, void *dest, const void +*source, size_t nelems, int PE_root, shmem_req_h *request); +\end{CsynopsisCol} + +\begin{apiarguments} + +\apiargument{IN}{team}{The team over which to perform the operation.}% + +\apiargument{OUT}{dest}{Symmetric address of destination data object. + The type of \dest{} should match that implied in the SYNOPSIS section.} +\apiargument{IN}{source}{Symmetric address of the source data object. + The type of \source{} should match that implied in the SYNOPSIS section.} +\apiargument{IN}{nelems}{ + The number of elements in \source{} and \dest{} arrays. + For \FUNC{shmem\_broadcastmem\_nb}, elements are bytes. +} +\apiargument{IN}{PE\_root}{Zero-based ordinal of the \ac{PE}, with respect to + the team, from which the data is copied.} +\apiargument{OUT}{request}{An opaque request handle identifying the collective +operation.} + + +\end{apiarguments} + +\apidescription{ + \openshmem nonblocking broadcast routines are collective routines over a + valid \openshmem team. + They copy the \source{} data object on the \ac{PE} specified by + \VAR{PE\_root} to the \dest{} data object on the \acp{PE} + participating in the collective operation. 
+ The same \dest{} and \source{} data objects and the same value of + \VAR{PE\_root} must be passed by all \acp{PE} participating in the + collective operation. + + A call to the nonblocking broadcast routine initiates the operation and returns + immediately without necessarily completing the operation. On success, + an opaque request handle is created and returned. The + operation is completed after a call to \FUNC{shmem\_req\_test} or a + call to \FUNC{shmem\_req\_wait}. When the operation is complete, the request handle + is deallocated and cannot be reused. + + Like blocking broadcast, before any \ac{PE} calls a broadcast routine, the following + conditions must be ensured: + \begin{itemize} + \item The \dest{} array on all \acp{PE} participating in the broadcast + is ready to accept the broadcast data. + \item All \acp{PE} in the \VAR{team} argument must participate in + the operation. + \item If the \VAR{team} compares equal to \LibConstRef{SHMEM\_TEAM\_INVALID} or is + otherwise invalid, the behavior is undefined. + \item \ac{PE} numbering is relative to the team. The specified + root \ac{PE} must be a valid \ac{PE} number for the team, + between \CONST{0} and \VAR{N$-$1}, where \VAR{N} is the size of + the team. + \end{itemize} + Otherwise, the behavior is undefined. + + Upon completion of a nonblocking broadcast routine, the following are true for the local + \ac{PE}: + \begin{itemize} + \item The \dest{} data object is updated. + \item If the local \ac{PE} is \VAR{PE\_root}, the data has been copied + out of the \source{} data object. + \end{itemize} +} + + +\apireturnvalues{ + Zero on success and nonzero otherwise. +} + +\apinotes{ + Team handle error checking and integer return codes are currently undefined. + Implementations may define these behaviors as needed, but programs should + ensure portability by doing their own checks for invalid team handles and for + \LibConstRef{SHMEM\_TEAM\_INVALID}. 
+} + +\end{apidefinition} diff --git a/content/shmem_collective_test.tex b/content/shmem_collective_test.tex new file mode 100644 index 00000000..e4b4851f --- /dev/null +++ b/content/shmem_collective_test.tex @@ -0,0 +1,35 @@ +\apisummary{ + Returns the status of the operation identified by the request. +} + +\begin{apidefinition} + +\begin{Csynopsis} +int @\FuncDecl{shmem\_req\_test}@(shmem_req_h *request); +\end{Csynopsis} + +\begin{apiarguments} + + \apiargument{INOUT}{request}{Request handle} + +\end{apiarguments} + +\apidescription{ + A call to \FUNC{shmem\_req\_test} returns immediately. If the + operation identified by the request is completed, it returns + zero, and the request object is deallocated and set to \LibConstRef{SHMEM\_REQ\_INVALID}. + If the operation is not completed, it returns a positive integer. + If the request object is not valid (i.e., it is set to \LibConstRef{SHMEM\_REQ\_INVALID}), + no operation is performed and a negative value is returned. + + In a multithreaded environment, \FUNC{shmem\_req\_test} may be called concurrently by + different threads, provided that each thread operates on a different request object. It is the responsibility + of the \openshmem user to ensure that proper synchronization is used to + prevent race conditions or deadlock. + } + +\apireturnvalues{ + Zero if the operation has completed, a positive integer if it has not yet completed, and a negative integer if the request is invalid. + } + +\end{apidefinition} diff --git a/content/shmem_collective_wait.tex b/content/shmem_collective_wait.tex new file mode 100644 index 00000000..a1c44a6c --- /dev/null +++ b/content/shmem_collective_wait.tex @@ -0,0 +1,38 @@ +\apisummary{ + The routine waits until an operation identified by a request + object completes.
+} + +\begin{apidefinition} + +\begin{Csynopsis} +int @\FuncDecl{shmem\_req\_wait}@(shmem_req_h *request); +\end{Csynopsis} + +\begin{apiarguments} + + \apiargument{INOUT}{request}{Request handle} + +\end{apiarguments} + +\apidescription{ + +The \FUNC{shmem\_req\_wait} function is a blocking operation that does not +return until the operation identified by the request object has +completed. When the operation is completed, \FUNC{shmem\_req\_wait} returns +zero, and the request object is deallocated and set to \LibConstRef{SHMEM\_REQ\_INVALID}. +If the request object is not valid (i.e., it is set to +\LibConstRef{SHMEM\_REQ\_INVALID}), no operation is performed and a negative +value is returned. + +In a multithreaded environment, \FUNC{shmem\_req\_wait} may be called concurrently by different +threads, provided that each thread operates on a different request object. It is the responsibility of the +\openshmem user to ensure that proper synchronization is used to prevent race +conditions or deadlock. + } + +\apireturnvalues{ + On success returns zero, otherwise returns a negative integer.
+ } + +\end{apidefinition} diff --git a/main_spec.tex b/main_spec.tex index 19b7200f..e11af1bb 100644 --- a/main_spec.tex +++ b/main_spec.tex @@ -383,6 +383,23 @@ \subsubsection{\textbf{SHMEM\_COLLECT, SHMEM\_FCOLLECT}}\label{subsec:shmem_coll \subsubsection{\textbf{SHMEM\_REDUCTIONS}}\label{subsec:shmem_reductions} \input{content/shmem_reductions.tex} +\newpage +\subsection{Nonblocking Collective Routines}\label{subsec:nb_coll} +\input{content/nb_collectives_intro.tex} + +\subsubsection{\textbf{SHMEM\_BROADCAST\_NB}}\label{subsec:shmem_broadcast_nb} +\input{content/shmem_broadcast_nb.tex} + +\subsubsection{\textbf{SHMEM\_ALLTOALL\_NB}}\label{subsec:shmem_alltoall_nb} +\input{content/shmem_alltoall_nb.tex} + +\subsubsection{\textbf{SHMEM\_REQ\_TEST}}\label{subsec:shmem_collective_test} +\input{content/shmem_collective_test.tex} + +\subsubsection{\textbf{SHMEM\_REQ\_WAIT}}\label{subsec:shmem_collective_wait} +\input{content/shmem_collective_wait.tex} + +
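The request-handle semantics described for \FUNC{shmem\_req\_test} and \FUNC{shmem\_req\_wait} can be sketched as below. Because these routines are introduced by this change and may not exist in an installed \openshmem library, the sketch uses a hypothetical single-process mock: the names `mock_req_h`, `MOCK_REQ_INVALID`, `mock_start`, `mock_req_test`, `mock_req_wait`, and `overlap_demo`, along with the "steps left" counter that stands in for communication progress, are assumptions made purely for illustration. Only the return-value convention is taken from the text above: zero on completion with the handle reset to the invalid value, a positive value while the operation is pending, and a negative value for an invalid handle.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for shmem_req_h and SHMEM_REQ_INVALID.
 * They are NOT part of any OpenSHMEM library. */
typedef struct mock_req { int steps_left; } *mock_req_h;
#define MOCK_REQ_INVALID ((mock_req_h)NULL)

/* One in-flight request is enough for this sketch. */
static struct mock_req slot;

/* Pretend to initiate a nonblocking collective that needs `steps`
 * progress steps before it completes. Returns zero on success. */
int mock_start(mock_req_h *req, int steps) {
    slot.steps_left = steps;
    *req = &slot;
    return 0;
}

/* Mirrors the described test semantics: negative for an invalid handle,
 * positive while pending, zero on completion (handle is invalidated). */
int mock_req_test(mock_req_h *req) {
    if (*req == MOCK_REQ_INVALID) return -1;
    if ((*req)->steps_left > 0) {
        (*req)->steps_left--;      /* simulate background progress */
        return 1;
    }
    *req = MOCK_REQ_INVALID;       /* request is released on completion */
    return 0;
}

/* Mirrors the described wait semantics: does not return until the
 * operation completes (zero) or the handle is found invalid (negative). */
int mock_req_wait(mock_req_h *req) {
    int rc;
    while ((rc = mock_req_test(req)) > 0) {
        /* keep polling until completion */
    }
    return rc;
}

/* Overlap pattern: do one unit of local work per unsuccessful test.
 * Returns the number of work units done before completion. */
int overlap_demo(void) {
    mock_req_h req = MOCK_REQ_INVALID;
    int work_units = 0;
    mock_start(&req, 3);
    while (mock_req_test(&req) > 0) {
        work_units++;              /* computation overlapped here */
    }
    /* After completion the handle compares equal to the invalid value. */
    return (req == MOCK_REQ_INVALID) ? work_units : -1;
}
```

In a real program the `mock_start` call would be replaced by a nonblocking collective such as \FUNC{shmem\_broadcast\_nb}, and the equality comparison against the invalid handle value is the check the \LibConstRef{SHMEM\_REQ\_INVALID} description suggests applications use.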