diff --git a/content/atomics_intro.tex b/content/atomics_intro.tex index d803887d..7664d635 100644 --- a/content/atomics_intro.tex +++ b/content/atomics_intro.tex @@ -29,11 +29,18 @@ The non-fetching routines include: \FUNC{shmem\_atomic\_\{set, inc, add, and, or, xor\}[\_nbi]}. +\begin{DeprecateBlock} + +Starting in \openshmem[1.4], all \ac{AMO} functions added "\_atomic\_" to the function +name and deprecated the equivalent functions without "\_atomic\_" in the name. + +\end{DeprecateBlock} + \end{itemize} \openshmem \ac{AMO} routines specified in this section have two variants. In one of the variants, the context handle, \VAR{ctx}, is explicitly passed as -an argument. In this variant, the operation is performed on the specified +an argument. In this variant, the operation is performed on the specified context. If the context handle \VAR{ctx} does not correspond to a valid context, the behavior is undefined. In the other variant, the context handle is not explicitly passed and thus, the operations are performed on the @@ -56,7 +63,7 @@ integer types defined in \HEADER{stdint.h} by \Cstd[99]~\S7.18.1.1 and \Cstd[11]~\S7.20.1.1. When the \Cstd translation environment does not provide exact-width integer types with \HEADER{stdint.h}, an -\openshmem implemementation is not required to provide support for these types. +\openshmem implementation is not required to provide support for these types. \begin{table}[h] \begin{center} @@ -123,3 +130,10 @@ \label{bitamotypes} \end{center} \end{table} +] + + + + + + diff --git a/content/backmatter.tex b/content/backmatter.tex index 16e7ffcc..bc7013ff 100644 --- a/content/backmatter.tex +++ b/content/backmatter.tex @@ -151,9 +151,9 @@ \chapter{Undefined Behavior in OpenSHMEM}\label{sec:undefined} \tabularnewline \hline Use of non-symmetric variables & Some routines require remotely accessible -variables to perform their function. 
For example, a \PUT{} to a non-symmetric variable may -be trapped where possible and the library may abort the program. Another -implementation may choose to continue execution with or without a warning. +variables to perform their function. For example, an \openshmem library may detect a \PUT{} to a non-symmetric variable +and choose to abort the program. +However, another implementation may choose to continue execution with or without a warning. \tabularnewline \hline Non-symmetric allocation of symmetric memory & The symmetric memory management routines are @@ -648,12 +648,17 @@ \subsection{Table~\ref{p2psynctypes}: point-to-point synchronization types} \chapter{Changes to this Document}\label{sec:changelog} \section{Version 1.6} +\label{changelog:v1.6} Major changes in \openshmem[1.6] include the addition of the new \FUNC{shmem\_team\_ptr}, \FUNC{shmem\_ibget}, and \FUNC{shmem\_ibput} functions. The following list describes the specific changes in \openshmem[1.6]: -\begin{itemize} +\begin{enumerate} +% +\item Added an inclusive (\FUNC{shmem\_sum\_inscan}) and exclusive +(\FUNC{shmem\_sum\_exscan}) collective summation operation. +\ChangelogRef{subsec:shmem_scan} % \item Added support for initialization and finalization routines to be called multiple times, and added an initialization status query API @@ -668,23 +673,14 @@ \section{Version 1.6} update a remote flag without associated data transfer of a put-with-signal operation. \ChangelogRef{subsec:shmem_signal_add, subsec:shmem_signal_set}% % -\item Clarified that \OPR{Fence} operations only guarantee ordering for - operations that are performed on the same context. -\ChangelogRef{subsec:shmem_fence}% -% \item Added a team-based pointer query routine: \FUNC{shmem\_team\_ptr}. \ChangelogRef{subsec:shmem_team_ptr}% % -\item Clarified that \FUNC{shmem\_team\_split\_strided} and - \FUNC{shmem\_team\_split\_strided} return a nonzero value when the parent - team compares equal to \LibConstRef{SHMEM\_TEAM\_INVALID}. 
-\ChangelogRef{subsec:shmem_team_split_strided, subsec:shmem_team_split_2d}% -% -\item Removed \openshmem[1.5] Table 9, which was an incomplete duplicate of - \openshmem[1.5] Table 10, and clarified the types, names, and supporting - operations for team-based reductions. -\ChangelogRef{teamreducetypes}% +\item Clarified that the behavior of \FUNC{shmem\_team\_split\_strided} is + undefined when the input \VAR{start}, \VAR{stride}, and \VAR{size} arguments + imply a \textit{wrap-around} with respect to the parent team's \acp{PE}. +\ChangelogRef{subsec:shmem_team_split_strided}% % \item Added the session routines, \FUNC{shmem\_ctx\_session\_start} and \FUNC{shmem\_ctx\_session\_stop}, which allow users to pass hints to the @@ -703,11 +699,6 @@ \section{Version 1.6} the world team. \ChangelogRef{subsec:shmem_malloc, subsec:shmem_free, subsec:shmem_realloc, subsec:shmem_align, subsec:shmmallochint, subsec:shmem_calloc}% -\item Corrected the level argument's recommended value in API notes for - \FUNC{shmem\_pcontrol} to indicate that the value should be greater than - 2 to enable profiling with profile library defined effects and - additional arguments. -\ChangelogRef{subsec:shmem_pcontrol} % \item Clarified that \FUNC{shmem\_team\_get\_config} returns the current configuration values, which may differ from the values assigned at the @@ -722,7 +713,44 @@ \section{Version 1.6} stride argument is 0 or negative. \ChangelogRef{subsec:shmem_team_split_strided} % -\end{itemize} +\item Clarified the requirements for the source buffer before entering the + collective routines. +\ChangelogRef{subsec:shmem_alltoall,subsec:shmem_broadcast,subsec:shmem_collect,subsec:shmem_reductions,subsec:shmem_scan} +% +\item Added a new Errata Section~\ref{sec:errata} that indicates errors or ambiguities in the + \openshmem specification and the version that required correction or clarification. 
+\ChangelogRef{sec:errata} +% +\item Removed \openshmem[1.5] Table 9, which was an incomplete duplicate of + \openshmem[1.5] Table 10, and clarified the types, names, and supporting + operations for team-based reductions. \label{changelog:reduction_table} +\ChangelogRef{teamreducetypes}% +% +\item Clarified that \VAR{source} and \VAR{dest} arrays must be the same + across \acp{PE} in \openshmem reductions. \label{changelog:reduction_args} +\ChangelogRef{subsec:shmem_reductions} +% +\item Clarified that \OPR{Fence} operations only guarantee ordering for + operations that are performed on the same context. \label{changelog:fence_ctx} +\ChangelogRef{subsec:shmem_fence}% +% +\item Clarified that \FUNC{shmem\_test\_all} and \FUNC{shmem\_test\_all\_vector} + routines return 1 when the test set is empty. \label{changelog:test_all} +\ChangelogRef{subsec:shmem_test_all,subsec:shmem_test_all_vector}% +% +\item Clarified that \FUNC{shmem\_team\_split\_strided} and + \FUNC{shmem\_team\_split\_2d} return a nonzero value when the parent + team compares equal to \LibConstRef{SHMEM\_TEAM\_INVALID}. \label{changelog:split_strided_2d} +\ChangelogRef{subsec:shmem_team_split_strided, subsec:shmem_team_split_2d}% +% +\item Corrected the level argument's recommended value in API notes for + \FUNC{shmem\_pcontrol} to indicate that the value should be greater than + 2 to enable profiling with profile library defined effects and + additional arguments. \label{changelog:pcontrol} +\ChangelogRef{subsec:shmem_pcontrol} +% + +\end{enumerate} \section{Version 1.5} Major changes in \openshmem[1.5] include the addition of new team-based @@ -732,7 +760,7 @@ \section{Version 1.5} interface, and the removal of the entire \Fortran \ac{API}. The following list describes the specific changes in \openshmem[1.5]: -\begin{itemize} +\begin{enumerate} % \item Removed \FUNC{SHMEM\_CACHE}. 
\ChangelogRef{dep:shmem_cache}% @@ -883,7 +911,7 @@ \section{Version 1.5} \item Clarified the atomicity guarantees of the \openshmem memory model. \ChangelogRef{subsec:amo_guarantees}% % -\end{itemize} +\end{enumerate} \section{Version 1.4} Major changes in \openshmem[1.4] include @@ -898,7 +926,7 @@ \section{Version 1.4} and \Cstd[11] type-generic interfaces for point-to-point synchronization. The following list describes the specific changes in \openshmem[1.4]: -\begin{itemize} +\begin{enumerate} % \item New communication management \ac{API}, including \FUNC{shmem\_ctx\_create}; \FUNC{shmem\_ctx\_destroy}; and additional \ac{RMA}, \ac{AMO}, and memory ordering @@ -993,7 +1021,7 @@ \section{Version 1.4} % \item Expanded the type support for \ac{RMA}, \ac{AMO}, and point-to-point synchronization operations. -%% cleveref will compress a list of references by default. It is better to not +%% cleveref will compress a list of references by default. It is better to not %% compress this list of *table* references because the clickable hyperref %% links are useful. You can tell cleveref to not compress the LHS and RHS by %% inserting an empty item between them; i.e., `,,`. @@ -1018,7 +1046,7 @@ \section{Version 1.4} \item Clarified that complex-typed reductions in C are optionally supported. \ChangelogRef{subsec:shmem_reductions}% % -\end{itemize} +\end{enumerate} @@ -1031,7 +1059,7 @@ \section{Version 1.3} and \Cstd[11] type-generic interfaces for \ac{RMA} and \ac{AMO} operations. The following list describes the specific changes in \openshmem[1.3]: -\begin{itemize} +\begin{enumerate} % \item Clarified implementation of \acp{PE} as threads. % @@ -1072,7 +1100,7 @@ \section{Version 1.3} \item Deprecation of \FUNC{SHMEM\_CACHE}. \ChangelogRef{dep:shmem_cache}% % -\end{itemize} +\end{enumerate} @@ -1087,7 +1115,7 @@ \section{Version 1.2} and clarifications to several \ac{API} descriptions. 
The following list describes the specific changes in \openshmem[1.2]: -\begin{itemize} +\begin{enumerate} % \item Added specification of \VAR{pSync} initialization for all routines that use it. % @@ -1143,7 +1171,7 @@ \section{Version 1.2} support across versions of the \openshmem Specification. \ChangelogRef{sec:dep}% % -\end{itemize} +\end{enumerate} @@ -1157,7 +1185,7 @@ \section{Version 1.1} and general readabilty and usability improvements to the document structure. The following list describes the specific changes in \openshmem[1.1]: -\begin{itemize} +\begin{enumerate} % \item Clarifications of the completion semantics of memory synchronization interfaces. @@ -1266,6 +1294,47 @@ \section{Version 1.1} \item Name changes for UV and ICE for \ac{SGI} systems. \ChangelogRef{sec:openshmem_history}% % -\end{itemize} +\end{enumerate} + +\chapter{Errata}\label{sec:errata} + +Errors or ambiguities in the \openshmem specification may be discovered after +publication. +Errata, or corrections, are included in the sections below indicating the +version of the \openshmem specification that required the correction or +clarification. +These corrections have been applied to all subsequent versions of the +specification and this section serves as a historical record of the changes +made to assist users and implementers with applying the necessary corrections. +Errata that result in a change to the specification are also included in +Annex~\ref{sec:changelog}. +For an implementation to comply with a particular version of \openshmem, it +must account for all errata associated with that version as indicated below. + +\section{Version 1.5} + +\begin{enumerate} + \item Removed \openshmem[1.5] Table 9, which was an incomplete duplicate of + \openshmem[1.5] Table 10, and clarified the types, names, and supporting + operations for team-based reductions + (\ref{changelog:v1.6}.\ref{changelog:reduction_table}). 
+ \item Clarified that \VAR{source} and \VAR{dest} arrays must be the same + across \acp{PE} in \openshmem reductions + (\ref{changelog:v1.6}.\ref{changelog:reduction_args}). + \item Clarified that \OPR{Fence} operations only guarantee ordering for operations + that are performed on the same context + (\ref{changelog:v1.6}.\ref{changelog:fence_ctx}). + \item Clarified that \FUNC{shmem\_test\_all} and + \FUNC{shmem\_test\_all\_vector} routines return 1 when the test set is empty + (\ref{changelog:v1.6}.\ref{changelog:test_all}). + \item Clarified that \FUNC{shmem\_team\_split\_strided} and + \FUNC{shmem\_team\_split\_2d} return nonzero when the parent team is + \LibConstRef{SHMEM\_TEAM\_INVALID} + (\ref{changelog:v1.6}.\ref{changelog:split_strided_2d}). + \item Corrected the \VAR{level} argument's recommended value in API notes for + \FUNC{shmem\_pcontrol} to indicate that the value should be greater than 2 to enable + profiling with profile library defined effects and additional arguments + (\ref{changelog:v1.6}.\ref{changelog:pcontrol}). +\end{enumerate} %end of setlength command that was started in frontmatter.tex diff --git a/content/collective_intro.tex b/content/collective_intro.tex index 823164ab..4996b178 100644 --- a/content/collective_intro.tex +++ b/content/collective_intro.tex @@ -1,21 +1,24 @@ \emph{Collective routines} are defined as coordinated communication or synchronization operations performed by a group of \acp{PE}. -\openshmem provides three types of collective routines: +\openshmem provides four types of collective routines: \begin{enumerate} -\item Collective routines that operate on teams use a team handle parameter to determine - which \acp{PE} will participate in the routine, and use resources encapsulated by the team object - to perform operations. See Section~\ref{subsec:team} for details on team management. 
+ \item Collective routines that operate on teams use a team handle parameter to determine + which \acp{PE} will participate in the routine, and use resources encapsulated by the team object + to perform operations. See Section~\ref{subsec:team} for details on team management. -\begin{DeprecateBlock} -\item Collective routines that operate on active sets use a set of parameters to determine - which \acp{PE} will participate and what resources are used to perform operations. -\end{DeprecateBlock} + \begin{DeprecateBlock} + \item Collective routines that operate on active sets use a set of parameters to determine + which \acp{PE} will participate and what resources are used to perform operations. + + \item Collective routines that do not accept active set + parameters and, as required, the default context. + \end{DeprecateBlock} -\item Collective routines that accept neither team nor active set - parameters, which implicitly operate on the world team and, as - required, the default context. + \item Collective routines that do not accept team + parameters, which implicitly operate on the world team and, as + required, the default context. \end{enumerate} Concurrent accesses to symmetric memory by an \openshmem collective diff --git a/content/coverpage.tex b/content/coverpage.tex index a5b3df31..e26a54ca 100644 --- a/content/coverpage.tex +++ b/content/coverpage.tex @@ -47,8 +47,60 @@ \section*{Sponsored by} \end{itemize} \section*{Authors and Collaborators} -This document is a collaborative effort consisting of several releases of \openshmem versions 1.0 through 1.5. This section lists the authors and contributors in reverse chronological order, starting with \openshmem 1.5. +This document is a collaborative effort consisting of several releases of \openshmem versions 1.0 through 1.6. This section lists the authors and contributors in reverse chronological order, starting with \openshmem 1.6. 
+\subsection*{\openshmem 1.6} +\begin{multicols}{2} +\begin{itemize} +\setlength\itemsep{0.1em} +\item Ferrol Aderholdt, NVIDIA +\item Muhammad Awad, \ac{AMD} +\item Matthew Baker, \ac{ORNL} +\item Swen Boehm, \ac{ORNL} +\item Aurelien Bouteiller, \ac{UTK} +\item Mark Brown, Intel +\item Bob Cernohous, \ac{HPE} +\item James Dinan\footnotemark[1], NVIDIA +\item Megan Grodowitz, Arm Inc. +\item Max Grossman, Georgia Tech +\item Yanfei Guo, \ac{ANL} +\item Khaled Hamidouche, NVIDIA +\item Jeff Hammond, NVIDIA +\item Akihiro Hayashi, Georgia Tech +\item Oscar Hernandez, \ac{ORNL} +\item Kieran Holland, Intel +\item Robert Kierski, \ac{HPE} +\item Bryant Lam, \ac{DoD} +\item Akhil Langer, NVIDIA +\item Tiffany M. Mintz, \ac{ORNL} +\item Bryan Morgan, Intel +\item William Okuno\footnotemark[2], \ac{HPE} +\item David Ozog\footnotemark[5], Intel +\item Nicholas Park, \ac{DoD} +\item Wendy Poole, \ac{LANL} +\item Steve Poole\footnotemark[6], \ac{OSSS} +\item Swaroop Pophale, \ac{ORNL} +\item Sreeram Potluri, NVIDIA +\item Brandon Potter\footnotemark[4], \ac{AMD} +\item Howard Pritchard, \ac{LANL} +\item Md. 
Wasi-ur- Rahman\footnotemark[11], Intel +\item Naveen Ravichandrasekaran\footnotemark[9], \ac{HPE} +\item Michael Raymond, \ac{HPE} +\item Elliot Ronaghan\footnotemark[8], \ac{HPE} +\item James Ross, \ac{ARL} +\item Pavel Shamis, NVIDIA +\item Sameer Shende, \ac{UO} +\item Danielle Sikich, \ac{HPE} +\item Brian Smith, Cornelis Networks +\item Lawrence Stewart\footnotemark[7], Intel +\item Zach Tiffany, NVIDIA +\item Manjunath Gorentla Venkata\footnotemark[10], NVIDIA +\item Kevin Waters\footnotemark[3], \ac{DoD} +\item Aaron Welch, \ac{ORNL} +\item Nathan Wichmann, \ac{HPE} +\item Jeffrey Young, Georgia Tech +\end{itemize} +\end{multicols} \subsection*{\openshmem 1.5} \begin{multicols}{2} diff --git a/content/execution_model.tex b/content/execution_model.tex index a56f8bda..e001680b 100644 --- a/content/execution_model.tex +++ b/content/execution_model.tex @@ -10,7 +10,7 @@ communicate and synchronize among executing \acp{PE}. The \openshmem phase in a program begins with the first call to the initialization routine \FUNC{shmem\_init} or \FUNC{shmem\_init\_thread}, which must be performed before using any of the -other \openshmem library routines. +other \openshmem library routines. An \openshmem program concludes its use of the \openshmem library when all \acp{PE} make their final call to \FUNC{shmem\_finalize} or any \ac{PE} calls \FUNC{shmem\_global\_exit}. 
diff --git a/content/frontmatter.tex b/content/frontmatter.tex index cf5a6ca3..43b5cf35 100644 --- a/content/frontmatter.tex +++ b/content/frontmatter.tex @@ -8,7 +8,7 @@ \SetWatermarkText{DRAFT} \SetWatermarkScale{1} \SetWatermarkLightness{.91} -\fancyfoot[C]{\thepage} %affects page numbering for the first pages, +\fancyfoot[C]{\thepage} %affects page numbering for the first pages, %except the first ToC page \pagenumbering{roman} %sets coverpage and toc page numbers to roman numerals @@ -19,10 +19,10 @@ \setcounter{secnumdepth}{4} \tableofcontents -\mainmatter % included for use of documenttype 'book' +\mainmatter % included for use of documenttype 'book' % Set header/footer for main content -\pagestyle{fancy} %replacing {headings} with {fancy} for customization +\pagestyle{fancy} %replacing {headings} with {fancy} for customization \fancyhf{} \fancyhead[L]{\leftmark} \fancyhead[R]{\thepage} diff --git a/content/interoperability.tex b/content/interoperability.tex index bb9ed5a1..7347ee87 100644 --- a/content/interoperability.tex +++ b/content/interoperability.tex @@ -119,7 +119,7 @@ \subsection{Mapping Process Identification Numbers} This feature, however, may be provided by only some of the \openshmem and \ac{MPI} implementations (e.g., if both environments share the same underlying process manager) and is not portably guaranteed. A portable program should always -use the standard functions in each model, namely, \FUNC{shmem\_team\_my\_pe} in \openshmem +use the standard functions in each model, namely, \FUNC{shmem\_team\_my\_pe} or \FUNC{shmem\_my\_pe} in \openshmem and \FUNC{MPI\_Comm\_rank} in \ac{MPI}, to query the process identification numbers in each communication environment and manage the mapping of identifiers in the program when necessary. 
diff --git a/content/library_constants.tex b/content/library_constants.tex index 0a0194de..b0edb9cf 100644 --- a/content/library_constants.tex +++ b/content/library_constants.tex @@ -84,6 +84,19 @@ See Section~\ref{subsec:shmem_ctx_create} for more detail about its use. \tabularnewline \hline %% +\LibConstDecl{SHMEM\_CTX\_SESSION\_TOTAL\_OPS} & +The bitwise flag which specifies that a session start routine should use the +\VAR{total\_ops} member of the provided \CTYPE{shmem\_ctx\_session\_config\_t} +configuration parameter as a hint. See \ref{subsec:shmem_ctx_session_config_t} +for more detail about its use. +\tabularnewline \hline +%% +\LibConstDecl{SHMEM\_CTX\_SESSION\_BATCH} & +The session start option which specifies that operations in the given session +are latency tolerant and may be candidates for batching. See +\ref{subsec:shmem_ctx_session_start} for more detail about its use. +\tabularnewline \hline +%% \LibConstDecl{SHMEM\_SIGNAL\_SET} & An integer constant expression corresponding to the signal update set operation. See Section~\ref{subsec:shmem_put_signal} and diff --git a/content/memmgmt_intro.tex b/content/memmgmt_intro.tex index 8cb6605c..660e5bc7 100644 --- a/content/memmgmt_intro.tex +++ b/content/memmgmt_intro.tex @@ -17,7 +17,7 @@ The total size of the symmetric heap is determined at job startup. One can specify the size of the heap using the \ENVVAR{SHMEM\_SYMMETRIC\_SIZE} environment -variable (where available). +variable (where available). \begin{DeprecateBlock} As of \openshmem[1.2] the use of \FUNC{shmalloc}, \FUNC{shmemalign}, diff --git a/content/memory_model.tex b/content/memory_model.tex index 20f46b37..58cf9777 100644 --- a/content/memory_model.tex +++ b/content/memory_model.tex @@ -71,7 +71,7 @@ \subsection{Pointers to Symmetric Objects}\label{subsec:pointers_to_symmetric_ob The ``mem'' interfaces (e.g., \FUNC{shmem\_putmem}) have no alignment requirements. 
-The \FUNC{shmem\_ptr} routine allows the programmer to query a {\em local +The \FUNC{shmem\_ptr} and \FUNC{shmem\_team\_ptr} routines allow the application to query a {\em local address} to a remotely accessible data object at a specified \ac{PE}. The resulting pointer is valid for direct memory access; however, providing this address as an argument of an \openshmem routine that requires a symmetric diff --git a/content/profiling_interface.tex b/content/profiling_interface.tex index e50ab8d7..9fec4875 100644 --- a/content/profiling_interface.tex +++ b/content/profiling_interface.tex @@ -1,74 +1,74 @@ -The objective of the \openshmem profiling interface is to ensure an -easy and flexible usage model for profiling (and other similar) -tool developers to interface their codes into \openshmem -implementations on different platforms. Since \openshmem is a -machine-independent standard with different implementations, it is -unreasonable to expect that the authors and developers of profiling -tools for \openshmem will have access to the source code that -implements \openshmem on any particular machine. It is, therefore, -necessary to provide a mechanism by which the implementors of such -tools can collect whatever performance information they wish +The objective of the \openshmem profiling interface is to ensure an +easy and flexible usage model for profiling (and other similar) +tool developers to interface their code into \openshmem +implementations on different platforms. Since \openshmem is a +machine-independent standard with different implementations, it is +unreasonable to expect that the authors and developers of profiling +tools for \openshmem will have access to the source code that +implements \openshmem on any particular machine. It is, therefore, +necessary to provide a mechanism by which the implementers of such +tools can collect whatever performance information they wish \emph{without} access to the underlying implementation. 
-The \openshmem profiling interface places the following requirements -on implementations. +The \openshmem profiling interface places the following requirements +on implementations. \begin{enumerate} -\item An \openshmem implementation must provide a mechanism through -which all of the \openshmem defined functions may be accessible -with a name shift. This requires an alternate -entry point name, with the prefix \FUNC{pshmem\_} for each -\openshmem function. For \openshmem inlined functions (e.g., macros), -it is also required that the \FUNC{pshmem\_} version is supplied -although it is not possible to replace the \FUNC{shmem\_} version +\item An \openshmem implementation must provide a mechanism through +which all of the \openshmem defined functions may be accessible +with a name shift. This requires an alternate +entry point name, with the prefix \FUNC{pshmem\_} for each +\openshmem function. For \openshmem inlined functions (e.g., macros), +it is also required that the \FUNC{pshmem\_} version is supplied +although it is not possible to replace the \FUNC{shmem\_} version with a user-defined version at link time. -\item It must be ensured that the \openshmem functions that are not -replaced as above, may still be linked into an executable image -without causing name clashes. -\item Documentation of the implementation of different language -bindings of the \openshmem interface must indicate if they -are layered on top of each other. Using this documentation, -developers can determine whether they need to implement the -profile interface for each binding or not. For example, it must -be noted that the \openshmem \Cstd[11] type-generic interfaces for +\item It must be ensured that the \openshmem functions that are not +replaced as above, may still be linked into an executable image +without causing name clashes. 
+\item Documentation of the implementation of different language +bindings of the \openshmem interface must indicate if they +are layered on top of each other. Using this documentation, +developers can determine whether they need to implement the +profile interface for each binding or not. For example, it must +be noted that the \openshmem \Cstd[11] type-generic interfaces for different \ac{RMA} and \ac{AMO} operations cannot have any equivalent -\FUNC{pshmem\_} interfaces because the \Cstd[11] type-generic +\FUNC{pshmem\_} interfaces because the \Cstd[11] type-generic interfaces are implemented as macros. -\item In the case where the implementation of different \ac{API} -feature sets is implemented through a layered approach using -``wrapper'' functions, the wrapper functions must be kept separate -from the rest of the library. This requirement allows the developers -to extract these functions from the original \openshmem library -and add them into the profiling library without bringing along any +\item In the case where the implementation of different \ac{API} +feature sets is implemented through a layered approach using +``wrapper'' functions, the wrapper functions must be kept separate +from the rest of the library. This requirement allows the developers +to extract these functions from the original \openshmem library +and add them into the profiling library without bringing along any other code. -\item A no-op routine, \FUNC{shmem\_pcontrol}, must be provided +\item A no-op routine, \FUNC{shmem\_pcontrol}, must be provided in the \openshmem library. -\item It must be ensured that any \openshmem types or constants that are +\item It must be ensured that any \openshmem types or constants that are needed by the \FUNC{pshmem\_} interfaces are defined in \HEADER{pshmem.h}. 
\end{enumerate} -Provided that an \openshmem implementation meets these requirements, -it is possible for the implementor of the profiling system -to intercept the \openshmem calls that are made by the user -program. The information required can be collected before and after -calling the underlying \openshmem implementation through the name -shifted entry points. +Provided that an \openshmem implementation meets these requirements, +it is possible for the implementer of the profiling system +to intercept the \openshmem calls that are made by the user +program. The information required can be collected before and after +calling the underlying \openshmem implementation through the name +shifted entry points. \subsection{Control of Profiling} \label{sec:pshmem_control_profile} -Any user code must be able to control the profiler dynamically -during runtime. Generally, this capability is used for the +Any user code must be able to control the profiler dynamically +during runtime. Generally, this capability is used for the purposes of \begin{itemize} -\item Enabling and disabling of profiling based on the current +\item Enabling and disabling of profiling based on the current state of the execution and calculation, \item Flushing of the trace buffers at noncritical execution regions, \item Adding user events to a trace file. \end{itemize} -These functionalities can be achieved through the usage of +These functionalities can be achieved through the usage of \FUNC{shmem\_pcontrol}. 
\subsubsection{\textbf{SHMEM\_PCONTROL}}\label{subsec:shmem_pcontrol} @@ -133,7 +133,7 @@ \subsection{Limitations} \subsubsection{Multiple Counting} \label{sec:pshmem_multiple_count} -Since some functions in \openshmem library may be implemented +Since some functions in the \openshmem library may be implemented using more basic \openshmem functions, it is possible for these basic profiling functions to be called from within an \openshmem function that was originally called from a profiling routine. For example, diff --git a/content/programming_model_overview.tex b/content/programming_model_overview.tex index a76c99de..6d53ece1 100644 --- a/content/programming_model_overview.tex +++ b/content/programming_model_overview.tex @@ -43,11 +43,11 @@ \item \textbf{Symmetric Data Object Management} \begin{enumerate} - \item \OPR{Allocation}: All executing \acp{PE} must participate in the + \item \OPR{Allocation}: All executing \acp{PE} must collectively participate in the allocation of a symmetric data object with identical arguments. - \item \OPR{Deallocation}: All executing \acp{PE} must participate in the + \item \OPR{Deallocation}: All executing \acp{PE} must collectively participate in the deallocation of the same symmetric data object with identical arguments. - \item \OPR{Reallocation}: All executing \acp{PE} must participate in the + \item \OPR{Reallocation}: All executing \acp{PE} must collectively participate in the reallocation of the same symmetric data object with identical arguments. \end{enumerate} @@ -81,9 +81,12 @@ \item \textbf{\acfp{AMO}} \begin{enumerate} - \item \OPR{Swap}: The \ac{PE} initiating the swap gets the old value of a - symmetric data object from a remote \ac{PE} and copies a new value to - that symmetric data object on the remote \ac{PE}. + \item \OPR{Fetch}: The \ac{PE} initiating the fetch returns the value of the + symmetric data object on the remote \ac{PE}. 
+ \item \OPR{Set}: The \ac{PE} initiating the set copies a new value to the + symmetric data object on the remote \ac{PE}. + \item \OPR{Swap}: The \ac{PE} initiating the swap copies a new value to the + symmetric data object on the remote \ac{PE} and returns the old value. \item \OPR{Increment}: The \ac{PE} initiating the increment adds 1 to the symmetric data object on the remote \ac{PE}. \item \OPR{Add}: The \ac{PE} initiating the add specifies the value to be added @@ -91,14 +94,14 @@ \item \OPR{Bitwise Operations}: The \ac{PE} initiating the bitwise operation specifies the operand value to the bitwise operation to be performed on the symmetric data object on the remote \ac{PE}. - \item \OPR{Compare and Swap}: The \ac{PE} initiating the swap gets the old value - of the symmetric data object based on a value to be compared and copies a - new value to the symmetric data object on the remote \ac{PE}. - \item \OPR{Fetch and Increment}: The \ac{PE} initiating the increment adds 1 to - the symmetric data object on the remote \ac{PE} and returns with the old + \item \OPR{Compare and Swap}: The \ac{PE} initiating the compare and swap + conditionally copies a new value to the symmetric data object on the + remote \ac{PE} and returns the old value. + \item \OPR{Fetch and Increment}: The \ac{PE} initiating the increment adds 1 + to the symmetric data object on the remote \ac{PE} and returns the old value. \item \OPR{Fetch and Add}: The \ac{PE} initiating the add specifies the value to - be added to the symmetric data object on the remote \ac{PE} and returns with + be added to the symmetric data object on the remote \ac{PE} and returns the old value. 
\item \OPR{Fetch and Bitwise Operations}: The \ac{PE} initiating the bitwise operation specifies the operand value to the bitwise operation to be @@ -108,9 +111,23 @@ \item \textbf{Signaling Operations} \begin{enumerate} - \item \OPR{Signaling Put}: The \source{} data is copied to the symmetric - object on the remote \ac{PE} and a flag on the remote \ac{PE} is subsequently - updated to signal completion. + \item \OPR{Put Signal}: The local \ac{PE} specifies the \source{} data object + to be copied to the symmetric data object on the remote \ac{PE} and + another symmetric data object on the remote \ac{PE} is subsequently + updated to signal completion. + \item \OPR{Signal Add}: The local \ac{PE} specifies a value to be added to + the symmetric data object on the remote \ac{PE}. + \item \OPR{Signal Set}: The local \ac{PE} specifies a value to be copied to + the symmetric data object on the remote \ac{PE}. + \item \OPR{Signal Fetch}: The local \ac{PE} returns the value of a local data + object. +\end{enumerate} + +\item \textbf{Session Management} +\begin{enumerate} + \item \OPR{Sessions}: Sessions are a mechanism for the application to inform + the implementation about an upcoming sequence of operations that exhibit + a pattern that may be suitable for runtime optimization. \end{enumerate} \item \textbf{Synchronization and Ordering} @@ -135,7 +152,7 @@ \begin{enumerate} \item \OPR{Broadcast}: The \VAR{root} \ac{PE} specifies a symmetric data object to be copied to a symmetric data object on one or more remote - \acp{PE} (not including itself). + \acp{PE}. \item \OPR{Collection}: All \acp{PE} participating in the routine get the result of concatenated symmetric objects contributed by each of the \acp{PE} in another symmetric data object. @@ -143,8 +160,11 @@ of an associative binary routine over elements of the specified symmetric data object on another symmetric data object. 
\item \OPR{All-to-All}: All \acp{PE} participating in the routine exchange - a fixed amount of contiguous or strided data with all other \acp{PE} - in the active set. + a fixed amount of contiguous or strided data with all other participating + \acp{PE}. + \item \OPR{Scan}: All \acp{PE} participating in the routine perform an + inclusive or exclusive prefix sum over elements of the specified + symmetric data object. \end{enumerate} \item \textbf{Mutual Exclusion} diff --git a/content/rma_intro.tex b/content/rma_intro.tex index f986d6c8..b7bd5072 100644 --- a/content/rma_intro.tex +++ b/content/rma_intro.tex @@ -18,7 +18,7 @@ The destination \ac{PE} is specified as an integer representing the \ac{PE} number. This \ac{PE} number is relative to the team associated with the -communication context being using for the operation. If no context argument is passed to the routine, +communication context being used for the operation. If no context argument is passed to the routine, then the routine operates on the default context, which implies that the \ac{PE} number is relative to the world team. If the \ac{PE} number passed to the routine is invalid, being negative @@ -32,10 +32,10 @@ is not explicitly passed and thus, the operations are performed on the default context. -Where appropriate compiler support is available, \openshmem provides type-generic +Where appropriate compiler support is available, \openshmem provides type-generic one-sided communication interfaces via \Cstd[11] generic selection (\Cstd[11]~\S6.5.1.1\footnote{Formally, the \Cstd[11] specification is ISO/IEC 9899:2011(E).}) -for block, scalar, and block-strided put and get communication. +for block, scalar, and block-strided put and get communication. Such type-generic routines are supported for the ``standard \ac{RMA} types'' listed in Table \ref{stdrmatypes}. @@ -44,7 +44,7 @@ \footnote{Formally, the \Cstd[99] specification is ISO/IEC~9899:1999(E).}% ~\S7.18.1.1 and \Cstd[11]~\S7.20.1.1. 
When the \Cstd translation environment does not provide exact-width integer types with \HEADER{stdint.h}, an -\openshmem implemementation is not required to provide support for these types. +\openshmem implementation is not required to provide support for these types. \begin{table}[h] \begin{center} @@ -78,5 +78,5 @@ \end{tabular} \TableCaptionRef{Standard \ac{RMA} Types and Names} \label{stdrmatypes} - \end{center} + \end{center} \end{table} diff --git a/content/shmem_addr_accessible.tex b/content/shmem_addr_accessible.tex index 14b83766..d32e54be 100644 --- a/content/shmem_addr_accessible.tex +++ b/content/shmem_addr_accessible.tex @@ -18,7 +18,7 @@ \FUNC{shmem\_addr\_accessible} is a query routine that indicates whether the address \VAR{addr} can be used to access the given data object on the specified \ac{PE} via \openshmem routines. - + This routine verifies that the data object is symmetric and accessible with respect to a remote \ac{PE} via \openshmem data transfer routines. The specified address \VAR{addr} is the local address of the data object on the diff --git a/content/shmem_alltoall.tex b/content/shmem_alltoall.tex index 188e2875..f271de11 100644 --- a/content/shmem_alltoall.tex +++ b/content/shmem_alltoall.tex @@ -35,17 +35,17 @@ \apiargument{OUT}{dest}{Symmetric address of a data object large enough to receive the combined total of \VAR{nelems} elements from each \ac{PE} in the - active set. + participating \acp{PE}. The type of \dest{} should match that implied in the SYNOPSIS section.} \apiargument{IN}{source}{Symmetric address of a data object that contains \VAR{nelems} - elements of data for each \ac{PE} in the active set, ordered according to + elements of data for each \ac{PE} in the participating \acp{PE}, ordered according to destination \ac{PE}. The type of \source{} should match that implied in the SYNOPSIS section.} \apiargument{IN}{nelems}{ - The number of elements to exchange for each \ac{PE}. 
- For \FUNC{shmem\_alltoallmem}, elements are bytes; - for \FUNC{shmem\_alltoall\{32,64\}}, elements are 4 or 8 bytes, - respectively. + The number of elements to exchange for each \ac{PE}. + For \FUNC{shmem\_alltoallmem}, elements are bytes; + for \FUNC{shmem\_alltoall\{32,64\}}, elements are 4 or 8 bytes, + respectively. } \begin{DeprecateBlock} @@ -89,9 +89,7 @@ Given a \ac{PE} \VAR{i} that is the \kth \ac{PE} participating in the operation and a \ac{PE} \VAR{j} that is the \lth \ac{PE} - participating in the operation, - - \ac{PE} \VAR{i} sends the \lth block of its \VAR{source} object to + participating in the operation, \ac{PE} \VAR{i} sends the \lth block of its \VAR{source} object to the \kth block of the \VAR{dest} object of \ac{PE} \VAR{j}. @@ -100,6 +98,25 @@ If \VAR{team} compares equal to \LibConstRef{SHMEM\_TEAM\_INVALID} or is otherwise invalid, the behavior is undefined. + Before any \ac{PE} calls a \FUNC{shmem\_alltoall} routine, the following + conditions must be ensured, otherwise the behavior is undefined: + \begin{itemize} + \item The \dest{} array on all \acp{PE} in the team is ready to + accept the result of the operation. + \item The \source{} array at the local \ac{PE} is ready to be + read by any \ac{PE} in the team. + \end{itemize} + The application does not need to synchronize to ensure that the \source{} + array is ready across all \acp{PE} prior to calling this routine. + + Upon return from a \FUNC{shmem\_alltoall} routine, the following is true for + the local PE: + \begin{itemize} + \item Its \VAR{dest} symmetric data object is completely updated and the + data has been copied out of the source data object. + \end{itemize} + +\begin{DeprecateBlock} Active-set-based collective routines operate over all \acp{PE} in the active set defined by the \VAR{PE\_start}, \VAR{logPE\_stride}, \VAR{PE\_size} triplet. 
@@ -116,23 +133,26 @@ Before any \ac{PE} calls a \FUNC{shmem\_alltoall} routine, the following conditions must be ensured: + \begin{itemize} - \item The \VAR{dest} data object on all \acp{PE} in the active set is - ready to accept the \FUNC{shmem\_alltoall} data. - \item For active-set-based routines, the \VAR{pSync} array - on all \acp{PE} in the active set is not still in use from a prior call - to a \FUNC{shmem\_alltoall} routine. + \item The \VAR{dest} data object on all \acp{PE} in the active set is + ready to accept the \FUNC{shmem\_alltoall} data. + \item For active-set-based routines, the \VAR{pSync} array + on all \acp{PE} in the active set is not still in use from a prior call + to a \FUNC{shmem\_alltoall} routine. \end{itemize} + Otherwise, the behavior is undefined. Upon return from a \FUNC{shmem\_alltoall} routine, the following is true for the local PE: \begin{itemize} - \item Its \VAR{dest} symmetric data object is completely updated and - the data has been copied out of the \VAR{source} data object. - \item For active-set-based routines, - the values in the \VAR{pSync} array are restored to the original values. + \item Its \VAR{dest} symmetric data object is completely updated and the + data has been copied out of the source data object. + \item For active-set-based routines, + the values in the \VAR{pSync} array are restored to the original values. \end{itemize} +\end{DeprecateBlock} } \apireturnvalues{ diff --git a/content/shmem_alltoalls.tex b/content/shmem_alltoalls.tex index e371b8cf..0ac9d8d3 100644 --- a/content/shmem_alltoalls.tex +++ b/content/shmem_alltoalls.tex @@ -33,12 +33,12 @@ \apiargument{IN}{team}{A valid \openshmem team handle.}% -\apiargument{OUT}{dest}{Symmetric address of a data object large enough to receive +\apiargument{OUT}{dest}{Symmetric address of a data object large enough to receive the combined total of \VAR{nelems} elements from each \ac{PE} in the - active set. + participating \acp{PE}. 
The type of \dest{} should match that implied in the SYNOPSIS section.} \apiargument{IN}{source}{Symmetric address of a data object that contains \VAR{nelems} - elements of data for each \ac{PE} in the active set, ordered according to + elements of data for each \ac{PE} in the participating \acp{PE}, ordered according to destination \ac{PE}. The type of \source{} should match that implied in the SYNOPSIS section.} \apiargument{IN}{dst}{The stride between consecutive elements of the \dest{} @@ -53,7 +53,7 @@ for \FUNC{shmem\_alltoalls\{32,64\}}, elements are 4 or 8 bytes, respectively. } - + \begin{DeprecateBlock} \apiargument{IN}{PE\_start}{The lowest \ac{PE} number of the active set of \acp{PE}.} @@ -82,7 +82,7 @@ The same \dest{} and \source{} arrays and same values for values of arguments \VAR{dst}, \VAR{sst}, \VAR{nelems} must be passed by all \acp{PE} that participate in the collective. - + Given a \ac{PE} \VAR{i} that is the \kth \ac{PE} participating in the operation and a \ac{PE} \VAR{j} that is the \lth \ac{PE} @@ -99,8 +99,7 @@ \item The pre- and post-conditions for symmetric objects. \item Typing constraints for \dest{} and \source{} data objects. \end{itemize} - -} +} \apireturnvalues{ diff --git a/content/shmem_atomic_add.tex b/content/shmem_atomic_add.tex index 12737496..8a3b4f3d 100644 --- a/content/shmem_atomic_add.tex +++ b/content/shmem_atomic_add.tex @@ -39,8 +39,8 @@ The type of \dest{} should match that implied in the SYNOPSIS section.} \apiargument{IN}{value}{The operand to the atomic add operation. 
The type of \VAR{value} should match that implied in the SYNOPSIS section.} - \apiargument{IN}{pe}{An integer that indicates the \ac{PE} number upon which - \dest{} is to be updated.} + \apiargument{IN}{pe}{\ac{PE} number of the remote \ac{PE} relative to the team associated + with the given \VAR{ctx} when provided, or the default context otherwise.} \end{apiarguments} \apidescription{ diff --git a/content/shmem_atomic_and.tex b/content/shmem_atomic_and.tex index f82ee4bf..5803812b 100644 --- a/content/shmem_atomic_and.tex +++ b/content/shmem_atomic_and.tex @@ -28,9 +28,8 @@ The type of \dest{} should match that implied in the SYNOPSIS section.} \apiargument{IN}{value}{The operand to the bitwise AND operation. The type of \VAR{value} should match that implied in the SYNOPSIS section.} - \apiargument{IN}{pe}{An integer value for the \ac{PE} on which \VAR{dest} - is to be updated.} - + \apiargument{IN}{pe}{\ac{PE} number of the remote \ac{PE} relative to the team associated + with the given \VAR{ctx} when provided, or the default context otherwise.} \end{apiarguments} \apidescription{ diff --git a/content/shmem_atomic_compare_swap.tex b/content/shmem_atomic_compare_swap.tex index 85b371fa..3763909d 100644 --- a/content/shmem_atomic_compare_swap.tex +++ b/content/shmem_atomic_compare_swap.tex @@ -45,8 +45,8 @@ type as \VAR{dest}.} \apiargument{IN}{value}{The value to be atomically written to the remote \ac{PE}. 
The type of \VAR{value} should match that implied in the SYNOPSIS section.} - \apiargument{IN}{pe}{An integer that indicates the \ac{PE} number upon which - \VAR{dest} is to be updated.} + \apiargument{IN}{pe}{\ac{PE} number of the remote \ac{PE} relative to the team associated + with the given \VAR{ctx} when provided, or the default context otherwise.} \end{apiarguments} \apidescription{ diff --git a/content/shmem_atomic_compare_swap_nbi.tex b/content/shmem_atomic_compare_swap_nbi.tex index 84a90a12..a3dabaa5 100644 --- a/content/shmem_atomic_compare_swap_nbi.tex +++ b/content/shmem_atomic_compare_swap_nbi.tex @@ -35,8 +35,8 @@ \apiargument{IN}{value}{The value to be atomically written to the remote \ac{PE}. The type of \VAR{value} should match that implied in the SYNOPSIS section.} - \apiargument{IN}{pe}{An integer that indicates the \ac{PE} number upon which - \VAR{dest} is to be updated.} + \apiargument{IN}{pe}{\ac{PE} number of the remote \ac{PE} relative to the team associated + with the given \VAR{ctx} when provided, or the default context otherwise.} \end{apiarguments} \apidescription{ diff --git a/content/shmem_atomic_fetch.tex b/content/shmem_atomic_fetch.tex index 3c11f2a1..3e7fab8a 100644 --- a/content/shmem_atomic_fetch.tex +++ b/content/shmem_atomic_fetch.tex @@ -40,9 +40,9 @@ the default context.} \apiargument{IN}{source}{Symmetric address of the source data object. 
The type of \source{} should match that implied in the SYNOPSIS section.} - \apiargument{IN}{pe}{An integer that indicates the \ac{PE} number from which - \VAR{source} is to be fetched.} - + \apiargument{IN}{pe}{\ac{PE} number of the remote \ac{PE} on which \VAR{source} resides + relative to the team associated with the given \VAR{ctx} when provided, or the + default context otherwise.} \end{apiarguments} \apidescription{ diff --git a/content/shmem_atomic_fetch_add.tex b/content/shmem_atomic_fetch_add.tex index a07977ec..a0e73984 100644 --- a/content/shmem_atomic_fetch_add.tex +++ b/content/shmem_atomic_fetch_add.tex @@ -41,9 +41,8 @@ SYNOPSIS section.} \apiargument{IN}{value}{The operand to the atomic fetch-and-add operation. The type of \VAR{value} should match that implied in the SYNOPSIS section.} -\apiargument{IN}{pe}{An integer that indicates the \ac{PE} number on which - \VAR{dest} is to be updated.} - +\apiargument{IN}{pe}{\ac{PE} number of the remote \ac{PE} relative to the team associated + with the given \VAR{ctx} when provided, or the default context otherwise.} \end{apiarguments} \apidescription{ diff --git a/content/shmem_atomic_fetch_add_nbi.tex b/content/shmem_atomic_fetch_add_nbi.tex index 3b9e4021..7f1007a2 100644 --- a/content/shmem_atomic_fetch_add_nbi.tex +++ b/content/shmem_atomic_fetch_add_nbi.tex @@ -30,9 +30,8 @@ The type of \dest{} should match that implied in the SYNOPSIS section.} \apiargument{IN}{value}{The operand to the atomic fetch-and-add operation. 
The type of \VAR{value} should match that implied in the SYNOPSIS section.} - \apiargument{IN}{pe}{An integer that indicates the \ac{PE} number on which - \VAR{dest} is to be updated.} - + \apiargument{IN}{pe}{\ac{PE} number of the remote \ac{PE} relative to the team associated + with the given \VAR{ctx} when provided, or the default context otherwise.} \end{apiarguments} \apidescription{ diff --git a/content/shmem_atomic_fetch_and.tex b/content/shmem_atomic_fetch_and.tex index 0a4d7843..675449f2 100644 --- a/content/shmem_atomic_fetch_and.tex +++ b/content/shmem_atomic_fetch_and.tex @@ -27,9 +27,8 @@ The type of \dest{} should match that implied in the SYNOPSIS section.} \apiargument{IN}{value}{The operand to the bitwise AND operation. The type of \VAR{value} should match that implied in the SYNOPSIS section.} - \apiargument{IN}{pe}{An integer value for the \ac{PE} on which \VAR{dest} - is to be updated.} - + \apiargument{IN}{pe}{\ac{PE} number of the remote \ac{PE} relative to the team associated + with the given \VAR{ctx} when provided, or the default context otherwise.} \end{apiarguments} \apidescription{ diff --git a/content/shmem_atomic_fetch_and_nbi.tex b/content/shmem_atomic_fetch_and_nbi.tex index 9959a1cc..97b8b0c5 100644 --- a/content/shmem_atomic_fetch_and_nbi.tex +++ b/content/shmem_atomic_fetch_and_nbi.tex @@ -30,9 +30,8 @@ The type of \dest{} should match that implied in the SYNOPSIS section.} \apiargument{IN}{value}{The operand to the bitwise AND operation. 
The type of \VAR{value} should match that implied in the SYNOPSIS section.} - \apiargument{IN}{pe}{An integer value for the \ac{PE} on which \VAR{dest} - is to be updated.} - + \apiargument{IN}{pe}{\ac{PE} number of the remote \ac{PE} relative to the team associated + with the given \VAR{ctx} when provided, or the default context otherwise.} \end{apiarguments} \apidescription{ diff --git a/content/shmem_atomic_fetch_inc.tex b/content/shmem_atomic_fetch_inc.tex index 44710246..96437ac7 100644 --- a/content/shmem_atomic_fetch_inc.tex +++ b/content/shmem_atomic_fetch_inc.tex @@ -38,9 +38,8 @@ the default context.} \apiargument{OUT}{dest}{Symmetric address of the destination data object. The type of \dest{} should match that implied in the SYNOPSIS section.} -\apiargument{IN}{pe}{An integer that indicates the \ac{PE} number on which - \dest{} is to be updated.} - +\apiargument{IN}{pe}{\ac{PE} number of the remote \ac{PE} relative to the team associated + with the given \VAR{ctx} when provided, or the default context otherwise.} \end{apiarguments} diff --git a/content/shmem_atomic_fetch_inc_nbi.tex b/content/shmem_atomic_fetch_inc_nbi.tex index 6cfbfb3a..a7c17c3b 100644 --- a/content/shmem_atomic_fetch_inc_nbi.tex +++ b/content/shmem_atomic_fetch_inc_nbi.tex @@ -28,9 +28,8 @@ The type of \VAR{fetch} should match that implied in the SYNOPSIS section.} \apiargument{OUT}{dest}{Symmetric address of the destination data object. 
The type of \dest{} should match that implied in the SYNOPSIS section.} -\apiargument{IN}{pe}{An integer that indicates the \ac{PE} number on which - \dest{} is to be updated.} - +\apiargument{IN}{pe}{\ac{PE} number of the remote \ac{PE} relative to the team associated + with the given \VAR{ctx} when provided, or the default context otherwise.} \end{apiarguments} diff --git a/content/shmem_atomic_fetch_nbi.tex b/content/shmem_atomic_fetch_nbi.tex index 4891dc97..43fdd53b 100644 --- a/content/shmem_atomic_fetch_nbi.tex +++ b/content/shmem_atomic_fetch_nbi.tex @@ -28,9 +28,8 @@ The type of \VAR{fetch} should match that implied in the SYNOPSIS section.} \apiargument{OUT}{source}{Symmetric address of the source data object. The type of \source{} should match that implied in the SYNOPSIS section.} - \apiargument{IN}{pe}{An integer that indicates the \ac{PE} number from which - \VAR{source} is to be fetched.} - + \apiargument{IN}{pe}{\ac{PE} number of the remote \ac{PE} relative to the team associated + with the given \VAR{ctx} when provided, or the default context otherwise.} \end{apiarguments} \apidescription{ diff --git a/content/shmem_atomic_fetch_or.tex b/content/shmem_atomic_fetch_or.tex index 81046d8a..0eb922bd 100644 --- a/content/shmem_atomic_fetch_or.tex +++ b/content/shmem_atomic_fetch_or.tex @@ -27,9 +27,8 @@ The type of \dest{} should match that implied in the SYNOPSIS section.} \apiargument{IN}{value}{The operand to the bitwise OR operation. 
The type of \VAR{value} should match that implied in the SYNOPSIS section.} - \apiargument{IN}{pe}{An integer value for the \ac{PE} on which \VAR{dest} - is to be updated.} - + \apiargument{IN}{pe}{\ac{PE} number of the remote \ac{PE} relative to the team associated + with the given \VAR{ctx} when provided, or the default context otherwise.} \end{apiarguments} \apidescription{ diff --git a/content/shmem_atomic_fetch_or_nbi.tex b/content/shmem_atomic_fetch_or_nbi.tex index 7a6c8668..d62fcd3a 100644 --- a/content/shmem_atomic_fetch_or_nbi.tex +++ b/content/shmem_atomic_fetch_or_nbi.tex @@ -30,9 +30,8 @@ The type of \dest{} should match that implied in the SYNOPSIS section.} \apiargument{IN}{value}{The operand to the bitwise OR operation. The type of \VAR{value} should match that implied in the SYNOPSIS section.} - \apiargument{IN}{pe}{An integer value for the \ac{PE} on which \VAR{dest} - is to be updated.} - + \apiargument{IN}{pe}{\ac{PE} number of the remote \ac{PE} relative to the team associated + with the given \VAR{ctx} when provided, or the default context otherwise.} \end{apiarguments} \apidescription{ diff --git a/content/shmem_atomic_fetch_xor.tex b/content/shmem_atomic_fetch_xor.tex index a390500e..fd563cb1 100644 --- a/content/shmem_atomic_fetch_xor.tex +++ b/content/shmem_atomic_fetch_xor.tex @@ -28,9 +28,8 @@ The type of \dest{} should match that implied in the SYNOPSIS section.} \apiargument{IN}{value}{The operand to the bitwise XOR operation. 
The type of \VAR{value} should match that implied in the SYNOPSIS section.} - \apiargument{IN}{pe}{An integer value for the \ac{PE} on which \VAR{dest} - is to be updated.} - + \apiargument{IN}{pe}{\ac{PE} number of the remote \ac{PE} relative to the team associated + with the given \VAR{ctx} when provided, or the default context otherwise.} \end{apiarguments} \apidescription{ diff --git a/content/shmem_atomic_fetch_xor_nbi.tex b/content/shmem_atomic_fetch_xor_nbi.tex index 2b2cd085..f69739c3 100644 --- a/content/shmem_atomic_fetch_xor_nbi.tex +++ b/content/shmem_atomic_fetch_xor_nbi.tex @@ -30,9 +30,8 @@ The type of \dest{} should match that implied in the SYNOPSIS section.} \apiargument{IN}{value}{The operand to the bitwise XOR operation. The type of \VAR{value} should match that implied in the SYNOPSIS section.} - \apiargument{IN}{pe}{An integer value for the \ac{PE} on which \VAR{dest} - is to be updated.} - + \apiargument{IN}{pe}{\ac{PE} number of the remote \ac{PE} relative to the team associated + with the given \VAR{ctx} when provided, or the default context otherwise.} \end{apiarguments} \apidescription{ diff --git a/content/shmem_atomic_inc.tex b/content/shmem_atomic_inc.tex index fde7e9dc..c7cc359f 100644 --- a/content/shmem_atomic_inc.tex +++ b/content/shmem_atomic_inc.tex @@ -38,9 +38,8 @@ the default context.} \apiargument{OUT}{dest}{Symmetric address of the destination data object. 
The type of \dest{} should match that implied in the SYNOPSIS section.} -\apiargument{IN}{pe}{An integer that indicates the \ac{PE} number on which - \dest{} is to be updated.} - +\apiargument{IN}{pe}{\ac{PE} number of the remote \ac{PE} relative to the team associated + with the given \VAR{ctx} when provided, or the default context otherwise.} \end{apiarguments} \apidescription{ diff --git a/content/shmem_atomic_or.tex b/content/shmem_atomic_or.tex index 033de757..5f00c530 100644 --- a/content/shmem_atomic_or.tex +++ b/content/shmem_atomic_or.tex @@ -28,9 +28,8 @@ The type of \dest{} should match that implied in the SYNOPSIS section.} \apiargument{IN}{value}{The operand to the bitwise OR operation. The type of \VAR{value} should match that implied in the SYNOPSIS section.} - \apiargument{IN}{pe}{An integer value for the \ac{PE} on which \VAR{dest} - is to be updated.} - + \apiargument{IN}{pe}{\ac{PE} number of the remote \ac{PE} relative to the team associated + with the given \VAR{ctx} when provided, or the default context otherwise.} \end{apiarguments} \apidescription{ diff --git a/content/shmem_atomic_set.tex b/content/shmem_atomic_set.tex index fcea7dbc..070e760f 100644 --- a/content/shmem_atomic_set.tex +++ b/content/shmem_atomic_set.tex @@ -42,9 +42,8 @@ The type of \dest{} should match that implied in the SYNOPSIS section.} \apiargument{IN}{value}{The operand to the atomic set operation. 
The type of \VAR{value} should match that implied in the SYNOPSIS section.} -\apiargument{IN}{pe}{An integer that indicates the \ac{PE} number on which - \VAR{dest} is to be updated.} - +\apiargument{IN}{pe}{\ac{PE} number of the remote \ac{PE} relative to the team associated + with the given \VAR{ctx} when provided, or the default context otherwise.} \end{apiarguments} \apidescription{ diff --git a/content/shmem_atomic_swap.tex b/content/shmem_atomic_swap.tex index adab7bdb..3a0a6577 100644 --- a/content/shmem_atomic_swap.tex +++ b/content/shmem_atomic_swap.tex @@ -39,8 +39,8 @@ The type of \dest{} should match that implied in the SYNOPSIS section.} \apiargument{IN}{value}{The value to be atomically written to the remote \ac{PE}. The type of \VAR{value} should match that implied in the SYNOPSIS section.} - \apiargument{IN}{pe}{ An integer that indicates the \ac{PE} number on which - \dest{} is to be updated.} + \apiargument{IN}{pe}{\ac{PE} number of the remote \ac{PE} relative to the team associated + with the given \VAR{ctx} when provided, or the default context otherwise.} \end{apiarguments} \apidescription{ diff --git a/content/shmem_atomic_swap_nbi.tex b/content/shmem_atomic_swap_nbi.tex index bfac883c..fb619f20 100644 --- a/content/shmem_atomic_swap_nbi.tex +++ b/content/shmem_atomic_swap_nbi.tex @@ -30,8 +30,8 @@ \apiargument{IN}{value}{The value to be atomically written to the remote \ac{PE}. 
The type of \VAR{value} should match that implied in the SYNOPSIS section.} - \apiargument{IN}{pe}{An integer that indicates the \ac{PE} number on which - \dest{} is to be updated.} + \apiargument{IN}{pe}{\ac{PE} number of the remote \ac{PE} relative to the team associated + with the given \VAR{ctx} when provided, or the default context otherwise.} \end{apiarguments} \apidescription{ diff --git a/content/shmem_atomic_xor.tex b/content/shmem_atomic_xor.tex index d4f863ad..65a7b6dc 100644 --- a/content/shmem_atomic_xor.tex +++ b/content/shmem_atomic_xor.tex @@ -28,9 +28,8 @@ The type of \dest{} should match that implied in the SYNOPSIS section.} \apiargument{IN}{value}{The operand to the bitwise XOR operation. The type of \VAR{value} should match that implied in the SYNOPSIS section.} - \apiargument{IN}{pe}{An integer value for the \ac{PE} on which \VAR{dest} - is to be updated.} - + \apiargument{IN}{pe}{\ac{PE} number of the remote \ac{PE} relative to the team associated + with the given \VAR{ctx} when provided, or the default context otherwise.} \end{apiarguments} \apidescription{ diff --git a/content/shmem_barrier_all.tex b/content/shmem_barrier_all.tex index 4f2675bb..bb81c6fb 100644 --- a/content/shmem_barrier_all.tex +++ b/content/shmem_barrier_all.tex @@ -16,7 +16,7 @@ \end{apiarguments} -\apidescription{ +\apidescription{ The \FUNC{shmem\_barrier\_all} routine is a mechanism for synchronizing all \acp{PE} in the world team at once. This routine blocks the calling \ac{PE} until all \acp{PE} have called diff --git a/content/shmem_broadcast.tex b/content/shmem_broadcast.tex index a172a12e..ec3d4aa4 100644 --- a/content/shmem_broadcast.tex +++ b/content/shmem_broadcast.tex @@ -45,7 +45,7 @@ respectively. 
} \apiargument{IN}{PE\_root}{Zero-based ordinal of the \ac{PE}, with respect to - the team or active set, from which the data is copied.} + the calling PEs, from which the data is copied.} \begin{DeprecateBlock} @@ -60,9 +60,8 @@ \end{apiarguments} -\apidescription{ - \openshmem broadcast routines are collective routines over an active set or - valid \openshmem team. +\apidescription{ + \openshmem team-based broadcast routines are collective routines over a valid \openshmem team. They copy the \source{} data object on the \ac{PE} specified by \VAR{PE\_root} to the \dest{} data object on the \acp{PE} participating in the collective operation. @@ -75,6 +74,9 @@ \item The \dest{} object is updated on all \acp{PE}. \item All \acp{PE} in the \VAR{team} argument must participate in the operation. + \item Only \acp{PE} in the team may call the routine. If a + \ac{PE} not in the team calls a team-based + collective routine, the behavior is undefined. \item If \VAR{team} compares equal to \LibConstRef{SHMEM\_TEAM\_INVALID} or is otherwise invalid, the behavior is undefined. \item \ac{PE} numbering is relative to the team. The specified @@ -82,59 +84,82 @@ between \CONST{0} and \VAR{N$-$1}, where \VAR{N} is the size of the team. \end{itemize} - + + Before any \ac{PE} calls a broadcast routine, the following conditions + must be ensured, otherwise the behavior is undefined: + \begin{itemize} + \item The \dest{} array on all \acp{PE} in the team is ready to + accept the result of the operation. + \item The \source{} array at the local root \ac{PE} is ready to be + read by any \ac{PE} in the team. + \end{itemize} + The application does not need to synchronize to ensure that the \source{} + array is ready across all \acp{PE} prior to calling this routine. + + Upon return from a team-based broadcast routine, the following are true for the local + \ac{PE}: + \begin{itemize} + \item The \dest{} data object is updated. + \item The \source{} data object may be safely reused. 
+ \end{itemize} + +\begin{DeprecateBlock} + \openshmem active-set broadcast routines are collective routines over an active set. + They copy the \source{} data object on the \ac{PE} specified by + \VAR{PE\_root} to the \dest{} data object on the \acp{PE} + participating in the collective operation. + The same \dest{} and \source{} data objects and the same value of + \VAR{PE\_root} must be passed by all \acp{PE} participating in the + collective operation. + For active-set-based broadcasts: \begin{itemize} - \item The \dest{} object is updated on all \acp{PE} other than the - root \ac{PE}. - \item All \acp{PE} in the active set defined by the - \VAR{PE\_start}, \VAR{logPE\_stride}, \VAR{PE\_size} triplet - must participate in the operation. - \item Only \acp{PE} in the active set may call the routine. If a - \ac{PE} not in the active set calls an active-set-based + \item The \dest{} object is updated on all \acp{PE} other than the root \ac{PE}. + \item All \acp{PE} in the active set defined by the + \VAR{PE\_start}, \VAR{logPE\_stride}, \VAR{PE\_size} triplet + must participate in the operation. + \item Only \acp{PE} in the active set may call the routine. If a + \ac{PE} not in the active set calls an active-set-based collective routine, the behavior is undefined. - \item The values of arguments \VAR{PE\_root}, \VAR{PE\_start}, + \item The values of arguments \VAR{PE\_root}, \VAR{PE\_start}, \VAR{logPE\_stride}, and \VAR{PE\_size} must be the same value on all \acp{PE} in the active set. - \item The value of \VAR{PE\_root} must be between \CONST{0} and + \item The value of \VAR{PE\_root} must be between \CONST{0} and \VAR{PE\_size $-$ 1}. - \item The same \VAR{pSync} work array must be passed by all \acp{PE} + \item The same \VAR{pSync} work array must be passed by all \acp{PE} in the active set.
\end{itemize} - Before any \ac{PE} calls a broadcast routine, the following + Before any \ac{PE} calls an active-set-based broadcast routine, the following conditions must be ensured: \begin{itemize} - \item The \dest{} array on all \acp{PE} participating in the broadcast - is ready to accept the broadcast data. - \item For active-set-based broadcasts, the - \VAR{pSync} array on all \acp{PE} in the - active set is not still in use from a prior call to an \openshmem - collective routine. + \item The \dest{} array on all \acp{PE} participating in the broadcast + is ready to accept the broadcast data. + \item The \VAR{pSync} array on all \acp{PE} in the + active set is not still in use from a prior call to an \openshmem + collective routine. \end{itemize} Otherwise, the behavior is undefined. - Upon return from a broadcast routine, the following are true for the local + Upon return from an active-set-based broadcast routine, the following are true for the local \ac{PE}: \begin{itemize} - \item For team-based broadcasts, the \dest{} data object is - updated. - \item For active-set-based broadcasts: - \begin{itemize} - \item If the current \ac{PE} is not the root \ac{PE}, the - \dest{} data object is updated. + \item If the current \ac{PE} is not the root \ac{PE}, the \dest{} data object is updated. + \item The \source{} data object may be safely reused. \item The values in the \VAR{pSync} array are restored to the original values. - \end{itemize} - \item The \source{} data object may be safely reused. \end{itemize} +\end{DeprecateBlock} } \apireturnvalues{ For team-based broadcasts, zero on successful local completion; otherwise, nonzero. +\begin{DeprecateBlock} For active-set-based broadcasts, none. +\end{DeprecateBlock} + } \apinotes{ diff --git a/content/shmem_calloc.tex b/content/shmem_calloc.tex index fc19de40..f0241bed 100644 --- a/content/shmem_calloc.tex +++ b/content/shmem_calloc.tex @@ -1,5 +1,5 @@ \apisummary{ - Allocate a zeroed block of symmetric memory.
+ Collectively allocate a zeroed block of symmetric memory. } \begin{apidefinition} diff --git a/content/shmem_collect.tex b/content/shmem_collect.tex index 5430abcf..921a7dd9 100644 --- a/content/shmem_collect.tex +++ b/content/shmem_collect.tex @@ -66,21 +66,19 @@ \openshmem \FUNC{collect} and \FUNC{fcollect} routines perform a collective operation to concatenate \VAR{nelems} data items from the \source{} array into the - \dest{} array, over an \openshmem team or active set - in processor number order. The resultant \dest{} array contains the contribution from + \dest{} array, over an \openshmem team in processor number order. + The resultant \dest{} array contains the contribution from \acp{PE} as follows: - + \begin{itemize} - \item For an active set, the data from \ac{PE} \VAR{PE\_start} is first, then the - contribution from \ac{PE} \VAR{PE\_start} + \VAR{PE\_stride} second, and so on. - \item For a team, the data from \ac{PE} number \CONST{0} in the team is first, then the - contribution from \ac{PE} \CONST{1} in the team, and so on. + \item For a team, the data from \ac{PE} number \CONST{0} in the team is first, then the + contribution from \ac{PE} \CONST{1} in the team, and so on. \end{itemize} - + The collected result is written to the \dest{} array for all \acp{PE} that participate in the operation. The same \dest{} and \source{} arrays must be passed by all \acp{PE} that participate in the operation. - + The \FUNC{fcollect} routines require that \VAR{nelems} be the same value in all participating \acp{PE}, while the \FUNC{collect} routines allow \VAR{nelems} to vary from \ac{PE} to \ac{PE}. @@ -90,24 +88,56 @@ If \VAR{team} compares equal to \LibConstRef{SHMEM\_TEAM\_INVALID} or is otherwise invalid, the behavior is undefined. 
+ Before any \ac{PE} calls a collect routine, the following conditions must + be ensured, otherwise the behavior is undefined: + \begin{itemize} + \item The \dest{} array on all \acp{PE} in the team is ready to + accept the result of the operation. + \item The \source{} array at the local \ac{PE} is ready to be read + by any \ac{PE} in the team. + \end{itemize} + The application does not need to synchronize to ensure that the \source{} + array is ready across all \acp{PE} prior to calling this routine. + +\begin{DeprecateBlock} + \openshmem \FUNC{collect} and \FUNC{fcollect} routines perform a collective + operation to concatenate \VAR{nelems} + data items from the \source{} array into the + \dest{} array, over an \openshmem active set + in processor number order. The resultant \dest{} array contains the contribution from + \acp{PE} as follows: + \begin{itemize} + \item For an active set, the data from \ac{PE} \VAR{PE\_start} is first, then the + contribution from \ac{PE} \VAR{PE\_start} + \VAR{PE\_stride} second, and so on. + \end{itemize} + + The collected result is written to the \dest{} array for all \acp{PE} + that participate in the operation. The same \dest{} and \source{} + arrays must be passed by all \acp{PE} that participate in the operation. + + The \FUNC{fcollect} routines require that \VAR{nelems} be the same value in all + participating \acp{PE}, while the \FUNC{collect} routines allow \VAR{nelems} to + vary from \ac{PE} to \ac{PE}. + Active-set-based collective routines operate over all \acp{PE} in the active set defined by the \VAR{PE\_start}, \VAR{logPE\_stride}, \VAR{PE\_size} triplet. As with all active-set-based collective routines, each of these routines assumes that only \acp{PE} in the active set call the routine. If a \ac{PE} not in the active set and calls this collective routine, the behavior is undefined. 
- + The values of arguments \VAR{PE\_start}, \VAR{logPE\_stride}, and \VAR{PE\_size} must be the same value on all \acp{PE} in the active set. The same \VAR{pSync} work array must be passed by all \acp{PE} in the active set. - + Upon return from a collective routine, the following are true for the local \ac{PE}: \begin{itemize} - \item The \dest{} array is updated and the \source{} array may be safely reused. + \item The \dest{} array is updated and the \source{} array may be safely reused. \item For active-set-based collective routines, the values in the \VAR{pSync} array are restored to the original values. \end{itemize} +\end{DeprecateBlock} } \apireturnvalues{ @@ -115,9 +145,15 @@ } \apinotes{ +\begin{DeprecateBlock} The collective routines operate on active \ac{PE} sets that have a non-power-of-two \VAR{PE\_size} with some performance degradation. They operate with no performance degradation when \VAR{nelems} is a non-power-of-two value. +\end{DeprecateBlock} + The collective routines that operate on teams containing a + non-power-of-two number of \acp{PE} do so with some performance degradation. They operate + with no performance degradation when \VAR{nelems} is a non-power-of-two value. + } \begin{apiexamples} diff --git a/content/shmem_finalize.tex b/content/shmem_finalize.tex index 5496e9bf..b9c0e48a 100644 --- a/content/shmem_finalize.tex +++ b/content/shmem_finalize.tex @@ -23,7 +23,7 @@ An \openshmem program may perform a series of matching initialization and finalization calls. The last call to \FUNC{shmem\_finalize} in this series - releases all resources used by the \openshmem library. + releases all resources used by the \openshmem library. This call destroys all teams created by the \openshmem program. As a result, all shareable contexts are destroyed.
The user is @@ -44,7 +44,7 @@ All processes that represent the \acp{PE} will still exist after the call to \FUNC{shmem\_finalize} returns, but they will no longer have access - to resources that have been released. + to \openshmem library resources that have been released. } \apireturnvalues{ diff --git a/content/shmem_g.tex b/content/shmem_g.tex index 1fa6a2c9..8f1b91fc 100644 --- a/content/shmem_g.tex +++ b/content/shmem_g.tex @@ -23,11 +23,14 @@ \apiargument{IN}{source}{Symmetric address of the source data object. The type of \source{} should match that implied in the SYNOPSIS section.} \apiargument{IN}{pe}{The number of the remote \ac{PE} on which \VAR{source} resides.} + \apiargument{IN}{pe}{\ac{PE} number of the remote \ac{PE} on which \VAR{source} resides + relative to the team associated with the given \VAR{ctx} when provided, or the + default context otherwise.} \end{apiarguments} \apidescription{ These routines provide a very low latency get capability for single elements - of most basic types. + of most basic types. } \apireturnvalues{ diff --git a/content/shmem_get.tex b/content/shmem_get.tex index 890f83f2..b34f84cf 100644 --- a/content/shmem_get.tex +++ b/content/shmem_get.tex @@ -37,12 +37,13 @@ The type of \source{} should match that implied in the SYNOPSIS section.} \apiargument{IN}{nelems}{Number of elements in the \dest{} and \source{} arrays. For \FUNC{shmem\_getmem} and \FUNC{shmem\_ctx\_getmem}, elements are bytes.} - \apiargument{IN}{pe}{\ac{PE} number of the remote \ac{PE}.} + \apiargument{IN}{pe}{\ac{PE} number of the remote \ac{PE} relative to the team associated + with the given \VAR{ctx} when provided, or the default context otherwise.} \end{apiarguments} \apidescription{ The get routines provide a method for copying a contiguous symmetric data - object from a different \ac{PE} to a contiguous data object on the local + object from a remote \ac{PE} to a contiguous data object on the local \ac{PE}. 
The routines return after the data has been delivered to the \dest{} array on the local \ac{PE}. } diff --git a/content/shmem_get_nbi.tex b/content/shmem_get_nbi.tex index 57adb768..61bbb698 100644 --- a/content/shmem_get_nbi.tex +++ b/content/shmem_get_nbi.tex @@ -39,12 +39,13 @@ \apiargument{IN}{nelems}{Number of elements in the \dest{} and \source{} arrays. For \FUNC{shmem\_getmem\_nbi} and \FUNC{shmem\_ctx\_getmem\_nbi}, elements are bytes.} - \apiargument{IN}{pe}{\ac{PE} number of the remote \ac{PE}.} + \apiargument{IN}{pe}{\ac{PE} number of the remote \ac{PE} relative to the team associated + with the given \VAR{ctx} when provided, or the default context otherwise.} \end{apiarguments} \apidescription{ The get routines provide a method for copying a contiguous symmetric data - object from a different \ac{PE} to a contiguous data object on the local + object from a remote \ac{PE} to a contiguous data object on the local \ac{PE}. The routines return after initiating the operation. The operation is considered complete after a subsequent call to \FUNC{shmem\_quiet}. At the completion of \FUNC{shmem\_quiet}, the diff --git a/content/shmem_global_exit.tex b/content/shmem_global_exit.tex index f3e49092..ced34b0c 100644 --- a/content/shmem_global_exit.tex +++ b/content/shmem_global_exit.tex @@ -48,7 +48,7 @@ terminate regardless of their current execution state. While I/O must be flushed for standard language I/O calls from \CorCpp, it is implementation dependent as to how I/O done by other means (e.g., third - party I/O libraries) is handled. Similarly, resources are released + party I/O libraries) are handled. Similarly, resources are released according to \CorCpp standard language requirements, but this may not include all resources allocated for the \openshmem program. 
However, a quality implementation will make a best effort to flush all I/O and clean diff --git a/content/shmem_ibget.tex b/content/shmem_ibget.tex index 4deacf0f..811f1e2d 100644 --- a/content/shmem_ibget.tex +++ b/content/shmem_ibget.tex @@ -42,7 +42,8 @@ arrays.} \apiargument{IN}{nblocks}{Number of blocks to be copied from the \source{} array to the \dest{} array.} - \apiargument{IN}{pe}{\ac{PE} number of the remote \ac{PE}.} + \apiargument{IN}{pe}{\ac{PE} number of the remote \ac{PE} relative to the team associated + with the given \VAR{ctx} when provided, or the default context otherwise.} \end{apiarguments} \apidescription{ diff --git a/content/shmem_ibput.tex b/content/shmem_ibput.tex index 2c949854..55b2987d 100644 --- a/content/shmem_ibput.tex +++ b/content/shmem_ibput.tex @@ -42,13 +42,14 @@ arrays.} \apiargument{IN}{nblocks}{Number of blocks to be copied from the \source{} array to the \dest{} array.} - \apiargument{IN}{pe}{\ac{PE} number of the remote \ac{PE}.} + \apiargument{IN}{pe}{\ac{PE} number of the remote \ac{PE} relative to the team associated + with the given \VAR{ctx} when provided, or the default context otherwise.} \end{apiarguments} \apidescription{ The \FUNC{shmem\_ibput} routines provide a method for copying strided data - blocks (specified by \VAR{sst}) of an array from a \source{} array on the + blocks (of size \VAR{bsize}), with stride specified by \VAR{sst}, of an array from a \source{} array on the local \ac{PE} to locations specified by stride \VAR{dst} on a \dest{} array on specified remote \ac{PE}.
The routines return when the data has been copied out of the \VAR{source} array on the local \ac{PE} but not diff --git a/content/shmem_iget.tex b/content/shmem_iget.tex index 85871571..369d057a 100644 --- a/content/shmem_iget.tex +++ b/content/shmem_iget.tex @@ -40,7 +40,8 @@ indicates contiguous data.} \apiargument{IN}{nelems}{Number of elements in the \dest{} and \source{} arrays.} - \apiargument{IN}{pe}{\ac{PE} number of the remote \ac{PE}.} + \apiargument{IN}{pe}{\ac{PE} number of the remote \ac{PE} relative to the team associated + with the given \VAR{ctx} when provided, or the default context otherwise.} \end{apiarguments} \apidescription{ diff --git a/content/shmem_init.tex b/content/shmem_init.tex index 6bfe2e1b..4929ecaf 100644 --- a/content/shmem_init.tex +++ b/content/shmem_init.tex @@ -16,9 +16,10 @@ \apidescription{ \FUNC{shmem\_init} allocates and initializes resources used by the \openshmem library. It is a collective operation that all \acp{PE} must call before any - other \openshmem routine may be called. At the end of the \openshmem program - which it initialized, the call to \FUNC{shmem\_init} must be matched with a - call to \FUNC{shmem\_finalize}. + other \openshmem routine may be called, except \FUNC{shmem\_query\_initialized} + which checks the current initialized state of the library. In the + \openshmem program which it initialized, each call to \FUNC{shmem\_init} must + be matched with a corresponding call to \FUNC{shmem\_finalize}. The \FUNC{shmem\_init} and \FUNC{shmem\_init\_thread} initialization routines may be called multiple times within an \openshmem program. A @@ -33,16 +34,18 @@ None. } +\begin{DeprecateBlock} \apinotes{ As of \openshmem[1.2], the use of \FUNC{start\_pes} has been deprecated and calls to it should be replaced with calls to \FUNC{shmem\_init}. While support for \FUNC{start\_pes} is still required in \openshmem libraries, users are encouraged to use \FUNC{shmem\_init}. 
An important difference between - \FUNC{shmem\_init} and \FUNC{start\_pes} is that multiple calls to - \FUNC{shmem\_init} within a program results in undefined behavior, while in the - case of \FUNC{start\_pes}, any subsequent calls to \FUNC{start\_pes} after the + \FUNC{shmem\_init} and \FUNC{start\_pes} is that every call to + \FUNC{shmem\_init} within a program must be matched with a call to \FUNC{shmem\_finalize}. + In the case of \FUNC{start\_pes}, any subsequent calls to \FUNC{start\_pes} after the first one results in a no-op. } +\end{DeprecateBlock} \begin{apiexamples} diff --git a/content/shmem_init_thread.tex b/content/shmem_init_thread.tex index f1f397d8..a5d81ae9 100644 --- a/content/shmem_init_thread.tex +++ b/content/shmem_init_thread.tex @@ -15,12 +15,12 @@ \end{apiarguments} \apidescription{ -\FUNC{shmem\_init\_thread} initializes the \openshmem library in the same way as -\FUNC{shmem\_init}. In addition, \FUNC{shmem\_init\_thread} also performs -the initialization required for supporting the provided thread level. -The argument \VAR{requested} is used to specify the desired level of -thread support. The argument \VAR{provided} returns the support level -provided by the library. The allowed values for \VAR{provided} and +\FUNC{shmem\_init\_thread} initializes the \openshmem library in the same way as +\FUNC{shmem\_init}. In addition, \FUNC{shmem\_init\_thread} also performs +the initialization required for supporting the provided thread level. +The argument \VAR{requested} is used to specify the desired level of +thread support. The argument \VAR{provided} returns the support level +provided by the library. The allowed values for \VAR{provided} and \VAR{requested} are \CONST{SHMEM\_THREAD\_SINGLE}, \CONST{SHMEM\_THREAD\_FUNNELED}, \CONST{SHMEM\_THREAD\_SERIALIZED}, and \CONST{SHMEM\_THREAD\_MULTIPLE}. @@ -32,8 +32,8 @@ re-initialized with a subsequent call to an initialization routine. 
If the call to \FUNC{shmem\_init\_thread} -is unsuccessful in allocating and initializing resources for the -\openshmem library, then the behavior of any subsequent call +is unsuccessful in allocating and initializing resources for the +\openshmem library, then the behavior of any subsequent call to the \openshmem library is undefined. @@ -45,9 +45,9 @@ } \apinotes{ -The \openshmem library can be initialized either by \FUNC{shmem\_init} -or \FUNC{shmem\_init\_thread}. If the \openshmem library is initialized -by \FUNC{shmem\_init}, the library implementation can choose to +The \openshmem library can be initialized either by \FUNC{shmem\_init} +or \FUNC{shmem\_init\_thread}. If the \openshmem library is initialized +by \FUNC{shmem\_init}, the library implementation can choose to support any one of the defined thread levels. The \openshmem library may not be able to change the level of threading support diff --git a/content/shmem_iput.tex b/content/shmem_iput.tex index 16006b00..a937d54e 100644 --- a/content/shmem_iput.tex +++ b/content/shmem_iput.tex @@ -39,7 +39,8 @@ scaled by the element size of the \source{} array. A value of \CONST{1} indicates contiguous data.} \apiargument{IN}{nelems}{Number of elements in the \dest{} and \source{} arrays.} - \apiargument{IN}{pe}{\ac{PE} number of the remote \ac{PE}.} + \apiargument{IN}{pe}{\ac{PE} number of the remote \ac{PE} relative to the team associated + with the given \VAR{ctx} when provided, or the default context otherwise.} \end{apiarguments} diff --git a/content/shmem_malloc.tex b/content/shmem_malloc.tex index 6b0b176f..142f902c 100644 --- a/content/shmem_malloc.tex +++ b/content/shmem_malloc.tex @@ -23,8 +23,10 @@ \FUNC{malloc}, which allocates from the private heap). When \VAR{size} is zero, the \FUNC{shmem\_malloc} routine performs no action and returns a null pointer; otherwise, - \FUNC{shmem\_malloc} calls a barrier on exit. 
- + \FUNC{shmem\_malloc} calls a procedure that is semantically equivalent + to \FUNC{shmem\_barrier\_all} on exit. This ensures that all \acp{PE} participate + in the memory allocation, and that the memory on other \acp{PE} can be used as soon as the local + \ac{PE} returns. The value of the \VAR{size} argument must be identical on all \acp{PE}; otherwise, the behavior is undefined. } diff --git a/content/shmem_malloc_hints.tex b/content/shmem_malloc_hints.tex index ef4cbfc2..3b10e7bc 100644 --- a/content/shmem_malloc_hints.tex +++ b/content/shmem_malloc_hints.tex @@ -22,29 +22,26 @@ is a collective operation on the world team that returns a pointer to a block of at least \VAR{size} bytes, which shall be suitably aligned so that it may be assigned to a pointer to any type of object. This space is allocated from - the symmetric heap (similar to \FUNC{shmem\_malloc}). When the \VAR{size} is zero, - the \FUNC{shmem\_malloc\_with\_hints} routine performs no action and returns a null pointer. - - In addition to the \VAR{size} argument, the \VAR{hints} argument is provided by the user. + the symmetric heap (similar to \FUNC{shmem\_malloc}). When the \VAR{size} is zero, + the \FUNC{shmem\_malloc\_with\_hints} routine performs no action and returns a null pointer. + + In addition to the \VAR{size} argument, the \VAR{hints} argument is provided by the user. The \VAR{hints} describes the expected manner in which the \openshmem program may use the allocated memory. - The valid usage hints are described in Table~\ref{usagehints}. Multiple hints may be requested by combining them with a bitwise \CONST{OR} operation. + The valid usage of hints are described in Table~\ref{usagehints}. Multiple hints may be requested by combining them with a bitwise \CONST{OR} operation. A zero option can be given if no options are requested. - - The information provided by the \VAR{hints} is used to optimize for performance by the implementation. 
+ + The information provided by the \VAR{hints} is used to optimize for performance by the implementation. If the implementation cannot optimize, the behavior is same as \FUNC{shmem\_malloc}. - If more than one hint is provided, the implementation will make the best effort to use one or more hints - to optimize performance. - + If more than one hint is provided, the implementation will make the best effort to use one or more hints + to optimize performance. + The \FUNC{shmem\_malloc\_with\_hints} routine is provided so that multiple \acp{PE} in a program can allocate symmetric, remotely accessible memory blocks. When no action is performed, these - routines return without performing a barrier. Otherwise, the routine will call a procedure that is semantically equivalent to \FUNC{shmem\_barrier\_all} on exit. - This ensures that all \acp{PE} participate in the memory allocation, and that the memory on other - \acp{PE} can be used as soon as the local \ac{PE} returns. The implicit barrier performed by this routine will quiet the - default context. It is the user's responsibility to ensure that no communication operations involving the given memory block are pending on - other contexts prior to calling the \FUNC{shmem\_free} and \FUNC{shmem\_realloc} routines. - The user is also responsible for calling these routines with identical argument(s) on all + routines return without performing a barrier. Otherwise, the routine will call a procedure that is + semantically equivalent to \FUNC{shmem\_barrier\_all} on exit, similar to the behavior of \FUNC{shmem\_malloc}. + The user is responsible for calling this routine with identical argument(s) on all \acp{PE}; if differing \VAR{size}, or \VAR{hints} arguments are used, the behavior of the call - and any subsequent \openshmem calls is undefined. + is undefined. 
} \apireturnvalues{ diff --git a/content/shmem_p.tex b/content/shmem_p.tex index 71e5594b..6aebc504 100644 --- a/content/shmem_p.tex +++ b/content/shmem_p.tex @@ -24,13 +24,14 @@ The type of \dest{} should match that implied in the SYNOPSIS section.} \apiargument{IN}{value}{The value to be transferred to \VAR{dest}. The type of \VAR{value} should match that implied in the SYNOPSIS section.} - \apiargument{IN}{pe}{The number of the remote \ac{PE}.} + \apiargument{IN}{pe}{\ac{PE} number of the remote \ac{PE} relative to the team associated + with the given \VAR{ctx} when provided, or the default context otherwise.} \end{apiarguments} \apidescription{ These routines provide a very low latency put capability for single elements of most basic types. - + As with \FUNC{shmem\_put}, these routines start the remote transfer and may return before the data is delivered to the remote \ac{PE}. Use \FUNC{shmem\_quiet} to force completion of all remote \PUT{} transfers. diff --git a/content/shmem_pcontrol.tex b/content/shmem_pcontrol.tex index a20f61e4..d4aac2b3 100644 --- a/content/shmem_pcontrol.tex +++ b/content/shmem_pcontrol.tex @@ -15,7 +15,7 @@ \end{apiarguments} \apidescription{ - \FUNC{shmem\_pcontrol} sets the profiling level and any other + \FUNC{shmem\_pcontrol} sets the profiling level and any other library defined effects through additional arguments. \openshmem libraries make no use of this routine and simply return immediately to the user code. } @@ -25,26 +25,26 @@ } \apinotes{ - Since \openshmem has no control of the implementation of the profiling code, - it is impossible to precisely specify the semantics that will be provided by - calls to \FUNC{shmem\_pcontrol}. This vagueness extends to the number of - arguments to the function and their datatypes. 
However, to provide some - level of portability of user codes to different profiling libraries, the + Since \openshmem has no control of the implementation of the profiling code, + it is impossible to precisely specify the semantics that will be provided by + calls to \FUNC{shmem\_pcontrol}. This vagueness extends to the number of + arguments to the function and their datatypes. However, to provide some + level of portability of user code to different profiling libraries, the following \VAR{level} values are recommended. \begin{itemize} \item \texttt{level <= 0} Profiling is disabled. \item \texttt{level == 1} Profiling is enabled at the default level of detail. - \item \texttt{level == 2} Profiling is enabled and profile buffers are + \item \texttt{level == 2} Profiling is enabled and profile buffers are flushed if available. - \item \texttt{level > 2} Profiling is enabled with profile library defined + \item \texttt{level > 2} Profiling is enabled with profile library defined effects and additional arguments. \end{itemize} - The default state after \FUNC{shmem\_init} is recommended to have profiling + The default state after \FUNC{shmem\_init} is recommended to have profiling enabled at the default level of detail (\texttt{level == 1}). This allows users - to link with a profiling library and to obtain profile output without - having to modify the user-level source code. + to link with a profiling library and to obtain profile output without + having to modify the user-level source code. 
} \end{apidefinition} diff --git a/content/shmem_pe_quiet.tex b/content/shmem_pe_quiet.tex index 72ff2963..f0336c20 100644 --- a/content/shmem_pe_quiet.tex +++ b/content/shmem_pe_quiet.tex @@ -1,7 +1,7 @@ \apisummary{ - Waits for completion of all outstanding memory store, blocking - \PUT{}, \ac{AMO}, and \emph{put-with-signal}, as well as - nonblocking \PUT{}, \emph{put-with-signal}, and \GET{} routines + Waits for completion of all outstanding memory store, blocking + \PUT{}, \ac{AMO}, and \emph{put-with-signal}, as well as + nonblocking \PUT{}, \emph{put-with-signal}, and \GET{} routines to symmetric data objects issued by the calling \ac{PE} at the target \acp{PE}. } @@ -17,22 +17,22 @@ \apiargument{IN}{ctx}{A context handle specifying the context on which to perform the operation. When this argument is not provided, the operation is performed on the default context.} - \apiargument{IN}{target\_pes}{Address of target \ac{PE} array where the + \apiargument{IN}{target\_pes}{Address of target \ac{PE} array where the operations need to be completed} - \apiargument{IN}{npes}{The number of \acp{PE} in the target \ac{PE} array} + \apiargument{IN}{npes}{The number of \acp{PE} in the target \ac{PE} array} \end{apiarguments} \apidescription{ - The \FUNC{shmem\_pe\_quiet} ensures completion of memory store, blocking + The \FUNC{shmem\_pe\_quiet} ensures completion of memory store, blocking \PUT{}, \ac{AMO}, and \emph{put-with-signal}, as well as nonblocking \PUT{}, \emph{put-with-signal}, and \GET{} routines on the symmetric data objects issued by the calling \ac{PE} to the target \acp{PE} and on the given context. If \VAR{npes} is set to 0, the \VAR{target\_pes} is ignored and the routine returns immediately. - - The completion and visibility semantics of these operations are the same as the - \FUNC{shmem\_quiet} routine. 
However, it applies only to the target + + The completion and visibility semantics of these operations are the same as the + \FUNC{shmem\_quiet} routine. However, it applies only to the target \acp{PE}, i.e., the operations to the target \acp{PE} are guaranteed to be complete and visible to all \acp{PE} when \FUNC{shmem\_pe\_quiet} returns. } diff --git a/content/shmem_ptr.tex b/content/shmem_ptr.tex index f5c4d7e9..cc76570b 100644 --- a/content/shmem_ptr.tex +++ b/content/shmem_ptr.tex @@ -23,8 +23,8 @@ to a remotely accessible data object. Providing this address to an argument of an \openshmem routine that requires a symmetric address results in undefined behavior. - - The \FUNC{shmem\_ptr} routine can provide an efficient means to accomplish + + The \FUNC{shmem\_ptr} routine can provide efficient means to accomplish communication, for example when a sequence of reads and writes to a data object on a remote \ac{PE} does not match the access pattern provided in an \openshmem data transfer routine like \FUNC{shmem\_put} or diff --git a/content/shmem_put.tex b/content/shmem_put.tex index f71355a8..ce5c3c22 100644 --- a/content/shmem_put.tex +++ b/content/shmem_put.tex @@ -38,7 +38,8 @@ The type of \source{} should match that implied in the SYNOPSIS section.} \apiargument{IN}{nelems}{Number of elements in the \VAR{dest} and \VAR{source} arrays. For \FUNC{shmem\_putmem} and \FUNC{shmem\_ctx\_putmem}, elements are bytes.} - \apiargument{IN}{pe}{\ac{PE} number of the remote \ac{PE}.} + \apiargument{IN}{pe}{\ac{PE} number of the remote \ac{PE} relative to the team associated + with the given \VAR{ctx} when provided, or the default context otherwise.} \end{apiarguments} \apidescription{ diff --git a/content/shmem_put_nbi.tex b/content/shmem_put_nbi.tex index 0c42d7c8..0f7c385d 100644 --- a/content/shmem_put_nbi.tex +++ b/content/shmem_put_nbi.tex @@ -39,7 +39,8 @@ \apiargument{IN}{nelems}{Number of elements in the \VAR{dest} and \VAR{source} arrays. 
For \FUNC{shmem\_putmem\_nbi} and \FUNC{shmem\_ctx\_putmem\_nbi}, elements are bytes.} - \apiargument{IN}{pe}{\ac{PE} number of the remote \ac{PE}.} + \apiargument{IN}{pe}{\ac{PE} number of the remote \ac{PE} relative to the team associated + with the given \VAR{ctx} when provided, or the default context otherwise.} \end{apiarguments} \apidescription{ diff --git a/content/shmem_put_signal.tex b/content/shmem_put_signal.tex index 7fd33a8e..79912934 100644 --- a/content/shmem_put_signal.tex +++ b/content/shmem_put_signal.tex @@ -48,7 +48,8 @@ remote \VAR{sig\_addr} signal data object.} \apiargument{IN}{sig\_op}{Signal operator that represents the type of update to be performed on the remote \VAR{sig\_addr} signal data object.} - \apiargument{IN}{pe}{\ac{PE} number of the remote \ac{PE}.} + \apiargument{IN}{pe}{\ac{PE} number of the remote \ac{PE} relative to the team associated + with the given \VAR{ctx} when provided, or the default context otherwise.} \end{apiarguments} \apidescription{ diff --git a/content/shmem_put_signal_nbi.tex b/content/shmem_put_signal_nbi.tex index 3cbf3625..611d301e 100644 --- a/content/shmem_put_signal_nbi.tex +++ b/content/shmem_put_signal_nbi.tex @@ -48,7 +48,8 @@ remote \VAR{sig\_addr} signal data object.} \apiargument{IN}{sig\_op}{Signal operator that represents the type of update to be performed on the remote \VAR{sig\_addr} signal data object.} - \apiargument{IN}{pe}{\ac{PE} number of the remote \ac{PE}.} + \apiargument{IN}{pe}{\ac{PE} number of the remote \ac{PE} relative to the team associated + with the given \VAR{ctx} when provided, or the default context otherwise.} \end{apiarguments} \apidescription{ diff --git a/content/shmem_query_initialized.tex b/content/shmem_query_initialized.tex index b3729b7c..b93d9917 100644 --- a/content/shmem_query_initialized.tex +++ b/content/shmem_query_initialized.tex @@ -20,7 +20,7 @@ zero. 
This function may be called at any time, regardless of the thread safety - level of the \openshmem library. + level or the current initialized state of the \openshmem library. } \apireturnvalues{ diff --git a/content/shmem_reductions.tex b/content/shmem_reductions.tex index ff933b35..e99a12f6 100644 --- a/content/shmem_reductions.tex +++ b/content/shmem_reductions.tex @@ -252,11 +252,11 @@ \subsubsubsection{PROD} contains one element for each separate reduction routine. The type of \source{} should match that implied in the SYNOPSIS section.} \apiargument{IN}{nreduce}{The number of elements in the \dest{} and \source{} - arrays. In teams based \ac{API} calls, \VAR{nreduce} must be of type size\_t. - In deprecated active-set based \ac{API} calls, - \VAR{nreduce} must be of type integer.} + arrays. In team-based \ac{API} calls, \VAR{nreduce} must be of type size\_t.} \begin{DeprecateBlock} +\apiargument{IN}{nreduce}{In active-set-based \ac{API} calls, + \VAR{nreduce} must be of type integer.} \apiargument{IN}{PE\_start}{The lowest \ac{PE} number of the active set of \acp{PE}.} \apiargument{IN}{logPE\_stride}{The log (base 2) of the stride between consecutive @@ -273,7 +273,7 @@ \subsubsubsection{PROD} \end{apiarguments} \apidescription{ - \openshmem reduction routines are collective routines over an active set or + \openshmem reduction routines are collective routines over an existing \openshmem team that compute one or more reductions across symmetric arrays on multiple \acp{PE}. A reduction performs an associative binary routine across a set of values. @@ -295,6 +295,41 @@ \subsubsubsection{PROD} If \VAR{team} compares equal to \LibConstRef{SHMEM\_TEAM\_INVALID} or is otherwise invalid, the behavior is undefined. + Before any \ac{PE} calls a reduction routine, the following conditions + must be ensured, otherwise the behavior is undefined: + \begin{itemize} + \item The \dest{} array on all \acp{PE} in the team is ready to + accept the results of the operation.
+ \item The \source{} array at the local \ac{PE} is ready to be read by + any \ac{PE} in the team. + \end{itemize} + The application does not need to synchronize to ensure that the \source{} + array is ready across all \acp{PE} prior to calling this routine. + + Upon return from a reduction routine, the following are true for the local + \ac{PE}: + \begin{itemize} + \item The \dest{} array is updated and the \source{} array may be safely reused. + \end{itemize} + +\begin{DeprecateBlock} + \openshmem reduction routines are collective routines over an active set + that compute one or more reductions across symmetric + arrays on multiple \acp{PE}. A reduction performs an associative binary routine + across a set of values. + + The \VAR{nreduce} argument determines the number of separate reductions to + perform. The \source{} array on all \acp{PE} participating in the reduction + provides one element for each reduction. The results of the reductions are placed in the + \dest{} array on all \acp{PE} participating in the reduction. + + The same \source{} and \dest{} arrays must be passed by all \acp{PE} that + participate in the collective. + The \source{} and \dest{} arguments must either be the same symmetric + address, or two different symmetric addresses corresponding to buffers that + do not overlap in memory. That is, they must be completely overlapping (sometimes referred to as an ``in place'' reduction) or + completely disjoint. + Active-set-based sync routines operate over all \acp{PE} in the active set defined by the \VAR{PE\_start}, \VAR{logPE\_stride}, \VAR{PE\_size} triplet. @@ -319,7 +354,7 @@ \subsubsubsection{PROD} \openshmem routine. \end{itemize} Otherwise, the behavior is undefined. - + Upon return from a reduction routine, the following are true for the local \ac{PE}: \begin{itemize} @@ -327,6 +362,7 @@ \subsubsubsection{PROD} \item If using active-set-based routines, the values in the \VAR{pSync} array are restored to the original values.
\end{itemize} +\end{DeprecateBlock} The complex-typed interfaces are only provided for sum and product reductions. When the \Cstd translation environment does not support complex types diff --git a/content/shmem_scan.tex b/content/shmem_scan.tex index 618a51a0..b50cdb68 100644 --- a/content/shmem_scan.tex +++ b/content/shmem_scan.tex @@ -6,16 +6,16 @@ %% C11 \begin{C11synopsis} -int @\FuncDecl{shmem\_sum\_inscan}@(shmem_team_t team, TYPE *dest, const TYPE *source, size_t nreduce); -int @\FuncDecl{shmem\_sum\_exscan}@(shmem_team_t team, TYPE *dest, const TYPE *source, size_t nreduce); +int @\FuncDecl{shmem\_sum\_inscan}@(shmem_team_t team, TYPE *dest, const TYPE *source, size_t nelems); +int @\FuncDecl{shmem\_sum\_exscan}@(shmem_team_t team, TYPE *dest, const TYPE *source, size_t nelems); \end{C11synopsis} where \TYPE{} is one of the integer, real, or complex types supported for the SUM operation as specified by Table \ref{teamreducetypes}. %% C/C++ \begin{Csynopsis} -int @\FuncDecl{shmem\_\FuncParam{TYPENAME}\_sum\_inscan}@(shmem_team_t team, TYPE *dest, const TYPE *source, size_t nreduce); -int @\FuncDecl{shmem\_\FuncParam{TYPENAME}\_sum\_exscan}@(shmem_team_t team, TYPE *dest, const TYPE *source, size_t nreduce); +int @\FuncDecl{shmem\_\FuncParam{TYPENAME}\_sum\_inscan}@(shmem_team_t team, TYPE *dest, const TYPE *source, size_t nelems); +int @\FuncDecl{shmem\_\FuncParam{TYPENAME}\_sum\_exscan}@(shmem_team_t team, TYPE *dest, const TYPE *source, size_t nelems); \end{Csynopsis} where \TYPE{} is one of the integer, real, or complex types supported for the SUM operation and has a corresponding \TYPENAME{} as specified @@ -26,17 +26,17 @@ The team over which to perform the operation. } \apiargument{OUT}{dest}{ - Symmetric address of an array, of length \VAR{nreduce} elements, - to receive the result of the scan routines. The type of + Symmetric address of an array, of length \VAR{nelems} elements, + to receive the result of the scan operation. 
The type of \dest{} should match that implied in the SYNOPSIS section. } \apiargument{IN}{source}{ - Symmetric address of an array, of length \VAR{nreduce} elements, - that contains one element for each separate scan routine. + Symmetric address of an array, of length \VAR{nelems} elements, + that contains one element for each separate scan operation. The type of \source{} should match that implied in the SYNOPSIS section. } - \apiargument{IN}{nreduce}{ + \apiargument{IN}{nelems}{ The number of elements in the \dest{} and \source{} arrays. } \end{apiarguments} @@ -49,7 +49,7 @@ multiple \acp{PE}. The scan operations are performed with the SUM operator. - The \VAR{nreduce} argument determines the number of separate scan + The \VAR{nelems} argument determines the number of separate scan operations to perform. The \source{} array on all \acp{PE} participating in the operation provides one element for each scan. The results of the scan operations are placed in the \dest{} array @@ -57,7 +57,7 @@ The \FUNC{shmem\_sum\_inscan} routine performs an inclusive scan operation, while the \FUNC{shmem\_sum\_exscan} routine performs an - exclusive scan operation. + exclusive scan operation. For \FUNC{shmem\_sum\_inscan}, the value of the $j$-th element in the \VAR{dest} array on \ac{PE}~$i$ is defined as: @@ -75,10 +75,14 @@ \end{cases} \end{equation*} + + The same \source{} and \dest{} arrays must be passed by all PEs that + participate in the collective. The \source{} and \dest{} arguments must either be the same symmetric address, or two different symmetric addresses - corresponding to buffers that do not overlap in memory. That is, - they must be completely overlapping or completely disjoint. + corresponding to buffers that do not overlap in memory. + That is, they must be completely overlapping (sometimes referred to as an + ``in place'' scan) or completely disjoint. Team-based scan routines operate over all \acp{PE} in the provided team argument.
All \acp{PE} in the provided team must participate in @@ -86,10 +90,17 @@ \LibConstRef{SHMEM\_TEAM\_INVALID} or is otherwise invalid, the behavior is undefined. - Before any \ac{PE} calls a scan routine, the \dest{} array on all - \acp{PE} participating in the operation must be ready to accept the - results of the operation. Otherwise, the behavior is undefined. - + Before any \ac{PE} calls a scan routine, the following conditions must be + ensured, otherwise the behavior is undefined: + \begin{itemize} + \item The \dest{} array on all \acp{PE} in the team is ready to accept + the result of the operation. + \item The \source{} array at the local \ac{PE} is ready to be read by + any \ac{PE} in the team. + \end{itemize} + The application does not need to synchronize to ensure that the \source{} + array is ready across all \acp{PE} prior to calling this routine. + Upon return from a scan routine, the following are true for the local \ac{PE}: the \dest{} array is updated, and the \source{} array may be safely reused. 
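Reviewer sketch (not part of the patch): the inclusive and exclusive definitions in the scan description above can be checked with a small serial emulation. The helper names `emulate_sum_inscan` and `emulate_sum_exscan` are hypothetical and are not OpenSHMEM routines. The sketch assumes that `src[i]` stands for one element of the \source{} array on PE `i`, and that `dest[i]` is the value PE `i` would observe in the same element of \dest{} after the call. No communication or synchronization is modeled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical serial emulation of an inclusive sum scan.
 * src[i]  : the element contributed by PE i (assumption for illustration).
 * dest[i] : sum of the contributions of PEs 0..i, as seen by PE i.
 * Works when dest and src are the same array ("in place"). */
void emulate_sum_inscan(long *dest, const long *src, size_t npes) {
    assert(dest != NULL && src != NULL);
    long running = 0;
    for (size_t i = 0; i < npes; i++) {
        running += src[i];  /* include PE i's own contribution */
        dest[i] = running;
    }
}

/* Hypothetical serial emulation of an exclusive sum scan.
 * dest[i] : sum of the contributions of PEs 0..i-1; PE 0 receives 0.
 * The source value is read before dest[i] is written, so the
 * in-place case (dest == src) is also handled. */
void emulate_sum_exscan(long *dest, const long *src, size_t npes) {
    assert(dest != NULL && src != NULL);
    long running = 0;
    for (size_t i = 0; i < npes; i++) {
        long contribution = src[i];
        dest[i] = running;  /* excludes PE i's own contribution */
        running += contribution;
    }
}
```

With contributions 1, 2, 3 from PEs 0, 1, 2, the inclusive emulation yields 1, 3, 6 and the exclusive emulation yields 0, 1, 3.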
diff --git a/content/shmem_signal_add.tex b/content/shmem_signal_add.tex index 272b03a6..a7bb2aa3 100644 --- a/content/shmem_signal_add.tex +++ b/content/shmem_signal_add.tex @@ -5,12 +5,12 @@ \begin{apidefinition} \begin{C11synopsis} -void @\FuncDecl{shmem\_signal\_add}@(shmem_ctx_t ctx, const uint64_t *sig_addr, uint64_t signal, int pe); +void @\FuncDecl{shmem\_signal\_add}@(shmem_ctx_t ctx, uint64_t *sig_addr, uint64_t signal, int pe); \end{C11synopsis} \begin{Csynopsis} -void @\FuncDecl{shmem\_signal\_add}@(const uint64_t *sig_addr, uint64_t signal, int pe); -void @\FuncDecl{shmem\_ctx\_signal\_add}@(shmem_ctx_t ctx, const uint64_t *sig_addr, uint64_t signal, int pe); +void @\FuncDecl{shmem\_signal\_add}@(uint64_t *sig_addr, uint64_t signal, int pe); +void @\FuncDecl{shmem\_ctx\_signal\_add}@(shmem_ctx_t ctx, uint64_t *sig_addr, uint64_t signal, int pe); \end{Csynopsis} \begin{apiarguments} @@ -27,9 +27,8 @@ Unsigned 64-bit value that is used for updating the remote \VAR{sig\_addr} signal data object. } - \apiargument{IN}{pe}{ - \ac{PE} number of the remote \ac{PE}. 
- } + \apiargument{IN}{pe}{\ac{PE} number of the remote \ac{PE} relative to the team associated + with the given \VAR{ctx} when provided, or the default context otherwise.} \end{apiarguments} \apidescription{ diff --git a/content/shmem_signal_set.tex b/content/shmem_signal_set.tex index 00c70345..b463917a 100644 --- a/content/shmem_signal_set.tex +++ b/content/shmem_signal_set.tex @@ -5,12 +5,12 @@ \begin{apidefinition} \begin{C11synopsis} -void @\FuncDecl{shmem\_signal\_set}@(shmem_ctx_t ctx, const uint64_t *sig_addr, uint64_t signal, int pe); +void @\FuncDecl{shmem\_signal\_set}@(shmem_ctx_t ctx, uint64_t *sig_addr, uint64_t signal, int pe); \end{C11synopsis} \begin{Csynopsis} -void @\FuncDecl{shmem\_signal\_set}@(const uint64_t *sig_addr, uint64_t signal, int pe); -void @\FuncDecl{shmem\_ctx\_signal\_set}@(shmem_ctx_t ctx, const uint64_t *sig_addr, uint64_t signal, int pe); +void @\FuncDecl{shmem\_signal\_set}@(uint64_t *sig_addr, uint64_t signal, int pe); +void @\FuncDecl{shmem\_ctx\_signal\_set}@(shmem_ctx_t ctx, uint64_t *sig_addr, uint64_t signal, int pe); \end{Csynopsis} \begin{apiarguments} @@ -27,9 +27,8 @@ Unsigned 64-bit value that is used for updating the remote \VAR{sig\_addr} signal data object. } - \apiargument{IN}{pe}{ - \ac{PE} number of the remote \ac{PE}. 
- } + \apiargument{IN}{pe}{\ac{PE} number of the remote \ac{PE} relative to the team associated + with the given \VAR{ctx} when provided, or the default context otherwise.} \end{apiarguments} \apidescription{ diff --git a/content/shmem_signal_wait_until.tex b/content/shmem_signal_wait_until.tex index 5d93ec7f..564c5a77 100644 --- a/content/shmem_signal_wait_until.tex +++ b/content/shmem_signal_wait_until.tex @@ -11,7 +11,7 @@ \begin{apiarguments} -\apiargument{IN}{sig\_addr}{Local address of the source signal variable.} +\apiargument{IN}{sig\_addr}{Local address of the remotely accessible source signal variable.} \apiargument{IN}{cmp}{The comparison operator that compares \VAR{sig\_addr} with \VAR{cmp\_value}.} \apiargument{IN}{cmp\_value}{The value against which the object pointed to diff --git a/content/shmem_sync.tex b/content/shmem_sync.tex index 6e41ee82..91a2ce61 100644 --- a/content/shmem_sync.tex +++ b/content/shmem_sync.tex @@ -1,7 +1,11 @@ \apisummary{ Registers the arrival of a \ac{PE} at a synchronization point. This routine does not return until all other \acp{PE} in a given OpenSHMEM team - or active set arrive at this synchronization point. + arrive at this synchronization point. +\begin{DeprecateBlock} + Registers the arrival of a \ac{PE} at a synchronization point. + This routine does not return until all other \acp{PE} in a given OpenSHMEM active set arrive at this synchronization point. +\end{DeprecateBlock} } \begin{apidefinition} @@ -38,12 +42,12 @@ \apidescription{ \FUNC{shmem\_sync} is a collective synchronization routine over an - existing \openshmem team or active set. + existing \openshmem team. The routine registers the arrival of a \ac{PE} at a synchronization point in the program. This is a fast mechanism for synchronizing all \acp{PE} that participate in this collective call. The routine blocks the calling \ac{PE} until all \acp{PE} in the - specified team or active set have called \FUNC{shmem\_sync}. 
In a multithreaded \openshmem + specified team have called \FUNC{shmem\_sync}. In a multithreaded \openshmem program, only the calling thread is blocked. Team-based sync routines operate over all \acp{PE} in the provided team argument. All @@ -51,6 +55,19 @@ If \VAR{team} compares equal to \LibConstRef{SHMEM\_TEAM\_INVALID} or is otherwise invalid, the behavior is undefined. + In contrast with the \FUNC{shmem\_barrier} routine, \FUNC{shmem\_sync} only + ensures completion and visibility of previously issued memory stores and does not ensure + completion of remote memory updates issued via \openshmem routines. + +\begin{DeprecateBlock} + \FUNC{shmem\_sync} is a collective synchronization routine over an active set. + + The routine registers the arrival of a \ac{PE} at a synchronization point in the program. + This is a fast mechanism for synchronizing all \acp{PE} that participate in this + collective call. The routine blocks the calling \ac{PE} until all \acp{PE} in the + active set have called \FUNC{shmem\_sync}. In a multithreaded \openshmem + program, only the calling thread is blocked. + Active-set-based sync routines operate over all \acp{PE} in the active set defined by the \VAR{PE\_start}, \VAR{logPE\_stride}, \VAR{PE\_size} triplet. @@ -64,12 +81,11 @@ \VAR{PE\_size} must be equal on all \acp{PE} in the active set. The same work array must be passed in \VAR{pSync} to all \acp{PE} in the active set. - In contrast with the \FUNC{shmem\_barrier} routine, \FUNC{shmem\_sync} only - ensures completion and visibility of previously issued memory stores and does not ensure - completion of remote memory updates issued via \openshmem routines. - The same \VAR{pSync} array may be reused on consecutive calls to \FUNC{shmem\_sync} if the same active set is used. 
+\end{DeprecateBlock} + + } \apireturnvalues{ diff --git a/content/shmem_team_ptr.tex b/content/shmem_team_ptr.tex index af158c31..81deaa92 100644 --- a/content/shmem_team_ptr.tex +++ b/content/shmem_team_ptr.tex @@ -24,8 +24,8 @@ a remotely accessible data object. Providing this address to an argument of an \openshmem routine that requires a symmetric address results in undefined behavior. - - The \FUNC{shmem\_team\_ptr} routine can provide an efficient means to accomplish + + The \FUNC{shmem\_team\_ptr} routine can provide efficient means to accomplish communication, for example when a sequence of reads and writes to a data object on a remote \ac{PE} does not match the access pattern provided in an \openshmem data transfer routine like \FUNC{shmem\_put} or diff --git a/content/shmem_team_split_strided.tex b/content/shmem_team_split_strided.tex index 08969792..a211c17e 100644 --- a/content/shmem_team_split_strided.tex +++ b/content/shmem_team_split_strided.tex @@ -13,8 +13,7 @@ \begin{apiarguments} \apiargument{IN}{parent\_team}{An \openshmem team.} -\apiargument{IN}{start}{The lowest \ac{PE} number of the subset of \acp{PE} from -the parent team that will form the new team.} +\apiargument{IN}{start}{The first \acs{PE} number of the subset of \acp{PE} from the parent team that will form the new team. If the stride is less than zero, the first \acs{PE} number is the highest \acs{PE} of the parent team; if it is greater than zero, it is the lowest; if the stride is zero, it is the starting \acs{PE}.} \apiargument{IN}{stride}{The stride between team \ac{PE} numbers in the parent team that comprise the subset of \acp{PE} that will form @@ -59,6 +58,18 @@ relative order with respect to the parent team. If a $stride$ value equal to 0 is passed to \FUNC{shmem\_team\_split\_strided}, then the $size$ argument passed must be 1, or the behavior is undefined. 
+If the triplet provided to \FUNC{shmem\_team\_split\_strided} implies a +wrap-around sequence, the input is considered invalid and the behavior is +undefined. +In other words, when $stride$ is nonzero, a newly created team must only +include \acp{PE} whose subsequent parent \ac{PE} values are either all +increasing (for positive $stride$) or all decreasing (for negative +$stride$). +That is, \textit{wrap-around} with respect to the parent team's \ac{PE} values +is not permitted. +For example, the list of \acp{PE} in the parent team should not start at a high +number and then continue to include \acp{PE} in the lower end of the parent +team's \ac{PE} range. This routine must be called by all \acp{PE} in the parent team. All \acp{PE} must provide the same values for the \ac{PE} triplet. @@ -101,11 +112,8 @@ } \apinotes{ - The \FUNC{shmem\_team\_split\_strided} operation uses an arbitrary - \VAR{stride} argument, whereas the \VAR{logPE\_stride} argument to the - active set collective operations only permits strides that are a power of two. - Arbitrary strides allow a greater number of PE subsets to be expressed - and can support a broader range of usage models. + The \FUNC{shmem\_team\_split\_strided} operation accepts any integer value + for the \VAR{stride} argument. See the description of team handles and predefined teams in Section~\ref{subsec:team} for more information about team handle semantics and usage. diff --git a/content/shmem_test_all_vector.tex b/content/shmem_test_all_vector.tex index bad49c3a..429e4366 100644 --- a/content/shmem_test_all_vector.tex +++ b/content/shmem_test_all_vector.tex @@ -46,8 +46,7 @@ conditions. This routine compares each element of the \VAR{ivars} array in the test set with each respective value in \VAR{cmp\_values} according to the comparison operator \VAR{cmp} at the - calling \ac{PE}. If \VAR{nelems} is 0, the test set is empty and this - routine returns 1. + calling \ac{PE}.
The optional \VAR{status} is a mask array of length \VAR{nelems} where each element corresponds to the respective element in \VAR{ivars} and indicates whether diff --git a/content/shmem_wait_until_all.tex b/content/shmem_wait_until_all.tex index d4dfd4b4..4f1c8c15 100644 --- a/content/shmem_wait_until_all.tex +++ b/content/shmem_wait_until_all.tex @@ -33,7 +33,7 @@ \end{apiarguments} -\apidescription{ +\apidescription{ The \FUNC{shmem\_wait\_until\_all} routine waits until all entries in the wait set specified by \VAR{ivars} and \VAR{status} have satisfied the wait condition at the calling \ac{PE}. The \VAR{ivars} objects at the calling \ac{PE} may be diff --git a/content/shmem_wait_until_all_vector.tex b/content/shmem_wait_until_all_vector.tex index a3abdf9c..b8006f1c 100644 --- a/content/shmem_wait_until_all_vector.tex +++ b/content/shmem_wait_until_all_vector.tex @@ -34,7 +34,7 @@ \end{apiarguments} -\apidescription{ +\apidescription{ The \FUNC{shmem\_wait\_until\_all\_vector} routine waits until all entries in the wait set specified by \VAR{ivars} and \VAR{status} have satisfied the wait conditions at the calling \ac{PE}. The \VAR{ivars} diff --git a/content/shmem_wait_until_any.tex b/content/shmem_wait_until_any.tex index e94e9afb..ce119eca 100644 --- a/content/shmem_wait_until_any.tex +++ b/content/shmem_wait_until_any.tex @@ -34,7 +34,7 @@ \end{apiarguments} -\apidescription{ +\apidescription{ The \FUNC{shmem\_wait\_until\_any} routine waits until any one entry in the wait set specified by \VAR{ivars} and \VAR{status} satisfies the wait condition at the calling \ac{PE}. 
The \VAR{ivars} objects at the calling diff --git a/content/shmem_wait_until_any_vector.tex b/content/shmem_wait_until_any_vector.tex index 09bcc5c7..30ebd077 100644 --- a/content/shmem_wait_until_any_vector.tex +++ b/content/shmem_wait_until_any_vector.tex @@ -35,7 +35,7 @@ \end{apiarguments} -\apidescription{ +\apidescription{ The \FUNC{shmem\_wait\_until\_any\_vector} routine waits until any one entry in the wait set specified by \VAR{ivars} and \VAR{status} satisfies the wait condition at the calling \ac{PE}. The \VAR{ivars} objects at the diff --git a/content/shmem_wait_until_some.tex b/content/shmem_wait_until_some.tex index 9af90fbb..8bcf4975 100644 --- a/content/shmem_wait_until_some.tex +++ b/content/shmem_wait_until_some.tex @@ -36,7 +36,7 @@ \end{apiarguments} -\apidescription{ +\apidescription{ The \FUNC{shmem\_wait\_until\_some} routine waits until at least one entry in the wait set specified by \VAR{ivars} and \VAR{status} satisfies the wait condition at the calling \ac{PE}. The \VAR{ivars} objects at the diff --git a/content/shmem_wait_until_some_vector.tex b/content/shmem_wait_until_some_vector.tex index e3a414fb..9b219cf4 100644 --- a/content/shmem_wait_until_some_vector.tex +++ b/content/shmem_wait_until_some_vector.tex @@ -37,7 +37,7 @@ \end{apiarguments} -\apidescription{ +\apidescription{ The \FUNC{shmem\_wait\_until\_some\_vector} routine waits until at least one entry in the wait set specified by \VAR{ivars} and \VAR{status} satisfies the wait condition at the calling \ac{PE}. diff --git a/content/signaling.tex b/content/signaling.tex new file mode 100644 index 00000000..bd04940b --- /dev/null +++ b/content/signaling.tex @@ -0,0 +1,57 @@ +This section specifies the OpenSHMEM support for \OPR{put-with-signal}, +nonblocking \OPR{put-with-signal}, and \OPR{signal-\{add, fetch, set\}} routines. 
The +put-with-signal routines provide a method for copying data from a contiguous +local data object to a data object on a specified \ac{PE} and subsequently +updating a remote flag to signal completion. +The signal-add and signal-set routines provide methods for updating +the signal object without the associated data transfer of a +put-with-signal operation. +The signal-fetch routine provides support for reading a local signal value. + +\openshmem \OPR{put-with-signal} and \OPR{signal-\{add, set\}} +routines specified in this section have two +variants. In one of the variants, the context handle, \VAR{ctx}, is explicitly +passed as an argument. In this variant, the operation is performed on the +specified context. If the context handle \VAR{ctx} does not correspond to a +valid context, the behavior is undefined. In the other variant, the context +handle is not explicitly passed, and thus the operations are performed on the +default context. + +\subsubsection{Atomicity Guarantees for Signaling Operations} +\label{subsec:signal_atomicity} +All signaling operations (put-with-signal, nonblocking put-with-signal, and +signal-\{add, fetch, set\}) are performed on a signal data object, a remotely accessible +symmetric object of type \VAR{uint64\_t}. A signal operator in the +put-with-signal routine is an \openshmem library constant that determines the +type of update to be performed as a signal on the signal data object.
+ +All signaling operations on the signal data object complete as if performed +atomically with respect to the following: +\begin{itemize} + \item any other blocking or nonblocking variant of the put-with-signal routine + that updates the signal data object using the same signal update operator; + \item the signal-add routine when the put-with-signal routine uses the + \LibConstRef{SHMEM\_SIGNAL\_ADD} signal operator; + \item the signal-set routine when the put-with-signal routine uses the + \LibConstRef{SHMEM\_SIGNAL\_SET} signal operator; + \item the signal-fetch routine that fetches the signal data object; and + \item any point-to-point synchronization routine that accesses the signal + data object. +\end{itemize} + +\subsubsection{Available Signal Operators} +\label{subsec:signal_operator} + +With the atomicity guarantees as described in +Section~\ref{subsec:signal_atomicity}, the following options can be used as a +signal operator. + + \apitablerow{\LibConstRef{SHMEM\_SIGNAL\_SET}}{An update to the signal data + object is an atomic set operation. It writes an unsigned 64-bit value as a + signal into the signal data object on a remote \VAR{PE} as an atomic + operation.} + + \apitablerow{\LibConstRef{SHMEM\_SIGNAL\_ADD}}{An update to the signal data + object is an atomic add operation. It adds an unsigned 64-bit value as a + signal into the signal data object on a remote \VAR{PE} as an atomic + operation.} diff --git a/content/the_openshmem_effort.tex b/content/the_openshmem_effort.tex index f83d3188..a3321cb5 100644 --- a/content/the_openshmem_effort.tex +++ b/content/the_openshmem_effort.tex @@ -9,7 +9,7 @@ code. This ensures that programs can run on multiple platforms without having to deal with subtle vendor-specific implementation differences. For more details on the history of \openshmem please refer to the -\hyperref[sec:openshmem_history]{History of \openshmem} section. +\hyperref[sec:openshmem_history]{History of \openshmem} section.
The \openshmem\footnote{The \openshmem specification is owned by Open Source Software Solutions Inc., a nonprofit organization, under an agreement with diff --git a/content/threads_intro.tex b/content/threads_intro.tex index 59f134d0..3aa329c6 100644 --- a/content/threads_intro.tex +++ b/content/threads_intro.tex @@ -30,7 +30,7 @@ \begin{enumerate} \item In the \CONST{SHMEM\_THREAD\_FUNNELED}, \CONST{SHMEM\_THREAD\_SERIALIZED}, and -\CONST{SHMEM\_THREAD\_MULTIPLE} thread levels, the \FUNC{shmem\_init} and +\CONST{SHMEM\_THREAD\_MULTIPLE} thread levels, the \FUNC{shmem\_init\_thread} and \FUNC{shmem\_finalize} calls must be invoked by the same thread. \item diff --git a/example_code/amo_scenario_3.c b/example_code/amo_scenario_3.c index 93586779..2091b09f 100644 --- a/example_code/amo_scenario_3.c +++ b/example_code/amo_scenario_3.c @@ -1,19 +1,14 @@ #include int main(void) { - static long psync[SHMEM_REDUCE_SYNC_SIZE]; - static int pwrk[SHMEM_REDUCE_MIN_WRKDATA_SIZE]; static int x = 0, y = 0; - for (int i = 0; i < SHMEM_REDUCE_SYNC_SIZE; i++) - psync[i] = SHMEM_SYNC_VALUE; - shmem_init(); shmem_int_atomic_inc(&x, (shmem_my_pe() + 1) % shmem_n_pes()); /* Undefined behavior: The following reduction operation performs accesses to * symmetric variable 'x' that are concurrent with previously issued atomic * increment operations on the same variable. 
*/ - shmem_int_sum_to_all(&y, &x, 1, 0, 0, shmem_n_pes(), pwrk, psync); + shmem_int_sum_reduce(SHMEM_TEAM_WORLD, &y, &x, 1); shmem_finalize(); return 0; diff --git a/example_code/shmem_ctx.c b/example_code/shmem_ctx.c index b122e874..15b3f76b 100644 --- a/example_code/shmem_ctx.c +++ b/example_code/shmem_ctx.c @@ -1,9 +1,6 @@ #include #include -long pwrk[SHMEM_REDUCE_MIN_WRKDATA_SIZE]; -long psync[SHMEM_REDUCE_SYNC_SIZE]; - long task_cntr = 0; /* Next task counter */ long tasks_done = 0; /* Tasks done by this PE */ long total_done = 0; /* Total tasks done by all PEs */ @@ -12,9 +9,6 @@ int main(void) { int tl, i; long ntasks = 1024; /* Total tasks per PE */ - for (i = 0; i < SHMEM_REDUCE_SYNC_SIZE; i++) - psync[i] = SHMEM_SYNC_VALUE; - shmem_init_thread(SHMEM_THREAD_MULTIPLE, &tl); if (tl != SHMEM_THREAD_MULTIPLE) shmem_global_exit(1); @@ -49,7 +43,7 @@ int main(void) { shmem_ctx_destroy(ctx); } - shmem_long_sum_to_all(&total_done, &tasks_done, 1, 0, 0, npes, pwrk, psync); + shmem_long_sum_reduce(SHMEM_TEAM_WORLD, &total_done, &tasks_done, 1); int result = (total_done != ntasks * npes); shmem_finalize(); diff --git a/main_spec.tex b/main_spec.tex index cfc3f8ae..6698ad6b 100644 --- a/main_spec.tex +++ b/main_spec.tex @@ -41,8 +41,8 @@ \section{Environment Variables }\label{subsec:environment_variables} \section{OpenSHMEM Library \acs{API}}\label{sec:openshmem_library_api} \subsection{Library Setup, Exit, and Query Routines} -The library setup and query interfaces that initialize and monitor the parallel -environment of the \acp{PE}. +This section specifies the library setup, exit, and query interfaces that initialize, +finalize, and monitor the parallel environment of the \acp{PE}, respectively. 
\subsubsection{\textbf{SHMEM\_INIT}}\label{subsec:shmem_init} \input{content/shmem_init} @@ -56,6 +56,10 @@ \subsubsection{\textbf{SHMEM\_N\_PES}}\label{subsec:shmem_n_pes} \subsubsection{\textbf{SHMEM\_FINALIZE}}\label{subsec:shmem_finalize} \input{content/shmem_finalize} +\subsubsection{\textbf{SHMEM\_QUERY\_INITIALIZED}} +\label{subsec:shmem_query_initialized} +\input{content/shmem_query_initialized} + \subsubsection{\textbf{SHMEM\_GLOBAL\_EXIT}}\label{subsec:shmem_global_exit} \input{content/shmem_global_exit} @@ -92,11 +96,6 @@ \subsubsection{\textbf{SHMEM\_QUERY\_THREAD}} \label{subsec:shmem_query_thread} \input{content/shmem_query_thread} -\subsubsection{\textbf{SHMEM\_QUERY\_INITIALIZED}} -\label{subsec:shmem_query_initialized} -\input{content/shmem_query_initialized} - - \subsection{Memory Management Routines} \label{sec:memory_management} \input{content/memmgmt_intro.tex} @@ -306,64 +305,7 @@ \subsubsubsection{\textbf{SHMEM\_ATOMIC\_FETCH\_XOR\_NBI}} \subsection{Signaling Operations}\label{sec:shmem_signal} -This section specifies the OpenSHMEM support for \OPR{put-with-signal}, -nonblocking \OPR{put-with-signal}, and \OPR{signal-\{add, fetch, set\}} routines. The -put-with-signal routines provide a method for copying data from a contiguous -local data object to a data object on a specified \ac{PE} and subsequently -updating a remote flag to signal completion. -The signal-add and signal-set routines provide methods for updating -the signal object without the associated data transfer of a -put-with-signal operation. -The signal-fetch routine provides support for reading a local signal value. - -\openshmem \OPR{put-with-signal} and \OPR{signal-\{add, set\}} -routines specified in this section have two -variants. In one of the variants, the context handle, \VAR{ctx}, is explicitly -passed as an argument. In this variant, the operation is performed on the -specified context. 
If the context handle \VAR{ctx} does not correspond to a -valid context, the behavior is undefined. In the other variant, the context -handle is not explicitly passed and thus, the operations are performed on the -default context. - -\subsubsection{Atomicity Guarantees for Signaling Operations} -\label{subsec:signal_atomicity} -All signaling operations put-with-signal, nonblocking put-with-signal, and -signal-\{add, fetch, set\} are performed on a signal data object, a remotely accessible -symmetric object of type \VAR{uint64\_t}. A signal operator in the -put-with-signal routine is an \openshmem library constant that determines the -type of update to be performed as a signal on the signal data object. - -All signaling operations on the signal data object complete as if performed -atomically with respect to the following: -\begin{itemize} - \item other blocking or nonblocking variant of the put-with-signal routine - that updates the signal data object using the same signal update operator; - \item signal-add routine when the put-with-signal routine uses the - \LibConstRef{SHMEM\_SIGNAL\_ADD} signal operator; - \item signal-set routine when the put-with-signal routine uses the - \LibConstRef{SHMEM\_SIGNAL\_SET} signal operator; - \item signal-fetch routine that fetches the signal data object; and - \item any point-to-point synchronization routine that accesses the signal - data object. -\end{itemize} - -\subsubsection{Available Signal Operators} -\label{subsec:signal_operator} - -With the atomicity guarantees as described in -Section~\ref{subsec:signal_atomicity}, the following options can be used as a -signal operator. - - \apitablerow{\LibConstRef{SHMEM\_SIGNAL\_SET}}{An update to signal data - object is an atomic set operation. It writes an unsigned 64-bit value as a - signal into the signal data object on a remote \VAR{PE} as an atomic - operation.} - - \apitablerow{\LibConstRef{SHMEM\_SIGNAL\_ADD}}{An update to signal data - object is an atomic add operation. 
It adds an unsigned 64-bit value as a - signal into the signal data object on a remote \VAR{PE} as an atomic - operation.} - +\input{content/signaling.tex} \subsubsection{\textbf{SHMEM\_PUT\_SIGNAL}}\label{subsec:shmem_put_signal} \input{content/shmem_put_signal.tex} diff --git a/utils/defs.tex b/utils/defs.tex index 771ba8a7..fa2d0d99 100644 --- a/utils/defs.tex +++ b/utils/defs.tex @@ -34,7 +34,7 @@ \newcommand{\newtext}[1]{\textcolor{ForestGreen}{#1}} \newcommand{\oldtext}[1]{\textcolor{magenta}{\sout{#1}}} -\newcommand{\insertDocVersion}{1.5} +\newcommand{\insertDocVersion}{1.6} \newcommand{\openshmem}[1][]{% {Open\-SHMEM\ifthenelse{\equal{#1}{}}{}{~#1}}\xspace} \newcommand{\HEADER}[1]{\textit{#1}} @@ -105,7 +105,7 @@ \acro{API}{\emph{Application Programming Interface}} \acro{MPI}{\emph{Message Passing Interface}} \acro{SPMD}{\emph{Single Program Multiple Data}} -\acro{ANL}{Argonne National Labratory} +\acro{ANL}{Argonne National Laboratory} \acro{ARL}{Army Research Laboratory} \acro{AMD}{Advanced Micro Devices} \acro{MPMD}{\emph{Multiple Program Multiple Data}} @@ -120,7 +120,7 @@ \acro{SGI}{Silicon Graphics International} \acro{DoD}{U.S. Department of Defense} \acro{SBU}{Stonybrook University} -\acro{UTK}{University of Tenneesee at Knoxville} +\acro{UTK}{University of Tennessee at Knoxville} \acro{HPE}{Hewlett Packard Enterprise} \end{acronym} @@ -362,8 +362,7 @@ \hfill \item[Return Values] \hfill \\ #1 -\\ -\hfill +\hfill \\ } \newcommand{\apitablerow}[2]{