[SHMEM 1.6 Sec 9.10] Collectives section committee changes #535

Closed
25 commits
6072be8
Deprecate blocks for Collect, Broadcast, update def apireturnvalues
kwaters4 Mar 20, 2024
14c87b8
Deprecate active-set language in Collectives, missing Reductions
kwaters4 Mar 20, 2024
8997a4e
Reductions, Programming Model, strided teams active set langauge depr…
kwaters4 Mar 28, 2024
9fa187b
Indent in shmem_alltoall
kwaters4 Mar 28, 2024
111a29c
Update content/collective_intro.tex
kwaters4 Apr 26, 2024
dda26c6
Update content/shmem_broadcast.tex
kwaters4 Apr 26, 2024
45e710b
Update shmem_reductions.tex
kwaters4 Apr 26, 2024
41a0024
Update shmem_team_split_strided API Note, arbirary to any positive in…
kwaters4 Jul 26, 2024
a6532ef
Fix Whitespace in shmem_alltoall
kwaters4 Jul 26, 2024
c3b23e5
Fix whitespace shmem_broadcast
kwaters4 Jul 26, 2024
a3b9ea7
Edit Whitespace in shmem_collect
kwaters4 Jul 26, 2024
a621dd4
Fix Whitespace in collective_intro
kwaters4 Jul 26, 2024
a2d9daa
Fix Typo in shmem_alltoall
kwaters4 Jul 26, 2024
9a6a048
Merge branch 'openshmem-org:master' into master
kwaters4 Aug 29, 2024
fd98952
Update content/shmem_team_split_strided.tex
kwaters4 Aug 29, 2024
a1e23bd
scan: 488 section committee edits (nelems/overlap)
davidozog Aug 29, 2024
129573e
Update content/shmem_broadcast.tex
kwaters4 Aug 30, 2024
421cc8b
Remove active language in reduction api args
kwaters4 Aug 30, 2024
6573d12
Merge pull request #8 from kwaters4/dep_active_lang
kwaters4 Aug 30, 2024
0b47caf
collectives: clarify src buffer entry requirements
davidozog Aug 30, 2024
de1315d
Remove unnecessary new line
maawad Aug 30, 2024
f1216b9
Merge pull request #11 from davidozog/pr/remove-newline-after-comma
davidozog Aug 30, 2024
6a6ce3f
Merge pull request #9 from davidozog/pr/scan_edits
davidozog Aug 30, 2024
8095ea4
collectives: "array" instead of source "buffer"
davidozog Aug 30, 2024
00213da
Merge pull request #10 from davidozog/pr/src_buffer_readiness
davidozog Aug 30, 2024
25 changes: 14 additions & 11 deletions content/collective_intro.tex
@@ -1,21 +1,24 @@
\emph{Collective routines} are defined as coordinated communication or synchronization
operations performed by a group of \acp{PE}.

\openshmem provides three types of collective routines:
\openshmem provides four types of collective routines:

\begin{enumerate}
\item Collective routines that operate on teams use a team handle parameter to determine
which \acp{PE} will participate in the routine, and use resources encapsulated by the team object
to perform operations. See Section~\ref{subsec:team} for details on team management.
\item Collective routines that operate on teams use a team handle parameter to determine
which \acp{PE} will participate in the routine, and use resources encapsulated by the team object
to perform operations. See Section~\ref{subsec:team} for details on team management.

\begin{DeprecateBlock}
\item Collective routines that operate on active sets use a set of parameters to determine
which \acp{PE} will participate and what resources are used to perform operations.
\end{DeprecateBlock}
\begin{DeprecateBlock}
\item Collective routines that operate on active sets use a set of parameters to determine
which \acp{PE} will participate and what resources are used to perform operations.

\item Collective routines that do not accept active set
parameters, which implicitly operate on the world team and, as required, the default context.
\end{DeprecateBlock}

\item Collective routines that accept neither team nor active set
parameters, which implicitly operate on the world team and, as
required, the default context.
\item Collective routines that do not accept team
parameters, which implicitly operate on the world team and, as
required, the default context.
\end{enumerate}

Concurrent accesses to symmetric memory by an \openshmem collective
2 changes: 1 addition & 1 deletion content/programming_model_overview.tex
@@ -144,7 +144,7 @@
data object on another symmetric data object.
\item \OPR{All-to-All}: All \acp{PE} participating in the routine exchange
a fixed amount of contiguous or strided data with all other \acp{PE}
in the active set.
in the team.
\end{enumerate}

\item \textbf{Mutual Exclusion}
56 changes: 38 additions & 18 deletions content/shmem_alltoall.tex
@@ -35,17 +35,17 @@

\apiargument{OUT}{dest}{Symmetric address of a data object large enough to receive
the combined total of \VAR{nelems} elements from each \ac{PE} in the
active set.
participating \acp{PE}.
The type of \dest{} should match that implied in the SYNOPSIS section.}
\apiargument{IN}{source}{Symmetric address of a data object that contains \VAR{nelems}
elements of data for each \ac{PE} in the active set, ordered according to
elements of data for each \ac{PE} in the participating \acp{PE}, ordered according to
destination \ac{PE}.
The type of \source{} should match that implied in the SYNOPSIS section.}
\apiargument{IN}{nelems}{
The number of elements to exchange for each \ac{PE}.
For \FUNC{shmem\_alltoallmem}, elements are bytes;
for \FUNC{shmem\_alltoall\{32,64\}}, elements are 4 or 8 bytes,
respectively.
The number of elements to exchange for each \ac{PE}.
For \FUNC{shmem\_alltoallmem}, elements are bytes;
for \FUNC{shmem\_alltoall\{32,64\}}, elements are 4 or 8 bytes,
respectively.
}

\begin{DeprecateBlock}
@@ -89,9 +89,7 @@
Given a \ac{PE} \VAR{i} that is the \kth \ac{PE}
participating in the operation and a \ac{PE}
\VAR{j} that is the \lth \ac{PE}
participating in the operation,

\ac{PE} \VAR{i} sends the \lth block of its \VAR{source} object to
participating in the operation, \ac{PE} \VAR{i} sends the \lth block of its \VAR{source} object to
the \kth block of
the \VAR{dest} object of \ac{PE} \VAR{j}.

@@ -100,6 +98,25 @@
If \VAR{team} compares equal to \LibConstRef{SHMEM\_TEAM\_INVALID} or is
otherwise invalid, the behavior is undefined.

Before any \ac{PE} calls a \FUNC{shmem\_alltoall} routine, the following
conditions must be ensured, otherwise the behavior is undefined:
\begin{itemize}
\item The \dest{} array on all \acp{PE} in the team is ready to
accept the result of the operation.
\item The \source{} array at the local \ac{PE} is ready to be
read by any \ac{PE} in the team.
\end{itemize}
The application does not need to synchronize to ensure that the \source{}
array is ready across all \acp{PE} prior to calling this routine.

Upon return from a \FUNC{shmem\_alltoall} routine, the following is true for
the local PE:
\begin{itemize}
\item Its \VAR{dest} symmetric data object is completely updated and the
data has been copied out of the source data object.
\end{itemize}

\begin{DeprecateBlock}
Active-set-based collective routines operate over all \acp{PE} in the active set
defined by the \VAR{PE\_start}, \VAR{logPE\_stride}, \VAR{PE\_size} triplet.

@@ -116,23 +133,26 @@

Before any \ac{PE} calls a \FUNC{shmem\_alltoall} routine,
the following conditions must be ensured:

\begin{itemize}
\item The \VAR{dest} data object on all \acp{PE} in the active set is
ready to accept the \FUNC{shmem\_alltoall} data.
\item For active-set-based routines, the \VAR{pSync} array
on all \acp{PE} in the active set is not still in use from a prior call
to a \FUNC{shmem\_alltoall} routine.
\item The \VAR{dest} data object on all \acp{PE} in the active set is
ready to accept the \FUNC{shmem\_alltoall} data.
\item For active-set-based routines, the \VAR{pSync} array
on all \acp{PE} in the active set is not still in use from a prior call
to a \FUNC{shmem\_alltoall} routine.
\end{itemize}

Otherwise, the behavior is undefined.

Upon return from a \FUNC{shmem\_alltoall} routine, the following is true for
the local PE:
\begin{itemize}
\item Its \VAR{dest} symmetric data object is completely updated and
the data has been copied out of the \VAR{source} data object.
\item For active-set-based routines,
the values in the \VAR{pSync} array are restored to the original values.
\item Its \VAR{dest} symmetric data object is completely updated and the
data has been copied out of the source data object.
Collaborator Author


Suggested change
data has been copied out of the source data object.
data has been copied out of the \VAR{source} data object.

\item For active-set-based routines,
the values in the \VAR{pSync} array are restored to the original values.
\end{itemize}
\end{DeprecateBlock}
}

\apireturnvalues{
4 changes: 2 additions & 2 deletions content/shmem_alltoalls.tex
@@ -35,10 +35,10 @@

\apiargument{OUT}{dest}{Symmetric address of a data object large enough to receive
the combined total of \VAR{nelems} elements from each \ac{PE} in the
active set.
participating \acp{PE}.
The type of \dest{} should match that implied in the SYNOPSIS section.}
\apiargument{IN}{source}{Symmetric address of a data object that contains \VAR{nelems}
elements of data for each \ac{PE} in the active set, ordered according to
elements of data for each \ac{PE} in the participating \acp{PE}, ordered according to
destination \ac{PE}.
The type of \source{} should match that implied in the SYNOPSIS section.}
\apiargument{IN}{dst}{The stride between consecutive elements of the \dest{}
83 changes: 54 additions & 29 deletions content/shmem_broadcast.tex
@@ -45,7 +45,7 @@
respectively.
}
\apiargument{IN}{PE\_root}{Zero-based ordinal of the \ac{PE}, with respect to
the team or active set, from which the data is copied.}
the calling \acp{PE}, from which the data is copied.}

\begin{DeprecateBlock}

@@ -61,8 +61,7 @@
\end{apiarguments}

\apidescription{
\openshmem broadcast routines are collective routines over an active set or
valid \openshmem team.
\openshmem team-based broadcast routines are collective routines over a valid \openshmem team.
They copy the \source{} data object on the \ac{PE} specified by
\VAR{PE\_root} to the \dest{} data object on the \acp{PE}
participating in the collective operation.
@@ -75,66 +74,92 @@
\item The \dest{} object is updated on all \acp{PE}.
\item All \acp{PE} in the \VAR{team} argument must participate in
the operation.
\item Only \acp{PE} in the team may call the routine. If a
\ac{PE} not in the team calls a team-based
collective routine, the behavior is undefined.
\item If \VAR{team} compares equal to \LibConstRef{SHMEM\_TEAM\_INVALID} or is
otherwise invalid, the behavior is undefined.
\item \ac{PE} numbering is relative to the team. The specified
root \ac{PE} must be a valid \ac{PE} number for the team,
between \CONST{0} and \VAR{N$-$1}, where \VAR{N} is the size of
the team.
\end{itemize}

Before any \ac{PE} calls a broadcast routine, the following conditions
must be ensured, otherwise the behavior is undefined:
\begin{itemize}
\item The \dest{} array on all \acp{PE} in the team is ready to
accept the result of the operation.
\item The \source{} array at the local root \ac{PE} is ready to be
read by any \ac{PE} in the team.
\end{itemize}
The application does not need to synchronize to ensure that the \source{}
array is ready across all \acp{PE} prior to calling this routine.

Upon return from a team-based broadcast routine, the following are true for the local
\ac{PE}:
\begin{itemize}
\item The \dest{} data object is updated.
\item The \source{} data object may be safely reused.
\end{itemize}

\begin{DeprecateBlock}
\openshmem active-set broadcast routines are collective routines over an active set.
They copy the \source{} data object on the \ac{PE} specified by
\VAR{PE\_root} to the \dest{} data object on the \acp{PE}
participating in the collective operation.
The same \dest{} and \source{} data objects and the same value of
\VAR{PE\_root} must be passed by all \acp{PE} participating in the
collective operation.

For active-set-based broadcasts:
\begin{itemize}
\item The \dest{} object is updated on all \acp{PE} other than the
root \ac{PE}.
\item All \acp{PE} in the active set defined by the
\VAR{PE\_start}, \VAR{logPE\_stride}, \VAR{PE\_size} triplet
must participate in the operation.
\item Only \acp{PE} in the active set may call the routine. If a
\ac{PE} not in the active set calls an active-set-based
\item The \VAR{dest} object is updated on all PEs other than the root PE.
Collaborator Author


Suggested change
\item The \VAR{dest} object is updated on all PEs other than the root PE.
\item The \VAR{dest} object is updated on all \acp{PE} other than the root PE.

\item All \acp{PE} in the active set defined by the
\VAR{PE\_start}, \VAR{logPE\_stride}, \VAR{PE\_size} triplet
must participate in the operation.
\item Only \acp{PE} in the active set may call the routine. If a
\ac{PE} not in the active set calls an active-set-based
collective routine, the behavior is undefined.
\item The values of arguments \VAR{PE\_root}, \VAR{PE\_start},
\item The values of arguments \VAR{PE\_root}, \VAR{PE\_start},
\VAR{logPE\_stride}, and \VAR{PE\_size} must be the same value
on all \acp{PE} in the active set.
\item The value of \VAR{PE\_root} must be between \CONST{0} and
\item The value of \VAR{PE\_root} must be between \CONST{0} and
\VAR{PE\_size $-$ 1}.
\item The same \VAR{pSync} work array must be passed by all \acp{PE}
\item The same \VAR{pSync} work array must be passed by all \acp{PE}
in the active set.
\end{itemize}

Before any \ac{PE} calls a broadcast routine, the following
Before any \ac{PE} calls an active-set-based broadcast routine, the following
conditions must be ensured:
\begin{itemize}
\item The \dest{} array on all \acp{PE} participating in the broadcast
is ready to accept the broadcast data.
\item For active-set-based broadcasts, the
\VAR{pSync} array on all \acp{PE} in the
active set is not still in use from a prior call to an \openshmem
collective routine.
\item The \dest{} array on all \acp{PE} participating in the broadcast
is ready to accept the broadcast data.
\item The \VAR{pSync} array on all \acp{PE} in the
active set is not still in use from a prior call to an \openshmem
collective routine.
\end{itemize}
Otherwise, the behavior is undefined.

Upon return from a broadcast routine, the following are true for the local
Upon return from an active-set-based broadcast routine, the following are true for the local
\ac{PE}:
\begin{itemize}
\item For team-based broadcasts, the \dest{} data object is
updated.
\item For active-set-based broadcasts:
\begin{itemize}
\item If the current \ac{PE} is not the root \ac{PE}, the
\dest{} data object is updated.
\item If the current \ac{PE} is not the root \ac{PE}, the \dest{} data object is updated.
\item The \source{} data object may be safely reused.
\item The values in the \VAR{pSync} array are restored to the
original values.
\end{itemize}
\item The \source{} data object may be safely reused.
\end{itemize}
\end{DeprecateBlock}
}


\apireturnvalues{
For team-based broadcasts, zero on successful local completion; otherwise, nonzero.

\begin{DeprecateBlock}
For active-set-based broadcasts, none.
\end{DeprecateBlock}

}

\apinotes{
48 changes: 42 additions & 6 deletions content/shmem_collect.tex
@@ -66,15 +66,13 @@
\openshmem \FUNC{collect} and \FUNC{fcollect} routines perform a collective
operation to concatenate \VAR{nelems}
data items from the \source{} array into the
\dest{} array, over an \openshmem team or active set
in processor number order. The resultant \dest{} array contains the contribution from
\dest{} array, over an \openshmem team in processor number order.
Collaborator Author

@davidozog davidozog Sep 27, 2024


Suggested change
\dest{} array, over an \openshmem team in processor number order.
\dest{} array, over an \openshmem team in \ac{PE} number order.

The resultant \dest{} array contains the contribution from
\acp{PE} as follows:

\begin{itemize}
\item For an active set, the data from \ac{PE} \VAR{PE\_start} is first, then the
contribution from \ac{PE} \VAR{PE\_start} + \VAR{PE\_stride} second, and so on.
\item For a team, the data from \ac{PE} number \CONST{0} in the team is first, then the
contribution from \ac{PE} \CONST{1} in the team, and so on.
\item For a team, the data from \ac{PE} number \CONST{0} in the team is first, then the
contribution from \ac{PE} \CONST{1} in the team, and so on.
\end{itemize}

The collected result is written to the \dest{} array for all \acp{PE}
@@ -90,6 +88,37 @@
If \VAR{team} compares equal to \LibConstRef{SHMEM\_TEAM\_INVALID} or is
otherwise invalid, the behavior is undefined.

Before any \ac{PE} calls a collect routine, the following conditions must
be ensured, otherwise the behavior is undefined:
\begin{itemize}
\item The \dest{} array on all \acp{PE} in the team is ready to
accept the result of the operation.
\item The \source{} array at the local \ac{PE} is ready to be read
by any \ac{PE} in the team.
\end{itemize}
The application does not need to synchronize to ensure that the \source{}
array is ready across all \acp{PE} prior to calling this routine.

\begin{DeprecateBlock}
\openshmem \FUNC{collect} and \FUNC{fcollect} routines perform a collective
operation to concatenate \VAR{nelems}
data items from the \source{} array into the
\dest{} array, over an \openshmem active set
in processor number order. The resultant \dest{} array contains the contribution from
\acp{PE} as follows:
\begin{itemize}
\item For an active set, the data from \ac{PE} \VAR{PE\_start} is first, then the
contribution from \ac{PE} \VAR{PE\_start} + \VAR{PE\_stride} second, and so on.
\end{itemize}

The collected result is written to the \dest{} array for all \acp{PE}
that participate in the operation. The same \dest{} and \source{}
arrays must be passed by all \acp{PE} that participate in the operation.

The \FUNC{fcollect} routines require that \VAR{nelems} be the same value in all
participating \acp{PE}, while the \FUNC{collect} routines allow \VAR{nelems} to
vary from \ac{PE} to \ac{PE}.

Active-set-based collective routines operate over all \acp{PE} in the active set
defined by the \VAR{PE\_start}, \VAR{logPE\_stride}, \VAR{PE\_size} triplet.
As with all active-set-based collective routines,
@@ -108,16 +137,23 @@
\item For active-set-based collective routines, the values in the \VAR{pSync} array are
restored to the original values.
\end{itemize}
\end{DeprecateBlock}
}

\apireturnvalues{
Zero on successful local completion. Nonzero otherwise.
}

\apinotes{
\begin{DeprecateBlock}
The collective routines operate on active \ac{PE} sets that have a
non-power-of-two \VAR{PE\_size} with some performance degradation. They operate
with no performance degradation when \VAR{nelems} is a non-power-of-two value.
\end{DeprecateBlock}
The collective routines that operate on teams containing a
non-power-of-two number of \acp{PE} do so with some performance degradation. They operate
with no performance degradation when \VAR{nelems} is a non-power-of-two value.

}

\begin{apiexamples}