The current method for tracking task hierarchies and timer stacks is unstable. We need a new approach. The goal is to handle situations where an asynchronous task has synchronous children (e.g. a PaRSEC task makes CUDA function calls). In that case, we want to maintain a timer stack on the current OS thread so that task dependencies can be inferred when the caller does not specify them. However, the timer stack can't be tied exclusively to the current thread, because we have a degenerate case where a task is started on one OS thread and stopped on another. Ouch.
The goal is to separate maintenance of task dependencies (task creation and starting) from maintenance of the timer stack (start/yield/resume/stop). We can't assume that a task constructed without an explicit parent is a synchronous task (or can we?), but that is the 99% case.
When a task is constructed:

- If parent(s) are passed in, construct the task with the supplied parents.
- If `parent == nullptr`:
  - flag the task as having an implicit parent;
  - if this thread has a top-level timer, use it as the parent;
  - otherwise, use `APEX_MAIN` as the parent.
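The construction rules can be sketched roughly as follows. This is a minimal illustration, not APEX code: `task_node`, `construct_task`, `current_timer`, and `apex_main_task` (a stand-in for `APEX_MAIN`) are all hypothetical names, and "no parent" is modeled as an empty parent list.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical task record; names are illustrative, not the APEX API.
struct task_node {
    std::string name;
    std::vector<task_node*> parents;   // task dependencies
    bool implicit_parent = false;      // true when no parent was supplied
    std::thread::id created_on;        // OS thread that constructed the task
};

// Hypothetical stand-in for the APEX_MAIN root task.
static task_node apex_main_task{"APEX_MAIN", {}, false, {}};

// Top-level (currently running) timer on this OS thread, or nullptr.
thread_local task_node* current_timer = nullptr;

std::unique_ptr<task_node> construct_task(const std::string& name,
                                          std::vector<task_node*> parents) {
    auto t = std::make_unique<task_node>();
    t->name = name;
    t->created_on = std::this_thread::get_id();
    if (!parents.empty()) {
        // Explicit parent(s) supplied: use them as-is.
        t->parents = std::move(parents);
        return t;
    }
    // No parent supplied: flag the task, then use this thread's top-level
    // timer as the parent, falling back to the APEX_MAIN stand-in.
    t->implicit_parent = true;
    t->parents.push_back(current_timer != nullptr ? current_timer : &apex_main_task);
    return t;
}
```

Note that only the *dependency* is recorded here; nothing is pushed onto the timer stack until the task is started.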
When a task is started:

- If this thread has a currently running timer, and the task has an implicit parent and is being started on the same OS thread that constructed it, save that timer in the newly started task; otherwise save `nullptr`.
- Save the newly started task as the currently running timer for this thread.
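One way to picture the start step is a per-thread "current timer" pointer, with the stack formed by the chain of saved pointers in each task. The sketch below assumes that layout and follows the conditional reading of the start rule; `task_node`, `current_timer`, and `start_task` are hypothetical names, not APEX's.

```cpp
#include <cassert>
#include <thread>

// Hypothetical task record; names are illustrative, not the APEX API.
struct task_node {
    bool implicit_parent = false;   // set at construction when no parent was given
    std::thread::id created_on;     // OS thread that constructed the task
    task_node* prev = nullptr;      // timer saved at start; restored on stop
    bool running = false;
};

// Currently running timer on this OS thread. The per-thread "stack" is the
// chain of prev pointers hanging off it.
thread_local task_node* current_timer = nullptr;

void start_task(task_node* t) {
    // Save the running timer only when the task has an implicit parent and is
    // started on the thread that constructed it; otherwise save nullptr.
    const bool same_thread = t->created_on == std::this_thread::get_id();
    t->prev = (current_timer != nullptr && t->implicit_parent && same_thread)
                  ? current_timer
                  : nullptr;
    t->running = true;
    current_timer = t;   // the new task is now this thread's running timer
}
```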
When a task is yielded:

- If the yielded timer is the "top level" timer on this thread, just yield it and restore the next timer on this thread's "stack".
- If the yielded timer is not the "top level" timer on this thread, walk the "stack" to find it. If found, yield all intermediate timers as well. This is equivalent to an HPX direct action whose parent task is yielded by the runtime.
- If the yielded timer is not found, the task was started elsewhere (on another OS thread). Issue an assertion failure, but also just yield the current timer and clear the timer stack on this thread.
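The yield rules might look like this under the same assumed layout (a thread-local current timer plus `prev` links). The intermediates are remembered on the yielded task so a later resume can restart them. The "assertion failure" is modeled by a hypothetical `report_stack_error` that writes to `stderr`, so the sketch stays non-fatal; all names here are illustrative.

```cpp
#include <cassert>
#include <cstdio>
#include <vector>

// Hypothetical task record; names are illustrative, not the APEX API.
struct task_node {
    task_node* prev = nullptr;              // timer below this one on the thread's stack
    std::vector<task_node*> yielded_above;  // intermediates yielded along with this task
    bool running = false;
    bool yielded = false;
};

thread_local task_node* current_timer = nullptr;

// Hypothetical error hook; a debug build might assert here instead.
void report_stack_error(const char* what) { std::fprintf(stderr, "timer stack: %s\n", what); }

static void mark_yielded(task_node* t) { t->running = false; t->yielded = true; }

// Returns true if t was found on this thread's stack.
bool yield_task(task_node* t) {
    // Walk down from the top of this thread's stack looking for t.
    std::vector<task_node*> intermediates;
    task_node* cur = current_timer;
    while (cur != nullptr && cur != t) {
        intermediates.push_back(cur);
        cur = cur->prev;
    }
    if (cur == nullptr) {
        // Not found: t was started on another OS thread. Report it, yield t
        // anyway, and clear this thread's stack.
        report_stack_error("yielded timer not found on this thread");
        mark_yielded(t);
        current_timer = nullptr;
        return false;
    }
    // Found (top level if there are no intermediates): yield everything above
    // t, remember it for resume (top of stack first), and pop back to t->prev.
    for (task_node* i : intermediates) mark_yielded(i);
    t->yielded_above = intermediates;
    mark_yielded(t);
    current_timer = t->prev;
    return true;
}
```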
When a task is resumed:

- If this thread has a currently running timer, save it in the resumed task (otherwise save `nullptr`).
- Save the resumed task as the currently running timer for this thread.
- Resume any intermediate timers that were yielded when this task was yielded (if necessary).
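A possible shape for resume, again assuming the thread-local current timer with `prev` links, and assuming the yield step stored the intermediates top-of-stack first in a hypothetical `yielded_above` list. Re-pushing them in reverse restores the original stack order.

```cpp
#include <cassert>
#include <vector>

// Hypothetical task record; names are illustrative, not the APEX API.
struct task_node {
    task_node* prev = nullptr;
    std::vector<task_node*> yielded_above;  // filled at yield time, top of stack first
    bool running = false;
    bool yielded = false;
};

thread_local task_node* current_timer = nullptr;

// Push one timer onto this thread's stack, saving whatever was running.
static void push_running(task_node* t) {
    t->prev = current_timer;   // may be nullptr
    t->running = true;
    t->yielded = false;
    current_timer = t;
}

void resume_task(task_node* t) {
    push_running(t);
    // Re-push intermediates bottom-up (the reverse of the order saved at yield).
    for (auto it = t->yielded_above.rbegin(); it != t->yielded_above.rend(); ++it) {
        push_running(*it);
    }
    t->yielded_above.clear();
}
```

Because the resumed task saves whatever is running *now*, it can be resumed on a different OS thread than the one it was yielded on; only the intermediate timers it carries with it are restored.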
When a task is stopped:

- If the stopped timer is the "top level" timer on this thread, just stop it and restore the next timer on this thread's "stack".
- If the stopped timer is not the "top level" timer on this thread, walk the "stack" to find it. If found, issue an assertion failure or warning, and stop all intermediate timers.
- If the stopped timer is not found, the task was started elsewhere (on another OS thread). Issue an assertion failure, but also just stop the current timer and clear the timer stack on this thread.
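The three stop cases can be sketched under the same assumed layout. The hypothetical `stop_outcome` enum only makes the chosen branch visible to callers, and `report_stack_error` stands in for the assertion failure or warning; none of these names come from APEX.

```cpp
#include <cassert>
#include <cstdio>
#include <vector>

// Hypothetical task record; names are illustrative, not the APEX API.
struct task_node {
    task_node* prev = nullptr;   // timer below this one on the thread's stack
    bool running = false;
};

thread_local task_node* current_timer = nullptr;

// Hypothetical error hook; a debug build might assert here instead.
void report_stack_error(const char* what) { std::fprintf(stderr, "timer stack: %s\n", what); }

enum class stop_outcome { top_level, nested, not_found };

stop_outcome stop_task(task_node* t) {
    if (current_timer == t) {
        // Common case: t is the top-level timer, so just pop it.
        t->running = false;
        current_timer = t->prev;
        return stop_outcome::top_level;
    }
    // Walk down the stack, collecting the timers above t.
    std::vector<task_node*> intermediates;
    task_node* cur = current_timer;
    while (cur != nullptr && cur != t) {
        intermediates.push_back(cur);
        cur = cur->prev;
    }
    if (cur == nullptr) {
        // Started on another OS thread: report, stop t, clear this stack.
        report_stack_error("stopped timer not found on this thread");
        t->running = false;
        current_timer = nullptr;
        return stop_outcome::not_found;
    }
    // Found below the top: warn, stop the intermediates and t, then pop to t->prev.
    report_stack_error("stopped timer is not the top-level timer");
    for (task_node* i : intermediates) i->running = false;
    t->running = false;
    current_timer = t->prev;
    return stop_outcome::nested;
}
```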
We will likely still find issues. But I am trying to keep APEX from crashing in PaRSEC...