Need new algorithm for handling mixed async/sync timer stacks #184

Open
7 of 13 tasks
khuck opened this issue Jan 8, 2025 · 0 comments

Collaborator

khuck commented Jan 8, 2025

The current method for tracking task hierarchies and timer stacks is unstable, and we need a new approach. The goal is to handle situations where an asynchronous task has synchronous children (e.g. a PaRSEC task that makes CUDA function calls). In that case, we want to keep a timer stack on the current OS thread so that task dependencies can still be inferred when they are not specified. However, the timer stack can't live with the current thread, because we have a degenerate case where a task is started on one OS thread and stopped on another OS thread. Ouch.

The goal is to separate task dependency maintenance (task creation, starting) from timer stack maintenance (start/yield/resume/stop). We can't assume that a task constructed without an explicit parent is a synchronous task (or can we?), but that is the 99% case.

When a task is constructed:

  • If parent(s) are passed in, construct task with supplied parents.
  • if parent == nullptr:
    • flag task as implicit parent task
    • if we have a top level timer on this thread, use it as the parent
    • otherwise, use APEX_MAIN as parent
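
The construction rules above can be sketched as follows. This is a minimal illustration, not APEX's actual code: `Task`, `make_task`, `apex_main()` and the thread-local `current_timer` are hypothetical names standing in for whatever the real task wrapper and per-thread state look like.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified task record (not the real APEX type).
struct Task {
    std::string name;
    std::vector<std::shared_ptr<Task>> parents;
    bool implicit_parent = false;  // set when no parent was passed in
};
using TaskPtr = std::shared_ptr<Task>;

// Stand-in for the APEX_MAIN root task.
TaskPtr apex_main() {
    static TaskPtr root = [] {
        auto r = std::make_shared<Task>();
        r->name = "APEX_MAIN";
        return r;
    }();
    return root;
}

// Top level timer on this OS thread, or nullptr if none is running.
thread_local TaskPtr current_timer;

TaskPtr make_task(const std::string& name, std::vector<TaskPtr> parents = {}) {
    auto t = std::make_shared<Task>();
    t->name = name;
    if (!parents.empty()) {
        // Explicit parent(s): use them as supplied.
        t->parents = std::move(parents);
        return t;
    }
    // No parent supplied: flag it, then fall back to the thread's
    // top level timer, or to APEX_MAIN if this thread has none.
    t->implicit_parent = true;
    t->parents.push_back(current_timer ? current_timer : apex_main());
    return t;
}
```

Keeping the `implicit_parent` flag on the task is what lets the later start step tell an inferred dependency from an explicit one.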

When a task is started:

  • if this thread has a currently running timer, the newly started task has an implicit parent, and the thread starting the task is the same one that constructed it, save the running timer in the newly started task (otherwise save nullptr)
  • save the newly started task as the currently running timer for this thread
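
A minimal sketch of the start step, under one reading of the rule above: the running timer is recorded only when there is one, the task's parent was implicit, and the starting thread is the constructing thread. The `created_on` and `previous` fields and the `start_task` name are assumptions made for illustration.

```cpp
#include <memory>
#include <thread>

// Hypothetical, simplified task record (not the real APEX type).
struct Task {
    bool implicit_parent = false;
    // OS thread that constructed the task.
    std::thread::id created_on = std::this_thread::get_id();
    // Timer that was running on this thread when this task started.
    std::shared_ptr<Task> previous;
};
using TaskPtr = std::shared_ptr<Task>;

// Currently running timer on this OS thread (nullptr if none).
thread_local TaskPtr current_timer;

void start_task(const TaskPtr& t) {
    const bool same_thread = (t->created_on == std::this_thread::get_id());
    if (current_timer && t->implicit_parent && same_thread) {
        t->previous = current_timer;  // remember what to restore later
    } else {
        t->previous = nullptr;        // nothing to restore for this task
    }
    current_timer = t;                // the new task is now the top level timer
}
```

Chaining tasks through `previous` gives each thread an implicit stack without a separate container; a task started with `previous == nullptr` simply starts a fresh stack.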

When a task is yielded:

  • if the timer yielded is the "top level" timer on this thread, just yield it and restore the next timer on the "stack" for this thread
  • if the timer yielded is not the "top level" timer on this thread, walk the "stack" to find it. If found, yield all intermediate timers. This is equivalent to an HPX direct action that has its parent task yielded by the runtime.
  • if the timer yielded is not found, this task was started elsewhere (another OS thread). Issue an assertion failure, but also just yield the current timer and clear the timer stack on this thread.
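
The three yield cases can be sketched like this. It assumes the per-thread stack is a chain of hypothetical `previous` links, and it returns a `YieldResult` instead of raising the assertion failure so the not-found path can be exercised. It also reads "yield the current timer" as the timer passed in, which is an interpretation of the text above.

```cpp
#include <memory>

enum class State { Running, Yielded };

// Hypothetical, simplified task record (not the real APEX type).
struct Task {
    State state = State::Running;
    std::shared_ptr<Task> previous;  // next timer down the "stack"
};
using TaskPtr = std::shared_ptr<Task>;

// Top level timer on this OS thread (nullptr if none).
thread_local TaskPtr current_timer;

enum class YieldResult { Top, Nested, NotFound };

YieldResult yield_task(const TaskPtr& t) {
    // Case 1: t is the top level timer. Yield it, restore the next one.
    if (current_timer == t) {
        t->state = State::Yielded;
        current_timer = t->previous;
        return YieldResult::Top;
    }
    // Case 2: walk the stack looking for t.
    bool found = false;
    for (TaskPtr p = current_timer; p; p = p->previous) {
        if (p == t) { found = true; break; }
    }
    if (found) {
        // Yield every intermediate timer above t, then t itself.
        while (current_timer != t) {
            current_timer->state = State::Yielded;
            current_timer = current_timer->previous;
        }
        t->state = State::Yielded;
        current_timer = t->previous;
        return YieldResult::Nested;
    }
    // Case 3: t was started on another OS thread. Real code would also
    // issue an assertion failure here; this sketch only reports it.
    t->state = State::Yielded;
    current_timer = nullptr;  // clear this thread's stack
    return YieldResult::NotFound;
}
```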

When a task is resumed:

  • if this thread has a currently running timer, save it in the resumed task (otherwise nullptr)
  • save the resumed task as the currently running timer for this thread
  • resume all intermediate timers associated with this task that were yielded when it was yielded (if necessary).
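
One possible shape for the resume step is sketched below. It assumes the yield step recorded the intermediate timers it suspended in a hypothetical `suspended_with` list (outermost first), and that resuming means pushing them back on the thread's stack in their original order. Neither detail is specified above.

```cpp
#include <memory>
#include <vector>

enum class State { Running, Yielded };

// Hypothetical, simplified task record (not the real APEX type).
struct Task {
    State state = State::Yielded;
    std::shared_ptr<Task> previous;  // timer to restore when this one ends
    // Intermediate timers that were yielded together with this task,
    // outermost first. Assumed to be filled in by the yield step.
    std::vector<std::shared_ptr<Task>> suspended_with;
};
using TaskPtr = std::shared_ptr<Task>;

// Currently running timer on this OS thread (nullptr if none).
thread_local TaskPtr current_timer;

void resume_task(const TaskPtr& t) {
    // Save whatever is running on this thread (possibly nullptr) and
    // make the resumed task the current timer.
    t->previous = current_timer;
    t->state = State::Running;
    current_timer = t;
    // Re-stack the intermediate timers above the resumed task.
    for (const TaskPtr& child : t->suspended_with) {
        child->previous = current_timer;
        child->state = State::Running;
        current_timer = child;
    }
    t->suspended_with.clear();
}
```

Because `previous` is rewritten at resume time, the task can come back on a different OS thread than the one it was yielded on and still unwind correctly there.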

When a task is stopped:

  • if the timer stopped is the "top level" timer on this thread, just stop it and restore the next timer on the "stack" for this thread
  • if the timer stopped is not the "top level" timer on this thread, walk the "stack" to find it. If found, issue an assertion failure/warning and stop all intermediate timers.
  • if the timer stopped is not found, this task was started elsewhere (another OS thread). Issue an assertion failure, but also just stop the current timer and clear the timer stack on this thread.
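
The stop rules mirror the yield rules, except that stopping a timer below the top of the stack is itself reported. One hypothetical way to express that is a single `unwind` helper shared by both operations, sketched below. The `report` function and `diagnostics` counter stand in for the assertion failure/warning, and "stop the current timer" is read as the timer passed in.

```cpp
#include <iostream>
#include <memory>

enum class State { Running, Yielded, Stopped };

// Hypothetical, simplified task record (not the real APEX type).
struct Task {
    State state = State::Running;
    std::shared_ptr<Task> previous;  // next timer down the "stack"
};
using TaskPtr = std::shared_ptr<Task>;

// Top level timer on this OS thread (nullptr if none).
thread_local TaskPtr current_timer;

// Stand-in for an assertion failure/warning; counts what was reported.
int diagnostics = 0;
void report(const char* msg) {
    ++diagnostics;
    std::cerr << "timer stack: " << msg << '\n';
}

// Pop timers down to and including t, moving each to `target`.
void unwind(const TaskPtr& t, State target, bool warn_if_nested) {
    bool on_stack = false;
    for (TaskPtr p = current_timer; p; p = p->previous) {
        if (p == t) { on_stack = true; break; }
    }
    if (!on_stack) {
        // Started on another OS thread: report, then clear this stack.
        report("timer not found on this thread");
        t->state = target;
        current_timer = nullptr;
        return;
    }
    if (current_timer != t && warn_if_nested) {
        report("timer is not the top level timer");
    }
    while (current_timer != t) {  // intermediate timers
        current_timer->state = target;
        current_timer = current_timer->previous;
    }
    t->state = target;
    current_timer = t->previous;  // restore the next timer on the stack
}

void stop_task(const TaskPtr& t)  { unwind(t, State::Stopped, true); }
void yield_task(const TaskPtr& t) { unwind(t, State::Yielded, false); }
```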

We will likely still find issues. But I am trying to keep APEX from crashing in PaRSEC...
