Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix UB in cancelled tasks again #579

Merged
merged 1 commit into from
Oct 27, 2023
Merged

Conversation

ispeters
Copy link
Contributor

As reported in #577, #578 didn't fix the UB in cancelled tasks. There were two bugs in that "fix":

  1. we were writing the coroutine handle to resume after synchronizing between the two racing operation states, leading to a data race when the deferred stop source completed second; and
  2. it's wrong to call continuation_.done() to store the coroutine handle to be resumed later because done() completes the operation with set_done() as a side effect.

This diff fixes both issues. Instead of storing a coroutine handle to be resumed, we store an enum value that describes which coroutine handle to resume (so we don't eagerly invoke done()), and we do the store before synchronizing to eliminate the data race.

I've added a unit test that fails with a TSAN-detected data race without the fix, and the fix silences the TSAN error.

I also clang-formatted task.hpp and the modified test file.

As reported in #577, #578 didn't fix the UB in cancelled tasks. There
were two bugs in that "fix":
 1. we were writing the coroutine handle to resume *after*
    synchronizing between the two racing operation states, leading to a
    data race when the deferred stop source completed second; and
 2. it's wrong to call `continuation_.done()` to store the coroutine
    handle to be resumed later because `done()` completes the operation
    with `set_done()` as a side effect.

This diff fixes both issues. Instead of storing a coroutine handle to be
resumed, we store an enum value that describes which coroutine handle to
resume (so we don't eagerly invoke `done()`), and we do the store before
synchronizing to eliminate the data race.

I've added a unit test that fails with a TSAN-detected data race without
the fix, and the fix silences the TSAN error.

I also clang-formatted `task.hpp` and the modified test file.
@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Oct 26, 2023
@ispeters
Copy link
Contributor Author

@glebov-andrey, do you want to patch this in and see if it fixes your issue?

@ispeters ispeters requested a review from janondrusek October 26, 2023 23:02
@glebov-andrey
Copy link

Looks good, thanks!
All our tests pass and I haven't been able to cause any more crashes during cancellation with this patch and ASAN, UBSAN or TSAN.

@ispeters ispeters merged commit 288c425 into main Oct 27, 2023
93 checks passed
@ispeters ispeters deleted the fix_ub_in_cancelled_tasks_again branch October 27, 2023 19:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants