Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

roscpp multithreaded spinners eat up CPU when callbacks take too long #2341

Open
Zob314 opened this issue Aug 10, 2023 · 2 comments · May be fixed by #2377
Open

roscpp multithreaded spinners eat up CPU when callbacks take too long #2341

Zob314 opened this issue Aug 10, 2023 · 2 comments · May be fixed by #2377

Comments

@Zob314
Copy link

Zob314 commented Aug 10, 2023

Issue #1545 was fixed in ROS melodic, but not noetic.

The fix for the issue took a rather meandering rout to getting in, but PR #1684 and PR #2014 both seem related.

As far as I can tell, the offending functions in noetic are the same as in melodic pre fix, so this might be as easy as cherry picking those changes over.

@meyerj
Copy link
Contributor

meyerj commented Oct 7, 2024

👍

I was the original author of cwecht#1, a follow-up on @cwecht's #1684, that got cerry-picked and merged into melodic-devel in #2014.

The main difference between the two versions is that if the callback queue is not empty, but none of the callbacks is ready, the current version on noetic-devel without that patch returns TryAgain immediately without waiting, even if the timeout argument is non-zero. And that ultimately leads to the original issue #1545. The updated version also waits in case of a non-empty queue, and only returns TryAgain afterwards if and only if the wait call timed out.

The problem occurs in the following situation:

  • Multiple spinner threads for that CallbackQueue instance.
  • One thread is busy with executing a callback
  • There is no other work for at least one of the threads.
  • allow_concurrent_callbacks for that callback is false (the default).
  • Callbacks are enqueued too fast, such that the last callback is not yet finished before the next one is enqueued.

As a result, the callback queue (callbacks_) is not empty and hence the call to callOne() does not block, but none of the calls is actually ready and the attempt to actually call it results in a TryAgain response, such that it is enqueued again etc., and all normally idle threads will do that concurrently without sleeping in between.

There is another minor difference between melodic-devel and noetic-devel introduced in #1684, namely that SubscriptionQueue::ready() does not return true unconditionally in melodic-devel, but only if SubscriptionQueue::call() would not return CallbackInterface::TryAgain later. So no need to take the callback from the queue, just to then learn that it cannot be executed yet and then be pushed to the end of the queue again. That also may cause message reordering if multiple subscribers share the same CallbackQueue instance.

Are you interested in yet another PR to forward-port the relevant commits to noetic-devel?

@meyerj
Copy link
Contributor

meyerj commented Oct 7, 2024

Are you interested in yet another PR to forward-port the relevant commits to noetic-devel?

Done: #2377

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants