[1.0.2] SHiP: Fix hang on exit #845
Conversation
…interaction of app io_context and plugin threads.
   app().executor().get_io_service().restart();
   while (app().executor().get_io_service().poll())
      ;
}
It looks like a SIGINT etc here will interrupt it. Probably fine.
I wonder if there is a way to add some sort of sentinel to know that "anything that could be referencing state_history_plugin" has actually been drained, so we don't somehow get into a state where we loop forever here because something else unrelated is post()ing forever. I'm not immediately coming up with a good solution though.
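One possible shape for such a sentinel, sketched here as a standalone hypothetical (drain_sentinel, drain, and max_polls are illustrative names, not appbase or plugin API): handlers that can touch the plugin capture a shared_ptr token, and shutdown polls only until the matching weak_ptr expires or a cap is hit, so an unrelated perpetual poster cannot keep the loop alive.

```cpp
#include <boost/asio/io_context.hpp>
#include <memory>

class drain_sentinel {
   std::shared_ptr<void> token_ = std::make_shared<int>(0);
public:
   // Handlers that may reference state_history_plugin capture the returned shared_ptr.
   std::shared_ptr<void> token() const { return token_; }

   // Drain until no queued handler still holds the token, with an iteration cap
   // so unrelated work that keeps posting cannot trap us here forever.
   void drain(boost::asio::io_context& ctx, std::size_t max_polls = 1000) {
      std::weak_ptr<void> watch = token_;
      token_.reset();                        // only in-flight handlers hold the token now
      ctx.restart();
      for (std::size_t i = 0; !watch.expired() && i < max_polls; ++i) {
         if (ctx.poll() == 0)
            break;                           // queue is empty; remaining holders are not queued work
      }
   }
};
```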
Maybe just not do the poll() call in a loop. We are not expecting anything to be posting at this point.
Interestingly enough, the poll() has to be in a loop. I guess the co_spawn must be posting additional work as it drains. I tried with just one call to poll(), and it didn't work.
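For reference, a minimal standalone sketch of the drain idiom being discussed (plain Boost.Asio, not the plugin's executor): a stopped io_context needs restart() before poll() will run anything, and looping until poll() reports zero handlers keeps the drain independent of how work gets re-posted while draining.

```cpp
#include <boost/asio/io_context.hpp>
#include <boost/asio/post.hpp>
#include <iostream>

int main() {
   boost::asio::io_context ctx;
   // A handler that re-posts follow-up work while it is being drained.
   boost::asio::post(ctx, [&] {
      boost::asio::post(ctx, [] { std::cout << "follow-up handler ran\n"; });
   });
   ctx.stop();      // simulate an io_context that was stopped during shutdown

   ctx.restart();   // without this, poll() would return immediately
   while (ctx.poll())
      ;             // keep polling until no ready handlers remain
}
```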
@@ -380,6 +380,14 @@ void state_history_plugin_impl::plugin_shutdown() {
   accepted_block_connection.reset();
   block_start_connection.reset();
   thread_pool.stop();
With the connections torn down, is there any risk that pumping the main thread below might apply a block that SHiP would then not see?
Good call! It does seem like this could push a block into the controller that would be missed by SHiP.
Actually, no, I don't think that is possible. Anything that goes through app().executor().post is, when executed here, only pushed into the priority_queue to run later. So we probably should clear that queue out again here.
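To illustrate the point (a standalone sketch with made-up types, not appbase's actual executor), here is the shape of the pattern being described: the io_context handler created by a prioritized post() only parks the real work in a priority queue, so pumping the io_context during shutdown cannot apply a block by itself, and the parked queue can be discarded afterwards.

```cpp
#include <boost/asio/io_context.hpp>
#include <boost/asio/post.hpp>
#include <functional>
#include <iostream>
#include <queue>
#include <vector>

struct prioritized { int priority; std::function<void()> fn; };
struct by_priority {
   bool operator()(const prioritized& a, const prioritized& b) const { return a.priority < b.priority; }
};

// Made-up stand-in for a priority-queue executor: post() schedules an io_context
// handler that merely pushes the work into the queue instead of running it.
struct priority_queue_executor {
   boost::asio::io_context io;
   std::priority_queue<prioritized, std::vector<prioritized>, by_priority> queue;

   void post(int priority, std::function<void()> fn) {
      boost::asio::post(io, [this, priority, fn = std::move(fn)]() mutable {
         queue.push({priority, std::move(fn)});
      });
   }

   void clear_queue() {
      while (!queue.empty())
         queue.pop();
   }
};

int main() {
   priority_queue_executor ex;
   ex.post(10, [] { std::cout << "this would apply a block\n"; });

   // Pumping the io_context moves the work into the priority queue but never runs it.
   ex.io.restart();
   while (ex.io.poll())
      ;
   std::cout << "parked items after draining: " << ex.queue.size() << "\n"; // prints 1

   // As suggested above: discard whatever was parked during shutdown.
   ex.clear_queue();
}
```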
I wonder if we even need those connections to be manually torn down
Probably not, but that can be cleaned up when we work on the real solution.
Note: This is a temporary fix until issue #842 can be worked. Some alternatives were attempted, but no simple fix could be found that didn't require more work toward #842 than desired for 1.0.x. See AntelopeIO/appbase#34
Tests were run against a hacked-up state_history_plugin with sleeps added, which caused it to hang on almost every CI/CD run.
Example failures: https://github.com/AntelopeIO/spring/actions/runs/11076070901
With this fix: https://github.com/AntelopeIO/spring/actions/runs/11124229769
Resolves #822