Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[1.0.2] SHiP: Fix hang on exit #845

Merged
merged 3 commits into from
Oct 1, 2024
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
10 changes: 10 additions & 0 deletions plugins/state_history_plugin/state_history_plugin.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -380,6 +380,16 @@ void state_history_plugin_impl::plugin_shutdown() {
accepted_block_connection.reset();
block_start_connection.reset();
thread_pool.stop();

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the connections torn down, is there any risk pumping the main thread below might apply a block that ship would then not see?

Copy link
Member Author

@heifner heifner Sep 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call! It does seem like this could push a block into the controller that would be missed by SHiP.

Copy link
Member Author

@heifner heifner Sep 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, no I don't think that is possible. Everything that uses app().executor().post when they execute they are only pushed into the priority_queue to execute. So probably should clear that out again here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we even need those connections to be manually torn down

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably not, but that can be cleaned up when we work the real solution.

// This is a temporary fix until https://github.com/AntelopeIO/spring/issues/842 can be worked.
// Drain the io_service of anything that could be referencing state_history_plugin
if (app().executor().get_io_service().stopped()) {
app().executor().get_io_service().restart();
while (app().executor().get_io_service().poll())
;
// clear priority queue of anything pushed by poll(), see application_base exec()
app().executor().clear();
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like a SIGINT etc here will interrupt it. Probably fine.

I wonder if there is a way to add some sort of sentinel to know that "anything that could be referencing state_history_plugin" has actually been drained, so we don't somehow get in a state where we loop forever here because something else unrelated is post()ing forever. I'm not immediately coming up with a good solution though.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe just not do the poll() call in a loop. We are not expecting anything to be posting at this point.

Copy link
Member Author

@heifner heifner Oct 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting enough, the poll() has to be in a loop. I guess the co_spawn must be posting additional as it drains. I tried with just one call to poll(), and it didn't work.

}

void state_history_plugin::plugin_shutdown() {
Expand Down