[1.0.2] SHiP: Fix hang on exit #845
Conversation
…interaction of app io_context and plugin threads.
   app().executor().get_io_service().restart();
   while (app().executor().get_io_service().poll())
      ;
}
It looks like a SIGINT etc here will interrupt it. Probably fine.
I wonder if there is a way to add some sort of sentinel to know that "anything that could be referencing state_history_plugin" has actually been drained, so we don't somehow get into a state where we loop forever here because something else unrelated is post()ing forever. I'm not immediately coming up with a good solution though.
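One possible shape for such a sentinel, sketched here as a standalone hypothetical (drain_sentinel, drain, and max_polls are illustrative names, not appbase or plugin API): handlers that can touch the plugin capture a shared_ptr token, and shutdown polls only until the matching weak_ptr expires or a cap is hit, so an unrelated perpetual poster cannot keep the loop alive.

```cpp
#include <boost/asio/io_context.hpp>
#include <memory>

class drain_sentinel {
   std::shared_ptr<void> token_ = std::make_shared<int>(0);
public:
   // Handlers that may reference state_history_plugin capture the returned shared_ptr.
   std::shared_ptr<void> token() const { return token_; }

   // Drain until no queued handler still holds the token, with an iteration cap
   // so unrelated work that keeps posting cannot trap us here forever.
   void drain(boost::asio::io_context& ctx, std::size_t max_polls = 1000) {
      std::weak_ptr<void> watch = token_;
      token_.reset();                        // only in-flight handlers hold the token now
      ctx.restart();
      for (std::size_t i = 0; !watch.expired() && i < max_polls; ++i) {
         if (ctx.poll() == 0)
            break;                           // queue is empty; remaining holders are not queued work
      }
   }
};
```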
Maybe just not do the poll() call in a loop. We are not expecting anything to be posting at this point.
Interestingly enough, the poll() has to be in a loop. I guess the co_spawn must be posting additional work as it drains. I tried with just one call to poll(), and it didn't work.
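For reference, a minimal standalone sketch of the drain idiom being discussed (plain Boost.Asio, not the plugin's executor): a stopped io_context needs restart() before poll() will run anything, and looping until poll() reports zero handlers keeps the drain independent of how work gets re-posted while draining.

```cpp
#include <boost/asio/io_context.hpp>
#include <boost/asio/post.hpp>
#include <iostream>

int main() {
   boost::asio::io_context ctx;
   // A handler that re-posts follow-up work while it is being drained.
   boost::asio::post(ctx, [&] {
      boost::asio::post(ctx, [] { std::cout << "follow-up handler ran\n"; });
   });
   ctx.stop();      // simulate an io_context that was stopped during shutdown

   ctx.restart();   // without this, poll() would return immediately
   while (ctx.poll())
      ;             // keep polling until no ready handlers remain
}
```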
@@ -380,6 +380,14 @@ void state_history_plugin_impl::plugin_shutdown() {
   accepted_block_connection.reset();
   block_start_connection.reset();
   thread_pool.stop();
With the connections torn down, is there any risk that pumping the main thread below might apply a block that SHiP would then not see?
Good call! It does seem like this could push a block into the controller that would be missed by SHiP.
Actually, no, I don't think that is possible. Anything that goes through app().executor().post is, when executed here, only pushed into the priority_queue to run later. So we probably should clear that queue out again here.
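To illustrate the point (a standalone sketch with made-up types, not appbase's actual executor), here is the shape of the pattern being described: the io_context handler created by a prioritized post() only parks the real work in a priority queue, so pumping the io_context during shutdown cannot apply a block by itself, and the parked queue can be discarded afterwards.

```cpp
#include <boost/asio/io_context.hpp>
#include <boost/asio/post.hpp>
#include <functional>
#include <iostream>
#include <queue>
#include <vector>

struct prioritized { int priority; std::function<void()> fn; };
struct by_priority {
   bool operator()(const prioritized& a, const prioritized& b) const { return a.priority < b.priority; }
};

// Made-up stand-in for a priority-queue executor: post() schedules an io_context
// handler that merely pushes the work into the queue instead of running it.
struct priority_queue_executor {
   boost::asio::io_context io;
   std::priority_queue<prioritized, std::vector<prioritized>, by_priority> queue;

   void post(int priority, std::function<void()> fn) {
      boost::asio::post(io, [this, priority, fn = std::move(fn)]() mutable {
         queue.push({priority, std::move(fn)});
      });
   }

   void clear_queue() {
      while (!queue.empty())
         queue.pop();
   }
};

int main() {
   priority_queue_executor ex;
   ex.post(10, [] { std::cout << "this would apply a block\n"; });

   // Pumping the io_context moves the work into the priority queue but never runs it.
   ex.io.restart();
   while (ex.io.poll())
      ;
   std::cout << "parked items after draining: " << ex.queue.size() << "\n"; // prints 1

   // As suggested above: discard whatever was parked during shutdown.
   ex.clear_queue();
}
```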
I wonder if we even need those connections to be manually torn down
Probably not, but that can be cleaned up when we work on the real solution.
Note: This is a temporary fix until issue #842 can be worked. Some alternatives were attempted, but no simple fix could be found that didn't require more work toward #842 than desired for 1.0.x. See AntelopeIO/appbase#34
Tests were run against a hacked-up state_history_plugin with sleeps added, which caused it to hang on almost every CI/CD run.
Example failures: https://github.com/AntelopeIO/spring/actions/runs/11076070901
With this fix: https://github.com/AntelopeIO/spring/actions/runs/11124229769
Resolves #822