Ctrl-C hangs with Server did not exit, forcefully killing. #90

bersace · 2024-07-17T13:41:03Z

Hi,

Thanks for hupper, it's awesome !

I'm hitting a weird behaviour, similar to #47 . I'm using hupper in https://github.com/dalibo/temboard, using hupper API: https://github.com/dalibo/temboard/blob/c35895bf79a65758270de6fa5ea73869f17cd82f/agent/temboardagent/cli/app.py#L196-L210

In a debian container, Python python3.9, I enable hupper like this:

root@d303b92213a0:/var/lib/temboard-agent# sudo -Eu postgres DEBUG=y python3 -m  temboardagent
13:29:05 temboardagent[1239] INFO:  app: Starting temboard-agent 9.0.dev0.
13:29:05 temboardagent[1239] DEBUG:  taskmanager: Register worker discover
...
Starting monitor for PID 1243.
...
13:29:05 temboardagent[1243] INFO:  app: Starting temboard-agent 9.0.dev0.
...

Hot reloading works well. Here is the logs when I hit Ctrl-C:

^CReceived SIGINT, waiting for server to exit ...
13:29:08 temboardagent[1243] INFO:  services: Interrupted.
13:29:08 temboardagent[1257] INFO:  services: Interrupted.
13:29:08 temboardagent[1258] INFO:  services: Interrupted.
13:29:08 temboardagent[1258] INFO:  taskmanager: Aborting jobs.
13:29:08 temboardagent[1258] DEBUG:  services: Done. service=worker pool
13:29:08 temboardagent[1243] DEBUG:  services: Terminating background service. service=scheduler pid=1257
13:29:08 temboardagent[1243] DEBUG:  services: Terminating background service. service=worker pool pid=1258
13:29:09 temboardagent[1243] DEBUG:  services: Waiting background services.
13:29:09 temboardagent[1257] DEBUG:  services: Done. service=scheduler
Server did not exit, forcefully killing.

Then hupper hangs forever. I need to type Ctrl-C a second time.
I tried to call graceful_shutdown() from a sigint_handler without success.

Do you have a clue on something wrong I do ?

Regards,
Étienne

The text was updated successfully, but these errors were encountered:

mmerickel · 2024-07-17T18:35:28Z

It appears that your child process is receiving the SIGINT, shutting down, but not actually exiting. Thus hupper is then killing it after shutdown_interval time. Can you tell if the child's PID is gone after the "services: Done. service=scheduler" message, prior to the "Server did not exit, forcefully killing." message?

bersace · 2024-07-17T19:59:37Z

It appears that your child process is receiving the SIGINT, shutting down, but not actually exiting. Thus hupper is then killing it after shutdown_interval time. Can you tell if the child's PID is gone after the "services: Done. service=scheduler" message, prior to the "Server did not exit, forcefully killing." message?

I'm pretty sure of that. Before SIGINT:

$ ps -efH
root          60       0  0 19:55 pts/0    00:00:00 /bin/bash
root         440      60  0 19:57 pts/0    00:00:00   sudo -Eu postgres DEBUG=y python3 -m temboardagent
postgres     441     440  6 19:57 pts/0    00:00:00     python3 -m temboardagent
postgres     445     441  6 19:57 pts/0    00:00:00       temboard-agent: postgres1: web
postgres     459     445  0 19:57 pts/0    00:00:00         temboard-agent: postgres1: scheduler
postgres     460     445  0 19:57 pts/0    00:00:00         temboard-agent: postgres1: worker pool
postgres     489     460  0 19:57 pts/0    00:00:00           temboard-agent: postgres1: task temboardagent.plugins.dashboard.dashboard_collect

temboard-agent: postgres1: web is the main process of the project.

After Ctrl-C in terminal:

$ ps -efH
root         440      60  0 19:57 pts/0    00:00:00   sudo -Eu postgres DEBUG=y python3 -m temboardagent
postgres     441     440  0 19:57 pts/0    00:00:00     python3 -m temboardagent

Regards,
Étienne

mmerickel · 2024-07-18T07:06:20Z

That's curious, there's no magic to what hupper is doing there.

So to unpack, there's 2 potential issues:

There's the graceful shutdown that isn't finishing successfully so you're ending up with a forceful kill.
And then the forceful kill is hanging.

For part 1, hupper is basically hanging onto a subprocess.Popen object and invoking .wait() and .poll() to determine if the child is dead. If it isn't dead after shutdown_interval time then it invokes a .kill() on that popen object.

For part 2, it sounds like you are telling me that the .kill() is hanging.

Does this sound right? I'm at a loss why, if you claim the PID is gone, that wait() would think it is still alive in the first place. And afterward, since it thinks it's still alive it tries a kill - I'm also at a loss why the kill would hang. But first, why does it think it's alive at all?

The main interesting thing that comes to mind is something about the fact that you're running a whole process tree under the reloader, not just a single python process. I wonder if that causes issues with the process group in some way.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Ctrl-C hangs with Server did not exit, forcefully killing. #90

Ctrl-C hangs with Server did not exit, forcefully killing. #90

bersace commented Jul 17, 2024

mmerickel commented Jul 17, 2024

bersace commented Jul 17, 2024

mmerickel commented Jul 18, 2024 •

edited

Loading

Ctrl-C hangs with Server did not exit, forcefully killing. #90

Ctrl-C hangs with Server did not exit, forcefully killing. #90

Comments

bersace commented Jul 17, 2024

mmerickel commented Jul 17, 2024

bersace commented Jul 17, 2024

mmerickel commented Jul 18, 2024 • edited Loading

mmerickel commented Jul 18, 2024 •

edited

Loading