Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ctrl-C hangs with Server did not exit, forcefully killing. #90

Open
bersace opened this issue Jul 17, 2024 · 3 comments
Open

Ctrl-C hangs with Server did not exit, forcefully killing. #90

bersace opened this issue Jul 17, 2024 · 3 comments

Comments

@bersace
Copy link

bersace commented Jul 17, 2024

Hi,

Thanks for hupper, it's awesome !

I'm hitting a weird behaviour, similar to #47 . I'm using hupper in https://github.com/dalibo/temboard, using hupper API: https://github.com/dalibo/temboard/blob/c35895bf79a65758270de6fa5ea73869f17cd82f/agent/temboardagent/cli/app.py#L196-L210

In a debian container, Python python3.9, I enable hupper like this:

root@d303b92213a0:/var/lib/temboard-agent# sudo -Eu postgres DEBUG=y python3 -m  temboardagent
13:29:05 temboardagent[1239] INFO:  app: Starting temboard-agent 9.0.dev0.
13:29:05 temboardagent[1239] DEBUG:  taskmanager: Register worker discover
...
Starting monitor for PID 1243.
...
13:29:05 temboardagent[1243] INFO:  app: Starting temboard-agent 9.0.dev0.
...

Hot reloading works well. Here is the logs when I hit Ctrl-C:

^CReceived SIGINT, waiting for server to exit ...
13:29:08 temboardagent[1243] INFO:  services: Interrupted.
13:29:08 temboardagent[1257] INFO:  services: Interrupted.
13:29:08 temboardagent[1258] INFO:  services: Interrupted.
13:29:08 temboardagent[1258] INFO:  taskmanager: Aborting jobs.
13:29:08 temboardagent[1258] DEBUG:  services: Done. service=worker pool
13:29:08 temboardagent[1243] DEBUG:  services: Terminating background service. service=scheduler pid=1257
13:29:08 temboardagent[1243] DEBUG:  services: Terminating background service. service=worker pool pid=1258
13:29:09 temboardagent[1243] DEBUG:  services: Waiting background services.
13:29:09 temboardagent[1257] DEBUG:  services: Done. service=scheduler
Server did not exit, forcefully killing.

Then hupper hangs forever. I need to type Ctrl-C a second time.
I tried to call graceful_shutdown() from a sigint_handler without success.

Do you have a clue on something wrong I do ?

Regards,
Étienne

@mmerickel
Copy link
Member

It appears that your child process is receiving the SIGINT, shutting down, but not actually exiting. Thus hupper is then killing it after shutdown_interval time. Can you tell if the child's PID is gone after the "services: Done. service=scheduler" message, prior to the "Server did not exit, forcefully killing." message?

@bersace
Copy link
Author

bersace commented Jul 17, 2024

It appears that your child process is receiving the SIGINT, shutting down, but not actually exiting. Thus hupper is then killing it after shutdown_interval time. Can you tell if the child's PID is gone after the "services: Done. service=scheduler" message, prior to the "Server did not exit, forcefully killing." message?

I'm pretty sure of that. Before SIGINT:

$ ps -efH
root          60       0  0 19:55 pts/0    00:00:00 /bin/bash
root         440      60  0 19:57 pts/0    00:00:00   sudo -Eu postgres DEBUG=y python3 -m temboardagent
postgres     441     440  6 19:57 pts/0    00:00:00     python3 -m temboardagent
postgres     445     441  6 19:57 pts/0    00:00:00       temboard-agent: postgres1: web
postgres     459     445  0 19:57 pts/0    00:00:00         temboard-agent: postgres1: scheduler
postgres     460     445  0 19:57 pts/0    00:00:00         temboard-agent: postgres1: worker pool
postgres     489     460  0 19:57 pts/0    00:00:00           temboard-agent: postgres1: task temboardagent.plugins.dashboard.dashboard_collect

temboard-agent: postgres1: web is the main process of the project.

After Ctrl-C in terminal:

$ ps -efH
root         440      60  0 19:57 pts/0    00:00:00   sudo -Eu postgres DEBUG=y python3 -m temboardagent
postgres     441     440  0 19:57 pts/0    00:00:00     python3 -m temboardagent

Regards,
Étienne

@mmerickel
Copy link
Member

mmerickel commented Jul 18, 2024

That's curious, there's no magic to what hupper is doing there.

So to unpack, there's 2 potential issues:

  1. There's the graceful shutdown that isn't finishing successfully so you're ending up with a forceful kill.
  2. And then the forceful kill is hanging.

For part 1, hupper is basically hanging onto a subprocess.Popen object and invoking .wait() and .poll() to determine if the child is dead. If it isn't dead after shutdown_interval time then it invokes a .kill() on that popen object.

For part 2, it sounds like you are telling me that the .kill() is hanging.

Does this sound right? I'm at a loss why, if you claim the PID is gone, that wait() would think it is still alive in the first place. And afterward, since it thinks it's still alive it tries a kill - I'm also at a loss why the kill would hang. But first, why does it think it's alive at all?

The main interesting thing that comes to mind is something about the fact that you're running a whole process tree under the reloader, not just a single python process. I wonder if that causes issues with the process group in some way.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants