post_soft_action_delay not actually used #106

Open
gojkoz opened this issue Oct 13, 2020 · 6 comments

@gojkoz

gojkoz commented Oct 13, 2020

I cannot see that post_soft_action_delay is actually being used to sleep after a soft corrective action.
I reviewed the code, and the only place I can find it is here:

nohang/src/nohang

Line 2710 in cf6b213

sleep(post_soft_action_delay)

Might be an indentation problem; maybe this was intended to be placed outside of the except block?
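
To illustrate the indentation question, here is a minimal sketch. It is not nohang's actual code: the function names, the `OSError` handler, and the delay value are all hypothetical. It only shows that a `sleep` placed inside an `except` block runs when the action raises and is skipped otherwise.

```python
from time import sleep

post_soft_action_delay = 0.1  # seconds; illustrative value, not nohang's default


def with_delay_in_except(action, delay=sleep):
    """Run the action; the delay happens only if the action raises OSError."""
    try:
        action()
    except OSError:
        delay(post_soft_action_delay)


def with_delay_after(action, delay=sleep):
    """Run the action; the delay happens after every attempt."""
    try:
        action()
    except OSError:
        pass  # hypothetical: ignore the failure and still wait below
    delay(post_soft_action_delay)
```

With an action that succeeds, `with_delay_in_except` never sleeps, which matches the observation that the delay is effectively unused on the normal path.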

@hakavlad
Owner

Good catch, thanks.

It seems that post_soft_action_delay does not work as intended, but it is not a big problem.

I will think about how to fix this later.

         sleep(post_soft_action_delay) 

-- this line can safely be removed. This is not how it should work.

maybe this was intended to be placed outside of the except block?

No, the problem should be solved differently.

@gojkoz
Author

gojkoz commented Oct 14, 2020

Thanks for confirming. I am going to patch my local copy for now, but likely not in a way others will want to use. For the specific use case I'm trying, I just need it to not do anything for some time, after it's implemented one soft corrective action. I want to block further checking for a defined period, to prevent it from reacting too often to a runaway process that keeps re-spawning (a server use case). If you can cover that as an option in a future release, that would be great.

Good tool otherwise, thanks for making it.
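
The quiet period requested above could be sketched roughly as follows. This is an illustration only, not a nohang feature or option: the `Cooldown` class, its `period` parameter, and the injectable `clock` are hypothetical names.

```python
import time


class Cooldown:
    """Suppress repeated actions for `period` seconds after one fires."""

    def __init__(self, period, clock=time.monotonic):
        self.period = period
        self.clock = clock        # injectable so the logic can be tested
        self.last_action = None   # time of the most recent action, if any

    def ready(self):
        """True if no action has fired yet or the quiet period has elapsed."""
        if self.last_action is None:
            return True
        return self.clock() - self.last_action >= self.period

    def mark(self):
        """Record that an action has just been taken."""
        self.last_action = self.clock()
```

A monitoring loop would call `ready()` before each check and `mark()` right after a soft corrective action, so nothing else happens until the period has passed.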

@hakavlad
Owner

hakavlad commented Oct 14, 2020

Do you have any problem with the current version?

to not do anything for some time, after it's implemented one soft corrective action

Nohang does nothing after implementing a soft corrective action; it waits until the victim dies. If the victim does not respond to the soft corrective action, nohang sends SIGKILL (after max_soft_exit_time, 10 s by default).
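
The escalation described above (soft signal, wait, then SIGKILL) can be sketched like this. It is a simplified illustration, not nohang's implementation: the function name, the `send` and `is_alive` callables, and `poll_interval` are assumptions. Only `max_soft_exit_time` and its 10 s default come from this thread.

```python
import signal
import time


def terminate_with_escalation(pid, send, is_alive, max_soft_exit_time=10.0,
                              poll_interval=0.1, clock=time.monotonic,
                              sleep=time.sleep):
    """Send SIGTERM, wait up to max_soft_exit_time, then send SIGKILL.

    Returns the last signal that was sent.
    """
    send(pid, signal.SIGTERM)
    deadline = clock() + max_soft_exit_time
    while clock() < deadline:
        if not is_alive(pid):
            return signal.SIGTERM  # victim exited after the soft signal
        sleep(poll_interval)
    if not is_alive(pid):
        return signal.SIGTERM      # victim exited just before the deadline
    send(pid, signal.SIGKILL)      # victim ignored SIGTERM: escalate
    return signal.SIGKILL
```

In real use, `os.kill` could be passed as `send`; `is_alive` is left to the caller because liveness checks differ between setups.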

@hakavlad
Owner

hakavlad commented Oct 14, 2020

In fact, after a soft corrective action, if victim_cache_time = 10 and max_soft_exit_time = 10 (the default values), nohang will do nothing for 10 s if the hard threshold is not exceeded (though it will spam during this time, see #69).

@gojkoz
Author

gojkoz commented Oct 14, 2020

You are right, I didn't know about victim_cache_time. That's what I need actually, to not touch the same victim too often even if it's to blame. That might be the easiest solution for what I need. At first I was trying to do it with post_soft_action_delay (delay doing anything again after an action).

@hakavlad
Owner

not touch the same victim too often

nohang sends SIGTERM to any given process only once. The next signal is SIGKILL, sent after max_soft_exit_time if the process does not respond to SIGTERM.

victim_cache_time means the time during which nohang should not look for a new victim if the old victim is alive.
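
One way to read this description of victim_cache_time is sketched below. It is an interpretation for illustration, not nohang's code: the `VictimCache` class, its method names, and the `is_alive` callable are hypothetical, and the 10 s default is taken from the earlier comment.

```python
import time


class VictimCache:
    """Remember the last victim and skip new selection while it is fresh."""

    def __init__(self, victim_cache_time=10.0, clock=time.monotonic):
        self.victim_cache_time = victim_cache_time
        self.clock = clock
        self.pid = None         # PID of the cached victim, if any
        self.chosen_at = None   # time the victim was selected

    def remember(self, pid):
        """Cache a newly selected victim."""
        self.pid = pid
        self.chosen_at = self.clock()

    def should_pick_new_victim(self, is_alive):
        """False while the cached victim is alive and the cache is fresh."""
        if self.pid is None:
            return True
        age = self.clock() - self.chosen_at
        if age < self.victim_cache_time and is_alive(self.pid):
            return False
        return True
```

Under this reading, a new victim is considered only once the old one has died or the cache time has run out.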
