prevent/handle panicked thread/actor #141
Comments
I think it would be good to recover in as many cases as possible, using whatever strategy makes the most sense. But if I understand correctly, zstor continues to run after a panic on the main thread? Even if we maximized our chances of recovering, I think it's better to exit than to stay alive in a non-functional state. Then at least the process manager can restart the process and work can continue.
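To make the "exit rather than stay alive" idea concrete, here is a minimal std-only sketch. It is not zstor's actual code: `run_checked` and `exit_code` are hypothetical helper names, and it assumes the real work can be run on a thread that the entry point joins.

```rust
use std::thread;

/// Run `work` on its own thread and turn a panic into an error message
/// instead of letting the failure go unnoticed.
/// (Hypothetical helper, not part of zstor.)
pub fn run_checked<F>(work: F) -> Result<(), String>
where
    F: FnOnce() + Send + 'static,
{
    match thread::spawn(work).join() {
        Ok(()) => Ok(()),
        Err(payload) => {
            // Panic payloads are usually a &str or a String.
            let msg = if let Some(s) = payload.downcast_ref::<&str>() {
                s.to_string()
            } else if let Some(s) = payload.downcast_ref::<String>() {
                s.clone()
            } else {
                "unknown panic payload".to_string()
            };
            Err(msg)
        }
    }
}

/// Map the outcome to a process exit code: 0 on success, 1 after a panic.
pub fn exit_code(outcome: &Result<(), String>) -> i32 {
    if outcome.is_ok() {
        0
    } else {
        1
    }
}
```

The entry point could then call something like `std::process::exit(exit_code(&run_checked(real_main)))`, where `real_main` is a placeholder for the actual work. A non-zero exit status is what lets a process manager notice the failure and restart the process. Installing a hook with `std::panic::set_hook` that exits the process is another option.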
Yes, should be, but the
agree with this.
I've checked the Actix supervisor https://docs.rs/actix/latest/actix/struct.Supervisor.html
So, I think the best way for now is
This sounds good to me. In the context of qsfs, I've also added a script to periodically check for success and retry any failed store operations. Zstor isn't expected to guarantee retries in the face of general failures, like sudden power loss, either.
How do you do this? By checking the logs periodically?
You can see the approach here: https://github.com/threefoldtech/quantum-storage/blob/master/lib/retry-uploads.sh It's using the |
During development, I sometimes get this kind of message:
thread 'main' panicked at zstor/src/actors/....
After that, everything fails with this error:
Zstor error: error during waiting for async task completion: Mailbox has closed
For now it happens in my own code, which is still under development, but I remember seeing it before. Unfortunately, I didn't look into it more deeply at the time.
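As a rough analogy for why every later request fails once the receiving side has panicked, here is a std-only sketch using `std::sync::mpsc` instead of Actix mailboxes. It only illustrates the general pattern and is not how zstor or Actix are implemented; `send_after_worker_panic` is a hypothetical name.

```rust
use std::sync::mpsc;
use std::thread;

/// Spawn a worker that owns the receiving end of a channel, let it panic
/// after the first message, and report what happens to the next send.
/// (Hypothetical helper, used only for illustration.)
pub fn send_after_worker_panic() -> Result<(), mpsc::SendError<u32>> {
    let (tx, rx) = mpsc::channel::<u32>();

    let worker = thread::spawn(move || {
        // Handle one message, then fail like a buggy actor would.
        let _first = rx.recv();
        panic!("worker failed");
    });

    // The worker is still waiting in recv(), so this send succeeds.
    tx.send(1).expect("receiver should still be alive");

    // Wait for the panic; unwinding drops `rx`, which closes the channel.
    let _ = worker.join();

    // Nobody is listening any more, so this send returns an error.
    tx.send(2)
}
```

The caller that still holds the sender keeps running, but every message it sends from then on comes back as an error, which is similar in spirit to the "Mailbox has closed" failures above.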
Considering Murphy's Law, it looks like we could improve this a bit by either preventing or handling it.
unwrap is one of the main sources, and there are a lot of them in the 0-stor-v2 code.
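As an illustration of the "prevent" side, here is a minimal sketch of replacing an unwrap chain with a Result. The names `ConfigError`, `shard_count_unwrap` and `shard_count` are made up for this example and do not come from the 0-stor-v2 code.

```rust
use std::fmt;
use std::num::ParseIntError;

/// Hypothetical error type for this example only.
#[derive(Debug)]
pub enum ConfigError {
    Missing(&'static str),
    Invalid(ParseIntError),
}

impl fmt::Display for ConfigError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ConfigError::Missing(name) => write!(f, "missing value: {}", name),
            ConfigError::Invalid(err) => write!(f, "invalid number: {}", err),
        }
    }
}

impl std::error::Error for ConfigError {}

/// Panicking version: both unwrap calls abort the current thread on bad input.
pub fn shard_count_unwrap(raw: Option<&str>) -> usize {
    raw.unwrap().parse().unwrap()
}

/// Non-panicking version: the caller decides how to handle the failure.
pub fn shard_count(raw: Option<&str>) -> Result<usize, ConfigError> {
    let text = raw.ok_or(ConfigError::Missing("shard_count"))?;
    text.trim().parse::<usize>().map_err(ConfigError::Invalid)
}
```

With the second form, a bad value becomes an error the caller can log, retry or return to the client, instead of a panic that takes down the thread or actor handling the request.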