Avoid deadlock in syncer.Close #92
Conversation
not assigning Luke because he's OOO
@@ -285,14 +285,19 @@ func (s *Syncer) runPeer(p *Peer) error {
 		p.setErr(err)
 		return fmt.Errorf("failed to accept rpc: %w", err)
 	}
+
+	// set a generous deadline
+	err = stream.SetDeadline(time.Now().Add(10 * time.Minute))
This is not really a fix, I'm afraid. If we ever run into this on shutdown, the deadlock still exists. We should figure out what is causing the actual deadlock, since the stream should unblock after the listener is closed.
Other than this line, I couldn't really identify another spot where it might potentially deadlock. @lukechampine, do you mind taking a look? I commented on the PR with a link to the stack trace. I figured the stream could live on even after the listener got closed.
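For readers following along, here is a minimal, hypothetical sketch (not the project's actual code) of why the deadline above bounds an otherwise-blocking RPC. The stream is assumed to satisfy net.Conn, and serveStream and the handleRPC callback are stand-in names:

package example

import (
	"fmt"
	"net"
	"time"
)

// serveStream is a hypothetical helper. Any read or write on the stream
// after the deadline fails with a timeout error instead of blocking
// forever, so a loop like runPeer can observe the error and return.
func serveStream(stream net.Conn, handleRPC func(net.Conn) error) error {
	if err := stream.SetDeadline(time.Now().Add(10 * time.Minute)); err != nil {
		return fmt.Errorf("failed to set deadline: %w", err)
	}
	return handleRPC(stream)
}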
-	s.wg.Wait()
+	waitChan := make(chan struct{})
+	go func() {
Similar thing here. This shouldn't really be necessary. If we run into this, we still have deadlock issues.
I definitely agree the deadlock is still an issue; I wasn't aware the stream would automatically unblock once the listener got closed. That aside though, I always use the waitChan pattern to avoid a naked s.wg.Wait and consider it good practice?
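For context, a generic sketch of the waitChan pattern being referenced, assuming names and a timeout that are illustrative rather than part of this PR:

package example

import (
	"errors"
	"sync"
	"time"
)

// waitWithTimeout mirrors the diff above: wg.Wait runs in its own goroutine
// and closes a channel, so the caller can select on completion versus a
// timeout instead of blocking forever on a naked Wait.
func waitWithTimeout(wg *sync.WaitGroup, d time.Duration) error {
	waitChan := make(chan struct{})
	go func() {
		wg.Wait()
		close(waitChan)
	}()
	select {
	case <-waitChan:
		return nil
	case <-time.After(d):
		return errors.New("timed out waiting for goroutines to exit")
	}
}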
Full stack trace
…nto pj/syncer-shutdown
@lukechampine I closed this out for now because I can't seem to reproduce the deadlock on the CI and have never encountered it locally. I used to see it once every other day in the wild, but I haven't seen it for some time now. Perhaps we should consider switching strategies on this one and transform this PR into doing the small
Obviously it happened again two days after closing it: https://github.com/SiaFoundation/renterd/actions/runs/11030847304/job/30636343643?pr=1574
I ran into the following deadlock:
For context, we changed the behaviour here a couple of times:
I think we should rename Close to Shutdown and accept a context if we intend to wait for all goroutines to exit. That still leaves the fact that we deadlocked. Looking at all the spots, the only potential culprit I could identify was runPeer, where we don't set a deadline on handleRPC. I considered setting a 5-minute timeout but landed on 10 minutes to be generous.
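A rough sketch of what the proposed rename could look like; the field names (l for the listener, wg for the wait group) and the errors.Join combination are assumptions, not the actual implementation:

package example

import (
	"context"
	"errors"
	"net"
	"sync"
)

// Syncer here is a stand-in with only the fields the sketch needs.
type Syncer struct {
	l  net.Listener
	wg sync.WaitGroup
}

// Shutdown closes the listener (which should unblock pending accepts) and
// then waits for peer goroutines to exit, giving up when ctx is cancelled.
func (s *Syncer) Shutdown(ctx context.Context) error {
	err := s.l.Close()
	done := make(chan struct{})
	go func() {
		s.wg.Wait()
		close(done)
	}()
	select {
	case <-done:
		return err
	case <-ctx.Done():
		return errors.Join(err, ctx.Err())
	}
}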