-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Avoid deadlock in syncer.Close #92
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -285,14 +285,19 @@ func (s *Syncer) runPeer(p *Peer) error { | |
p.setErr(err) | ||
return fmt.Errorf("failed to accept rpc: %w", err) | ||
} | ||
|
||
// set a generous deadline | ||
err = stream.SetDeadline(time.Now().Add(10 * time.Minute)) | ||
if err != nil { | ||
p.setErr(err) | ||
return fmt.Errorf("failed to set deadline: %w", err) | ||
} | ||
|
||
inflight <- struct{}{} | ||
s.wg.Add(1) | ||
go func() { | ||
defer s.wg.Done() | ||
defer stream.Close() | ||
// NOTE: we do not set any deadlines on the stream. If a peer is | ||
// slow, fine; we don't need to worry about resource exhaustion | ||
// unless we have tons of peers. | ||
if err := s.handleRPC(id, stream, p); err != nil { | ||
s.log.Debug("rpc failed", zap.Stringer("peer", p), zap.Stringer("rpc", id), zap.Error(err)) | ||
} | ||
|
@@ -679,10 +684,22 @@ func (s *Syncer) Run(ctx context.Context) error { | |
return err | ||
} | ||
|
||
// Close closes the Syncer's net.Listener. | ||
func (s *Syncer) Close() error { | ||
// Shutdown closes the Syncer's net.Listener. | ||
func (s *Syncer) Shutdown(ctx context.Context) error { | ||
err := s.l.Close() | ||
s.wg.Wait() | ||
|
||
waitChan := make(chan struct{}) | ||
go func() { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Similar thing here. This shouldn't really be necessary. If we run into this, we still got deadlock issues. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I definitely agree on the deadlock still being an issue, I wasn't aware the stream would automatically unblock if the listener got closed. That aside though, I always do the |
||
s.wg.Wait() | ||
close(waitChan) | ||
}() | ||
|
||
select { | ||
case <-ctx.Done(): | ||
return errors.Join(err, ctx.Err()) | ||
case <-waitChan: | ||
} | ||
|
||
return err | ||
} | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not really a fix I'm afraid. If we ever run into this on shutdown, the deadlock still exists. We should figure out what is causing the actual deadlock since the stream should unblock after the listener is closed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other than this line I couldn't really identify another spot where it might potentially deadlock. @lukechampine do you mind taking a look? I commented on the PR with a link to the stack trace. I figured the stream could live on even after the listener got closed.