Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid deadlock in syncer.Close #92

Closed
wants to merge 2 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
29 changes: 23 additions & 6 deletions syncer/syncer.go
Original file line number Diff line number Diff line change
Expand Up @@ -285,14 +285,19 @@ func (s *Syncer) runPeer(p *Peer) error {
p.setErr(err)
return fmt.Errorf("failed to accept rpc: %w", err)
}

// set a generous deadline
err = stream.SetDeadline(time.Now().Add(10 * time.Minute))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not really a fix I'm afraid. If we ever run into this on shutdown, the deadlock still exists. We should figure out what is causing the actual deadlock since the stream should unblock after the listener is closed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other than this line I couldn't really identify another spot where it might potentially deadlock. @lukechampine do you mind taking a look? I commented on the PR with a link to the stack trace. I figured the stream could live on even after the listener got closed.

if err != nil {
p.setErr(err)
return fmt.Errorf("failed to set deadline: %w", err)
}

inflight <- struct{}{}
s.wg.Add(1)
go func() {
defer s.wg.Done()
defer stream.Close()
// NOTE: we do not set any deadlines on the stream. If a peer is
// slow, fine; we don't need to worry about resource exhaustion
// unless we have tons of peers.
if err := s.handleRPC(id, stream, p); err != nil {
s.log.Debug("rpc failed", zap.Stringer("peer", p), zap.Stringer("rpc", id), zap.Error(err))
}
Expand Down Expand Up @@ -679,10 +684,22 @@ func (s *Syncer) Run(ctx context.Context) error {
return err
}

// Close closes the Syncer's net.Listener.
func (s *Syncer) Close() error {
// Shutdown closes the Syncer's net.Listener.
func (s *Syncer) Shutdown(ctx context.Context) error {
err := s.l.Close()
s.wg.Wait()

waitChan := make(chan struct{})
go func() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar thing here. This shouldn't really be necessary. If we run into this, we still got deadlock issues.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I definitely agree on the deadlock still being an issue, I wasn't aware the stream would automatically unblock if the listener got closed. That aside though, I always do the waitChan pattern to avoid a naked s.wg.Wait and consider it good practice?

s.wg.Wait()
close(waitChan)
}()

select {
case <-ctx.Done():
return errors.Join(err, ctx.Err())
case <-waitChan:
}

return err
}

Expand Down
4 changes: 2 additions & 2 deletions syncer/syncer_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -47,15 +47,15 @@ func TestSyncer(t *testing.T) {
UniqueID: gateway.GenerateUniqueID(),
NetAddress: l1.Addr().String(),
}, syncer.WithLogger(log.Named("syncer1")))
defer s1.Close()
defer s1.Shutdown(context.Background())
go s1.Run(context.Background())

s2 := syncer.New(l2, cm2, testutil.NewMemPeerStore(), gateway.Header{
GenesisID: genesis.ID(),
UniqueID: gateway.GenerateUniqueID(),
NetAddress: l2.Addr().String(),
}, syncer.WithLogger(log.Named("syncer2")), syncer.WithSyncInterval(10*time.Millisecond))
defer s2.Close()
defer s2.Shutdown(context.Background())
go s2.Run(context.Background())

// mine a few blocks on cm1
Expand Down
Loading