
pickfirst: Register a health listener when used as a leaf policy #7832

Open · wants to merge 9 commits into base: master

Conversation

arjan-bal (Contributor):
As part of the dualstack changes described in A61, pickfirst will become the universal leaf policy. This PR provides an EnableHealthListener function for petiole policies to inform pickfirst when it's functioning as a leaf policy.

When functioning as a leaf policy, pickfirst will subscribe to health updates using the SubConn.RegisterHealthListener API introduced in #7780 once the SubConn connectivity state becomes READY. The health state will be used to update the ClientConn state as long as the SubConn's connectivity state remains READY.
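The lifecycle described above can be sketched with simplified stand-in types (the `State` enum, `leafPickfirst` struct, and handler names below are hypothetical, not the real gRPC API): the health listener is registered only once the SubConn reports READY, and health updates drive the state reported to the ClientConn only while connectivity stays READY.

```go
package main

import "fmt"

// State is a stand-in for connectivity.State (assumption: simplified enum).
type State int

const (
	Connecting State = iota
	Ready
	TransientFailure
)

func (s State) String() string {
	return [...]string{"CONNECTING", "READY", "TRANSIENT_FAILURE"}[s]
}

// leafPickfirst models the leaf-policy behaviour: it tracks the raw
// connectivity state and, once READY, lets health updates drive the state
// it would report to the ClientConn.
type leafPickfirst struct {
	connectivityState State
	healthRegistered  bool
	concludedState    State // what the policy reports upward
}

// onConnectivityUpdate registers the health listener only after the SubConn
// becomes READY, mirroring the described flow.
func (b *leafPickfirst) onConnectivityUpdate(s State) {
	b.connectivityState = s
	if s == Ready {
		b.healthRegistered = true // stands in for SubConn.RegisterHealthListener
		return
	}
	// Once connectivity leaves READY, health no longer drives the reported state.
	b.healthRegistered = false
	b.concludedState = s
}

// onHealthUpdate applies a health update only while connectivity remains READY.
func (b *leafPickfirst) onHealthUpdate(s State) {
	if b.connectivityState != Ready || !b.healthRegistered {
		return
	}
	b.concludedState = s
}

func main() {
	b := &leafPickfirst{concludedState: Connecting}
	b.onHealthUpdate(TransientFailure) // ignored: SubConn not READY yet
	b.onConnectivityUpdate(Ready)
	b.onHealthUpdate(Ready) // health SERVING -> report READY
	fmt.Println(b.concludedState)
}
```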

RELEASE NOTES: N/A

@arjan-bal arjan-bal added Type: Feature New features or improvements in behavior Area: Resolvers/Balancers Includes LB policy & NR APIs, resolver/balancer/picker wrappers, LB policy impls and utilities. labels Nov 13, 2024
@arjan-bal arjan-bal added this to the 1.69 Release milestone Nov 13, 2024
codecov bot commented Nov 13, 2024

Codecov Report

Attention: Patch coverage is 85.85859% with 14 lines in your changes missing coverage. Please review.

Project coverage is 82.09%. Comparing base (4c07bca) to head (d852260).

Files with missing lines | Patch % | Lines
balancer/pickfirst/pickfirstleaf/pickfirstleaf.go | 85.85% | 10 Missing and 4 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #7832      +/-   ##
==========================================
+ Coverage   81.84%   82.09%   +0.24%     
==========================================
  Files         377      377              
  Lines       38120    38184      +64     
==========================================
+ Hits        31201    31346     +145     
+ Misses       5603     5541      -62     
+ Partials     1316     1297      -19     
Files with missing lines | Coverage Δ
balancer/pickfirst/pickfirstleaf/pickfirstleaf.go | 86.99% <85.85%> (-0.82%) ⬇️

... and 25 files with indirect coverage changes

@arjan-bal arjan-bal modified the milestone: 1.69 Release Nov 13, 2024
8 resolved review threads on balancer/pickfirst/pickfirstleaf/pickfirstleaf.go (outdated)
@easwars easwars assigned arjan-bal and unassigned easwars Nov 15, 2024
@arjan-bal arjan-bal assigned easwars and unassigned easwars Nov 18, 2024
@arjan-bal arjan-bal assigned easwars and unassigned arjan-bal Nov 19, 2024
@easwars easwars assigned arjan-bal and unassigned easwars Nov 19, 2024
Comment on lines 119 to 123
// EnableHealthListener updates the state to configure pickfirst for using a
// generic health listener.
func EnableHealthListener(attrs *attributes.Attributes) *attributes.Attributes {
	return attrs.WithValue(enableHealthListenerKeyType{}, enableHealthListenerValue)
}
Member:

For these types of functions we prefer to make them operate on the thing that contains the attributes. In this case, that would be a resolver.State.

arjan-bal (Contributor, Author):

Changed the function to accept resolver.State.
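A minimal sketch of the reviewed shape, with `Attributes` and `State` below as hand-rolled stand-ins for `attributes.Attributes` and `resolver.State` (the real gRPC types differ): the function operates on the container rather than the raw attributes, and copies rather than mutates its input.

```go
package main

import "fmt"

// enableHealthListenerKeyType is an unexported key type so other packages
// cannot collide with this attribute.
type enableHealthListenerKeyType struct{}

// Attributes loosely models attributes.Attributes as an immutable map
// (assumption: simplified stand-in).
type Attributes map[any]any

// WithValue returns a copy with the key set, leaving the receiver unchanged.
func (a Attributes) WithValue(k, v any) Attributes {
	out := make(Attributes, len(a)+1)
	for key, val := range a {
		out[key] = val
	}
	out[k] = v
	return out
}

// State loosely models resolver.State.
type State struct {
	Attributes Attributes
}

// EnableHealthListener returns a copy of state configured so that pickfirst
// registers a generic health listener when used as a leaf policy.
func EnableHealthListener(state State) State {
	if state.Attributes == nil {
		state.Attributes = Attributes{}
	}
	state.Attributes = state.Attributes.WithValue(enableHealthListenerKeyType{}, true)
	return state
}

func main() {
	s := EnableHealthListener(State{})
	fmt.Println(s.Attributes[enableHealthListenerKeyType{}]) // true
}
```

Taking and returning the whole `State` keeps the attribute key private to the package while still letting petiole policies flip the flag before forwarding the resolver update.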

-	mu    sync.Mutex
-	state connectivity.State
+	mu                sync.Mutex
+	connectivityState connectivity.State
Member:

I think this deserves a comment, too, since there are now two very similar fields.

arjan-bal (Contributor, Author):

Added a comment.

Member:

Shouldn't this kind of tracking be done in the subconn struct (scData) and not here? I would expect the lb policy only has the concludedState and each subchannel needs to track its real state and its effective state, accounting for sticky-TF and health reporting? It seems confusing to me that the LB policy itself is tracking two different states, but I'm willing to believe it's simpler this way if you tried it the other way already.

arjan-bal (Contributor, Author):

I was referring to the Java implementation of pickfirst, which handles the sticky-TF behaviour in the LB policy. I don't see any issue with handling sticky TF in the subchannel state. I've updated the PR to reflect the suggestions.

Resolved review thread on balancer/pickfirst/pickfirstleaf/pickfirstleaf.go (outdated)
@dfawley dfawley removed their assignment Nov 19, 2024
@easwars easwars assigned arjan-bal and unassigned easwars Nov 20, 2024
@easwars easwars assigned arjan-bal and unassigned easwars Nov 21, 2024
@arjan-bal arjan-bal removed their assignment Nov 22, 2024
Resolved review thread on balancer/pickfirst/pickfirstleaf/pickfirstleaf.go (outdated)
@@ -179,14 +198,15 @@ func (b *pickfirstBalancer) resolverErrorLocked(err error) {
// The picker will not change since the balancer does not currently
// report an error. If the balancer hasn't received a single good resolver
// update yet, transition to TRANSIENT_FAILURE.
-	if b.state != connectivity.TransientFailure && b.addressList.size() > 0 {
+	if b.connectivityState != connectivity.TransientFailure && b.addressList.size() > 0 {
Member:

E.g. of above: I find the usages confusing here -- how do we know whether it's correct for this to be checking connectivityState or concludedState?

arjan-bal (Contributor, Author):

Moved the sticky TF behaviour into the subchannel state as suggested. There is only one balancer.state field now.

// ClientConn.UpdateState(). As an optimization, it avoids sending duplicate
// updates to the channel for state CONNECTING.
func (b *pickfirstBalancer) updateConcludedStateLocked(newState balancer.State) {
	// Optimization to not send duplicate CONNECTING updates.
Member:

Something in the comments should explain why it's OK to do this for CONNECTING but not the other states (because the queuing is the same). I guess READY->READY with the same subchannel is impossible or something? TF to TF is done only to update the error message. So should we instead flip this so it ignores any duplicates except TF to TF?

arjan-bal (Contributor, Author):

READY -> READY may be possible if the health check client gets two consecutive SERVING updates. Presently, addrConn ensures that duplicate state updates are not sent here:

grpc-go/clientconn.go, lines 1192 to 1204 (commit dcba136):

func (ac *addrConn) updateConnectivityState(s connectivity.State, lastErr error) {
	if ac.state == s {
		return
	}
	ac.state = s
	ac.channelz.ChannelMetrics.State.Store(&s)
	if lastErr == nil {
		channelz.Infof(logger, ac.channelz, "Subchannel Connectivity change to %v", s)
	} else {
		channelz.Infof(logger, ac.channelz, "Subchannel Connectivity change to %v, last error: %s", s, lastErr)
	}
	ac.acbw.updateState(s, ac.curAddr, lastErr)
}

I've inverted the check as suggested to handle such cases.

func (b *pickfirstBalancer) updateConcludedStateLocked(newState balancer.State) {
	// Optimization to not send duplicate CONNECTING updates.
	if newState.ConnectivityState == b.concludedState && b.concludedState == connectivity.Connecting {
		return
Member:

Codecov says this isn't covered by tests. So is it actually dead code that's impossible to happen in real life, anyway, and we can just delete it?

arjan-bal (Contributor, Author):

This will always happen while client-side health checks are enabled. The health service client sends a CONNECTING update when establishing a stream:

setConnectivityState(connectivity.Connecting, nil)

This results in duplicate CONNECTING updates. I will raise a follow-up CL to add client-side health checks using the health listener.

Comment on lines +756 to +758
// A separate function is defined to force-update the ClientConn state since
// the channel doesn't correctly assume that LB policies start in CONNECTING
// and relies on the LB policy to send an initial CONNECTING update.
Member:

Should we change that assumption in the channel? File an issue and add a TODO to clean this up? Maybe it's related to #7686 in some way? I believe the assumption in the other languages is that LB policies start CONNECTING, and queue picks.

arjan-bal (Contributor, Author):

This is related to #7686. I can do that in a separate PR. I was thinking of sending a picker and state update in the ccBalancerWrapper constructor here:

func newCCBalancerWrapper(cc *ClientConn) *ccBalancerWrapper {
	ctx, cancel := context.WithCancel(cc.ctx)
	ccb := &ccBalancerWrapper{
		cc: cc,
		opts: balancer.BuildOptions{
			DialCreds:       cc.dopts.copts.TransportCredentials,
			CredsBundle:     cc.dopts.copts.CredsBundle,
			Dialer:          cc.dopts.copts.Dialer,
			Authority:       cc.authority,
			CustomUserAgent: cc.dopts.copts.UserAgent,
			ChannelzParent:  cc.channelz,
			Target:          cc.parsedTarget,
			MetricsRecorder: cc.metricsRecorderList,
		},
		serializer:       grpcsync.NewCallbackSerializer(ctx),
		serializerCancel: cancel,
	}
	ccb.balancer = gracefulswitch.NewBalancer(ccb, ccb.opts)
	return ccb
}

Is this the correct place to make this change?

@dfawley dfawley assigned arjan-bal and unassigned dfawley Nov 22, 2024
@arjan-bal arjan-bal assigned dfawley and unassigned arjan-bal Nov 25, 2024