pickfirst: Register a health listener when used as a leaf policy #7832
base: master
Conversation
Codecov Report
Attention: Patch coverage is
Additional details and impacted files

@@            Coverage Diff             @@
##           master    #7832      +/-   ##
==========================================
+ Coverage   81.84%   82.09%   +0.24%
==========================================
  Files         377      377
  Lines       38120    38184      +64
==========================================
+ Hits        31201    31346     +145
+ Misses       5603     5541      -62
+ Partials     1316     1297      -19
// EnableHealthListener updates the state to configure pickfirst for using a
// generic health listener.
func EnableHealthListener(attrs *attributes.Attributes) *attributes.Attributes {
    return attrs.WithValue(enableHealthListenerKeyType{}, enableHealthListenerValue)
}
For these types of functions we prefer to make them operate on the thing that contains the attributes. In this case, that would be a resolver.State.
Changed the function to accept resolver.State.
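For reference, a minimal sketch of what the reworked helper might look like once it operates on resolver.State instead of raw attributes; the exact signature in the updated diff may differ:

// EnableHealthListener returns a copy of the resolver state with an attribute
// set that tells pickfirst to register a generic health listener. Sketch only;
// it reuses the enableHealthListenerKeyType key shown above.
func EnableHealthListener(state resolver.State) resolver.State {
    state.Attributes = state.Attributes.WithValue(enableHealthListenerKeyType{}, enableHealthListenerValue)
    return state
}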
-	mu    sync.Mutex
-	state connectivity.State
+	mu                sync.Mutex
+	connectivityState connectivity.State
I think this deserves a comment, too, since there are now two very similar fields.
Added a comment.
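One possible shape for that documentation, assuming both fields stay on the balancer for the moment (the PR later consolidates them into a single field, as noted below):

    // mu guards the fields below.
    mu sync.Mutex
    // connectivityState is the most recent state derived purely from SubConn
    // connectivity updates.
    connectivityState connectivity.State
    // concludedState is the effective state reported to the ClientConn, after
    // accounting for sticky TRANSIENT_FAILURE and health updates.
    concludedState connectivity.State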
Shouldn't this kind of tracking be done in the subconn struct (scData) and not here? I would expect the lb policy only has the concludedState and each subchannel needs to track its real state and its effective state, accounting for sticky-TF and health reporting? It seems confusing to me that the LB policy itself is tracking two different states, but I'm willing to believe it's simpler this way if you tried it the other way already.
I was referring to the Java implementation of pickfirst, which handles the sticky-TF behaviour in the LB policy. I don't see any issue with handling sticky TF in the subchannel state. I've updated the PR to reflect the suggestions.
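A rough sketch of that shape, with each subchannel tracking both its real and its effective state (field names here are illustrative, not the PR's exact diff):

type scData struct {
    subConn balancer.SubConn
    addr    resolver.Address

    // rawConnectivityState is the state last reported by the SubConn's
    // connectivity listener.
    rawConnectivityState connectivity.State
    // effectiveState folds sticky TRANSIENT_FAILURE and health reporting into
    // the state the LB policy actually acts on.
    effectiveState connectivity.State
    lastErr        error
}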
force-pushed from 2700efe to cf153cb
force-pushed from 64e7a8a to 13ad224
@@ -179,14 +198,15 @@ func (b *pickfirstBalancer) resolverErrorLocked(err error) {
 	// The picker will not change since the balancer does not currently
 	// report an error. If the balancer hasn't received a single good resolver
 	// update yet, transition to TRANSIENT_FAILURE.
-	if b.state != connectivity.TransientFailure && b.addressList.size() > 0 {
+	if b.connectivityState != connectivity.TransientFailure && b.addressList.size() > 0 {
E.g. of above: I find the usages confusing here -- how do we know whether it's correct for this to be checking connectivityState or concludedState?
Moved the sticky TF behaviour into the subchannel state as suggested. There is only one balancer.state field now.
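For illustration, sticky TRANSIENT_FAILURE handled on the subchannel could look roughly like this, building on the scData sketch above (method name assumed):

func (sd *scData) updateState(newState connectivity.State, err error) {
    sd.rawConnectivityState = newState
    if err != nil {
        sd.lastErr = err
    }
    // Sticky TF: once the subchannel has reported TRANSIENT_FAILURE, keep
    // reporting it until the subchannel becomes READY (or is shut down).
    if sd.effectiveState == connectivity.TransientFailure &&
        newState != connectivity.Ready && newState != connectivity.Shutdown {
        return
    }
    sd.effectiveState = newState
}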
// ClientConn.UpdateState(). As an optimization, it avoids sending duplicate
// updates to the channel for state CONNECTING.
func (b *pickfirstBalancer) updateConcludedStateLocked(newState balancer.State) {
    // Optimization to not send duplicate CONNECTING updates.
Something in the comments should explain why it's OK to do this for CONNECTING but not the other states (because the queuing is the same). I guess READY->READY with the same subchannel is impossible or something? TF to TF is done only to update the error message. So should we instead flip this so it ignores any duplicates except TF to TF?
READY -> READY may be possible if the health check client gets two consecutive SERVING updates. Presently, addrConn ensures that duplicate state updates are not sent here:
Lines 1192 to 1204 in dcba136
func (ac *addrConn) updateConnectivityState(s connectivity.State, lastErr error) {
    if ac.state == s {
        return
    }
    ac.state = s
    ac.channelz.ChannelMetrics.State.Store(&s)
    if lastErr == nil {
        channelz.Infof(logger, ac.channelz, "Subchannel Connectivity change to %v", s)
    } else {
        channelz.Infof(logger, ac.channelz, "Subchannel Connectivity change to %v, last error: %s", s, lastErr)
    }
    ac.acbw.updateState(s, ac.curAddr, lastErr)
}
I've inverted the check as suggested to handle such cases.
func (b *pickfirstBalancer) updateConcludedStateLocked(newState balancer.State) {
    // Optimization to not send duplicate CONNECTING updates.
    if newState.ConnectivityState == b.concludedState && b.concludedState == connectivity.Connecting {
        return
Codecov says this isn't covered by tests. So is it actually dead code that's impossible to happen in real life, anyway, and we can just delete it?
This will always happen while client-side health checks are enabled. The health service client sends an update to CONNECTING when establishing a stream:
Line 73 in dcba136
setConnectivityState(connectivity.Connecting, nil)
This results in duplicate CONNECTING updates. I will raise a follow-up CL to add client-side health checks using the health listener.
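The inverted check could read roughly as follows: drop any duplicate state update except TRANSIENT_FAILURE to TRANSIENT_FAILURE, which is forwarded so the picker can surface the latest error (a sketch with assumed names, not the PR's exact diff):

func (b *pickfirstBalancer) updateBalancerState(newState balancer.State) {
    // Ignore duplicate updates, except TF -> TF, which is sent only to refresh
    // the error returned by the picker.
    if newState.ConnectivityState == b.state &&
        newState.ConnectivityState != connectivity.TransientFailure {
        return
    }
    b.state = newState.ConnectivityState
    b.cc.UpdateState(newState)
}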
// A separate function is defined to force update the ClientConn state since the
// channel doesn't correctly assume that LB policies start in CONNECTING and
// relies on LB policy to send an initial CONNECTING update.
Should we change that assumption in the channel? File an issue and add a TODO to clean this up? Maybe it's related to #7686 in some way? I believe the assumption in the other languages is that LB policies start CONNECTING, and queue picks.
This is related to #7686. I can do that in a separate PR. I was thinking of sending a picker and state update in the ccBalancerWrapper constructor here:
Lines 75 to 94 in dcba136
func newCCBalancerWrapper(cc *ClientConn) *ccBalancerWrapper {
    ctx, cancel := context.WithCancel(cc.ctx)
    ccb := &ccBalancerWrapper{
        cc: cc,
        opts: balancer.BuildOptions{
            DialCreds:       cc.dopts.copts.TransportCredentials,
            CredsBundle:     cc.dopts.copts.CredsBundle,
            Dialer:          cc.dopts.copts.Dialer,
            Authority:       cc.authority,
            CustomUserAgent: cc.dopts.copts.UserAgent,
            ChannelzParent:  cc.channelz,
            Target:          cc.parsedTarget,
            MetricsRecorder: cc.metricsRecorderList,
        },
        serializer:       grpcsync.NewCallbackSerializer(ctx),
        serializerCancel: cancel,
    }
    ccb.balancer = gracefulswitch.NewBalancer(ccb, ccb.opts)
    return ccb
}
Is this the correct place to make this change?
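One possible shape of that idea, appended to the end of newCCBalancerWrapper above (purely hypothetical; whether it is safe to report state during construction is exactly the question being asked):

    ccb.balancer = gracefulswitch.NewBalancer(ccb, ccb.opts)
    // Report an initial CONNECTING state with a queuing picker so the channel
    // no longer depends on the LB policy sending the first update.
    ccb.UpdateState(balancer.State{
        ConnectivityState: connectivity.Connecting,
        Picker:            base.NewErrPicker(balancer.ErrNoSubConnAvailable),
    })
    return ccb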
As part of the dualstack changes described in A61, pickfirst will become the universal leaf policy. This PR provides an EnableHealthListener function for petiole policies to inform pickfirst when it's functioning as a leaf policy.

When functioning as a leaf policy, pickfirst will subscribe to health updates using the SubConn.RegisterHealthListener API introduced in #7780 once the SubConn connectivity state becomes READY. The health state will be used to update the ClientConn state as long as the SubConn's connectivity state remains READY.

RELEASE NOTES: N/A
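For context, a rough sketch of how the two halves described above fit together: the petiole policy marks the resolver state before forwarding it to its pickfirst child, and pickfirst registers a health listener once the SubConn reports READY. Package, helper, and type names on the pickfirst side are assumptions for illustration:

// Petiole side: flag the state before forwarding it to the pickfirst child.
state := pickfirstleaf.EnableHealthListener(ccs.ResolverState)
if err := child.UpdateClientConnState(balancer.ClientConnState{ResolverState: state}); err != nil {
    return err
}

// Pickfirst side: once the SubConn transitions to READY, start listening for
// health updates and mirror them into the ClientConn state while READY.
sc.RegisterHealthListener(func(scs balancer.SubConnState) {
    b.updateBalancerState(balancer.State{
        ConnectivityState: scs.ConnectivityState,
        Picker:            &picker{result: balancer.PickResult{SubConn: sc}},
    })
})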