
pickfirst: Register a health listener when used as a leaf policy #7832

Open · wants to merge 9 commits into base: master

Conversation

arjan-bal (Contributor):
As part of the dualstack changes described in A61, pickfirst will become the universal leaf policy. This PR provides an EnableHealthListener function for petiole policies to inform pickfirst when it's functioning as a leaf policy.

When functioning as a leaf policy, pickfirst will subscribe to health updates using the SubConn.RegisterHealthListener API introduced in #7780 once the SubConn connectivity state becomes READY. The health state will be used to update the ClientConn state as long as the SubConn's connectivity state remains READY.
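The lifecycle described above can be sketched with simplified stand-in types (the `State` enum, `leafPickfirst` struct, and handler names below are hypothetical, not the real gRPC API): the health listener is registered only once the SubConn reports READY, and health updates drive the state reported to the ClientConn only while connectivity stays READY.

```go
package main

import "fmt"

// State is a stand-in for connectivity.State (assumption: simplified enum).
type State int

const (
	Connecting State = iota
	Ready
	TransientFailure
)

func (s State) String() string {
	return [...]string{"CONNECTING", "READY", "TRANSIENT_FAILURE"}[s]
}

// leafPickfirst models the leaf-policy behaviour: it tracks the raw
// connectivity state and, once READY, lets health updates drive the state
// it would report to the ClientConn.
type leafPickfirst struct {
	connectivityState State
	healthRegistered  bool
	concludedState    State // what the policy reports upward
}

// onConnectivityUpdate registers the health listener only after the SubConn
// becomes READY, mirroring the described flow.
func (b *leafPickfirst) onConnectivityUpdate(s State) {
	b.connectivityState = s
	if s == Ready {
		b.healthRegistered = true // stands in for SubConn.RegisterHealthListener
		return
	}
	// Once connectivity leaves READY, health no longer drives the reported state.
	b.healthRegistered = false
	b.concludedState = s
}

// onHealthUpdate applies a health update only while connectivity remains READY.
func (b *leafPickfirst) onHealthUpdate(s State) {
	if b.connectivityState != Ready || !b.healthRegistered {
		return
	}
	b.concludedState = s
}

func main() {
	b := &leafPickfirst{concludedState: Connecting}
	b.onHealthUpdate(TransientFailure) // ignored: SubConn not READY yet
	b.onConnectivityUpdate(Ready)
	b.onHealthUpdate(Ready) // health SERVING -> report READY
	fmt.Println(b.concludedState)
}
```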

RELEASE NOTES: N/A

@arjan-bal arjan-bal added Type: Feature New features or improvements in behavior Area: Resolvers/Balancers Includes LB policy & NR APIs, resolver/balancer/picker wrappers, LB policy impls and utilities. labels Nov 13, 2024
@arjan-bal arjan-bal added this to the 1.69 Release milestone Nov 13, 2024
codecov bot commented Nov 13, 2024

Codecov Report

Attention: Patch coverage is 85.85859% with 14 lines in your changes missing coverage. Please review.

Project coverage is 82.09%. Comparing base (4c07bca) to head (d852260).

Files with missing lines | Patch % | Lines
balancer/pickfirst/pickfirstleaf/pickfirstleaf.go | 85.85% | 10 Missing and 4 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #7832      +/-   ##
==========================================
+ Coverage   81.84%   82.09%   +0.24%     
==========================================
  Files         377      377              
  Lines       38120    38184      +64     
==========================================
+ Hits        31201    31346     +145     
+ Misses       5603     5541      -62     
+ Partials     1316     1297      -19     
Files with missing lines | Coverage Δ
balancer/pickfirst/pickfirstleaf/pickfirstleaf.go | 86.99% <85.85%> (-0.82%) ⬇️

... and 25 files with indirect coverage changes

@arjan-bal arjan-bal modified the milestone: 1.69 Release Nov 13, 2024
8 resolved review threads on balancer/pickfirst/pickfirstleaf/pickfirstleaf.go (outdated)
@easwars easwars assigned arjan-bal and unassigned easwars Nov 15, 2024
@arjan-bal arjan-bal assigned easwars and unassigned easwars Nov 18, 2024
@arjan-bal arjan-bal assigned easwars and unassigned arjan-bal Nov 19, 2024
@easwars easwars assigned arjan-bal and unassigned easwars Nov 19, 2024
Comment on lines 119 to 123
// EnableHealthListener updates the state to configure pickfirst for using a
// generic health listener.
func EnableHealthListener(attrs *attributes.Attributes) *attributes.Attributes {
	return attrs.WithValue(enableHealthListenerKeyType{}, enableHealthListenerValue)
}
Member:

For these types of functions we prefer to make them operate on the thing that contains the attributes. In this case, that would be a resolver.State.

arjan-bal (Contributor, Author):

Changed the function to accept resolver.State.
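A minimal sketch of the reviewed shape, with `Attributes` and `State` below as hand-rolled stand-ins for `attributes.Attributes` and `resolver.State` (the real gRPC types differ): the function operates on the container rather than the raw attributes, and copies rather than mutates its input.

```go
package main

import "fmt"

// enableHealthListenerKeyType is an unexported key type so other packages
// cannot collide with this attribute.
type enableHealthListenerKeyType struct{}

// Attributes loosely models attributes.Attributes as an immutable map
// (assumption: simplified stand-in).
type Attributes map[any]any

// WithValue returns a copy with the key set, leaving the receiver unchanged.
func (a Attributes) WithValue(k, v any) Attributes {
	out := make(Attributes, len(a)+1)
	for key, val := range a {
		out[key] = val
	}
	out[k] = v
	return out
}

// State loosely models resolver.State.
type State struct {
	Attributes Attributes
}

// EnableHealthListener returns a copy of state configured so that pickfirst
// registers a generic health listener when used as a leaf policy.
func EnableHealthListener(state State) State {
	if state.Attributes == nil {
		state.Attributes = Attributes{}
	}
	state.Attributes = state.Attributes.WithValue(enableHealthListenerKeyType{}, true)
	return state
}

func main() {
	s := EnableHealthListener(State{})
	fmt.Println(s.Attributes[enableHealthListenerKeyType{}]) // true
}
```

Taking and returning the whole `State` keeps the attribute key private to the package while still letting petiole policies flip the flag before forwarding the resolver update.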

-	mu    sync.Mutex
-	state connectivity.State
+	mu                sync.Mutex
+	connectivityState connectivity.State
Member:

I think this deserves a comment, too, since there are now two very similar fields.

arjan-bal (Contributor, Author):

Added a comment.

Member:

Shouldn't this kind of tracking be done in the subconn struct (scData) and not here? I would expect the lb policy only has the concludedState and each subchannel needs to track its real state and its effective state, accounting for sticky-TF and health reporting? It seems confusing to me that the LB policy itself is tracking two different states, but I'm willing to believe it's simpler this way if you tried it the other way already.

arjan-bal (Contributor, Author):

I was referring to the Java implementation of pickfirst, which handles the sticky-TF behaviour in the LB policy. I don't see any issue with handling sticky TF in the subchannel state. I've updated the PR to reflect the suggestions.

Resolved review thread on balancer/pickfirst/pickfirstleaf/pickfirstleaf.go (outdated)
@dfawley dfawley removed their assignment Nov 19, 2024
@easwars easwars assigned arjan-bal and unassigned easwars Nov 20, 2024
@easwars easwars assigned arjan-bal and unassigned easwars Nov 21, 2024
@arjan-bal arjan-bal removed their assignment Nov 22, 2024
Resolved review thread on balancer/pickfirst/pickfirstleaf/pickfirstleaf.go (outdated)
@@ -179,14 +198,15 @@ func (b *pickfirstBalancer) resolverErrorLocked(err error) {
// The picker will not change since the balancer does not currently
// report an error. If the balancer hasn't received a single good resolver
// update yet, transition to TRANSIENT_FAILURE.
-	if b.state != connectivity.TransientFailure && b.addressList.size() > 0 {
+	if b.connectivityState != connectivity.TransientFailure && b.addressList.size() > 0 {
Member:

E.g. of above: I find the usages confusing here -- how do we know whether it's correct for this to be checking connectivityState or concludedState?

arjan-bal (Contributor, Author):

Moved the sticky TF behaviour into the subchannel state as suggested. There is only one balancer.state field now.

// ClientConn.UpdateState(). As an optimization, it avoids sending duplicate
// updates to the channel for state CONNECTING.
func (b *pickfirstBalancer) updateConcludedStateLocked(newState balancer.State) {
	// Optimization to not send duplicate CONNECTING updates.
Member:

Something in the comments should explain why it's OK to do this for CONNECTING but not the other states (because the queuing is the same). I guess READY->READY with the same subchannel is impossible or something? TF to TF is done only to update the error message. So should we instead flip this so it ignores any duplicates except TF to TF?

arjan-bal (Contributor, Author):

READY -> READY may be possible if the health check client gets two consecutive SERVING updates. Presently, addrConn ensures that duplicate state updates are not sent here:

grpc-go/clientconn.go, lines 1192 to 1204 (commit dcba136):

func (ac *addrConn) updateConnectivityState(s connectivity.State, lastErr error) {
	if ac.state == s {
		return
	}
	ac.state = s
	ac.channelz.ChannelMetrics.State.Store(&s)
	if lastErr == nil {
		channelz.Infof(logger, ac.channelz, "Subchannel Connectivity change to %v", s)
	} else {
		channelz.Infof(logger, ac.channelz, "Subchannel Connectivity change to %v, last error: %s", s, lastErr)
	}
	ac.acbw.updateState(s, ac.curAddr, lastErr)
}

I've inverted the check as suggested to handle such cases.

func (b *pickfirstBalancer) updateConcludedStateLocked(newState balancer.State) {
	// Optimization to not send duplicate CONNECTING updates.
	if newState.ConnectivityState == b.concludedState && b.concludedState == connectivity.Connecting {
		return
Member:

Codecov says this isn't covered by tests. So is it actually dead code that's impossible to happen in real life, anyway, and we can just delete it?

arjan-bal (Contributor, Author):

This will always happen while client-side health checks are enabled. The health service client sends a CONNECTING update when establishing a stream:

setConnectivityState(connectivity.Connecting, nil)

This results in duplicate CONNECTING updates. I will raise a follow-up CL to add client-side health checks using the health listener.

Comment on lines +756 to +758
// A separate function is defined to force-update the ClientConn state since
// the channel doesn't correctly assume that LB policies start in CONNECTING
// and relies on the LB policy to send an initial CONNECTING update.
Member:

Should we change that assumption in the channel? File an issue and add a TODO to clean this up? Maybe it's related to #7686 in some way? I believe the assumption in the other languages is that LB policies start CONNECTING, and queue picks.

arjan-bal (Contributor, Author):

This is related to #7686. I can do that in a separate PR. I was thinking of sending a picker and state update in the ccBalancerWrapper constructor here:

func newCCBalancerWrapper(cc *ClientConn) *ccBalancerWrapper {
	ctx, cancel := context.WithCancel(cc.ctx)
	ccb := &ccBalancerWrapper{
		cc: cc,
		opts: balancer.BuildOptions{
			DialCreds:       cc.dopts.copts.TransportCredentials,
			CredsBundle:     cc.dopts.copts.CredsBundle,
			Dialer:          cc.dopts.copts.Dialer,
			Authority:       cc.authority,
			CustomUserAgent: cc.dopts.copts.UserAgent,
			ChannelzParent:  cc.channelz,
			Target:          cc.parsedTarget,
			MetricsRecorder: cc.metricsRecorderList,
		},
		serializer:       grpcsync.NewCallbackSerializer(ctx),
		serializerCancel: cancel,
	}
	ccb.balancer = gracefulswitch.NewBalancer(ccb, ccb.opts)
	return ccb
}

Is this the correct place to make this change?

@dfawley dfawley assigned arjan-bal and unassigned dfawley Nov 22, 2024
@arjan-bal arjan-bal assigned dfawley and unassigned arjan-bal Nov 25, 2024