Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: support fine-grained Canary Service manipulation #241

Open
lujiajing1126 opened this issue Nov 22, 2024 · 8 comments · May be fixed by #243
Open

Proposal: support fine-grained Canary Service manipulation #241

lujiajing1126 opened this issue Nov 22, 2024 · 8 comments · May be fixed by #243

Comments

@lujiajing1126
Copy link
Contributor

背景

在当前的Rollouts实现中,创建Canary Service的动作是直接复制了Stable Service并且创建/覆盖了RevisionLabelKey,但在一些场景下存在局限性:

在实际使用过程中,PatchPodTemplateMetadata可以被用来向Canary Pods增加Labels和Annotations。比如我们可以增加version=canary这一label,来与Stable Pods进行区分(假设是version=base)。如果Stable Service上的selector已经存在version=base,那么在复制以后就无法选中灰度Pods,如下图所示

image

方案

CanaryStrategy结构体中,增加一个字段CanaryServiceOperations来标识对Canary Service的配置操作,

type CanaryServiceOperations {
   Add map[string]string
   Remove []string
}

type CanaryStrategy struct {
    // canary service will not be generated if DisableGenerateCanaryService is true
    DisableGenerateCanaryService bool `json:"disableGenerateCanaryService,omitempty"`

    // An Additional Field
    CanaryServiceOperations *CanaryServiceOperations `json:"canaryServiceOperations,omitempty"`
}
@furykerry
Copy link
Member

furykerry commented Nov 22, 2024

can we workaround the problem somehow, for example is it possible to specify another label (other than version=base) that is not conflict with your existing service selector label ?

@lujiajing1126
Copy link
Contributor Author

lujiajing1126 commented Nov 22, 2024

can we workaround the problem somehow, for example is it possible to specify another label (other than version=base) that is not conflict with your existing service selector label ?

Yes. It could be done. But in a large cluster, adding an additional label to Pod and also service selector are not easy. It has to be done generally in two steps,

  1. Adding the label to the existing Deployment, and PodTemplateSpec. It would lead to workload restart.
  2. After all pod are healthy, add service selector to the Stable Service.

We have > 1000 Deployments in the production cluster, so it would take a long-term to reach eventual consistency.

@furykerry
Copy link
Member

it is not necessary to change label of the exiting deployment, just change PatchPodTemplateMetadata field in the kruise-rollout.

@lujiajing1126
Copy link
Contributor Author

lujiajing1126 commented Nov 22, 2024

it is not necessary to change label of the exiting deployment, just change PatchPodTemplateMetadata field in the kruise-rollout.

We've already use PatchPodTemplateMetadata to patch canary pods. As shown in the fig above, the generated Canary pods are labeled with version=canary.

The following is what we want as hinted by your last reply,

image

But to achieve that, we have to add a new label ,for example app.kubernetes.io/part-of to both PodTemplateSpec and Service selector.

@furykerry
Copy link
Member

furykerry commented Nov 22, 2024

if your existing stable service and deployment is configure with the following selector

  1. app=app1
  2. version=base
  3. pod-template-hash=hash1

just configure the rollout PatchPodTemplateMetadata to patch kruise-version=canary, then the canary pod will be

  1. app=app1
  2. version=base
  3. kruise-version=canary
  4. pod-template-hash=hash2

and the selector of copied canary service will be:

  1. app=app1
  2. version=base
  3. pod-template-hash=hash2
    it can still select the canary pods

@lujiajing1126
Copy link
Contributor Author

lujiajing1126 commented Nov 26, 2024

if your existing stable service and deployment is configure with the following selector

  1. app=app1
  2. version=base
  3. pod-template-hash=hash1

just configure the rollout PatchPodTemplateMetadata to patch kruise-version=canary, then the canary pod will be

  1. app=app1
  2. version=base
  3. kruise-version=canary
  4. pod-template-hash=hash2

and the selector of copied canary service will be:

  1. app=app1
  2. version=base
  3. pod-template-hash=hash2
    it can still select the canary pods

I agree that it could work in general. But the problem is label app and version have already been extensively used in our system. For example,

  • version is used to select subset in DestinationRule,
  • version is used for Prometheus scraping tasks definition,
  • version is used for other traffic identification...

So it may not be suitable now to use version=base for Canary deployments. It may cause backwards incompatibility.

@StevenLeiZhang
Copy link

如果 canary svc的selector 只保留 pod-template-hash 这一个就行了,不复制其他label selector,是不是也能解决问题?

@lujiajing1126
Copy link
Contributor Author

如果 canary svc的selector 只保留 pod-template-hash 这一个就行了,不复制其他label selector,是不是也能解决问题?

Hash值的计算会不会有冲突?

@lujiajing1126 lujiajing1126 linked a pull request Nov 29, 2024 that will close this issue
2 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants