
When dynamicStableScale is enabled, a second canary deployment (started before the first finishes) shifts 100% of the traffic to stable without waiting for it to scale up #3372

Open
MarwanTukhta opened this issue Feb 14, 2024 · 3 comments · May be fixed by #3878 or #3862
Labels
bug Something isn't working

Comments


MarwanTukhta commented Feb 14, 2024

Checklist:

  • I've included steps to reproduce the bug.
  • I've included the version of argo rollouts.

Describe the bug
When using the canary strategy with dynamicStableScale, if a newer release (v3) is pushed while an unstable release (v2) is still being deployed, the Argo Rollouts controller shifts all traffic to the stable release (v1) without waiting for it to scale up.

To Reproduce
Deploy a simple rollout with traffic routing and dynamicStableScale enabled, then change the image version to 1.25.2 and apply the change; the v1 ReplicaSet will now have 1 replica while v2 has 3. Change the image to 1.25.3 and the issue occurs: all traffic immediately shifts to v1 without waiting for its pods to start. Since this example uses nginx the window is very short, but in production this caused an outage.

apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: test-argo-rollout
  labels:
    app: test-argo-rollout
spec:
  replicas: 4
  strategy:
    canary:
      canaryService: test-argo-rollout-canary
      stableService: test-argo-rollout
      dynamicStableScale: true
      abortScaleDownDelaySeconds: 300
      maxUnavailable: 0
      trafficRouting:
        nginx:
          stableIngress: test-argo-rollout-ingress
      steps:
        - setWeight: 90
        - pause: {}
  selector:
    matchLabels:
      app: test-argo-rollout
  template:
    metadata:
      labels:
        app: test-argo-rollout
    spec:
      containers:
        - name: nginx
          image: nginx:1.25.1
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 80
              name: http
              protocol: TCP
          resources:
            requests:
              cpu: 100m
              memory: 100Mi

---

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: test-argo-rollout-ingress
  annotations:
    kubernetes.io/ingress.class: nginx
spec:
  ingressClassName: nginx
  rules:
    - host: <HOST-HERE>
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: test-argo-rollout
                port:
                  number: 80

---

apiVersion: v1
kind: Service
metadata:
  name: test-argo-rollout
  labels:
    app: test-argo-rollout
spec:
  type: ClusterIP
  ports:
    - port: 80
      targetPort: http
      protocol: TCP
      name: http
  selector:
    app: test-argo-rollout

---

apiVersion: v1
kind: Service
metadata:
  name: test-argo-rollout-canary
  labels:
    app: test-argo-rollout
spec:
  type: ClusterIP
  ports:
    - port: 80
      targetPort: http
      protocol: TCP
      name: http
  selector:
    app: test-argo-rollout
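
For reference, the reproduction steps above can be driven with the kubectl Argo Rollouts plugin. This is a sketch assuming the manifests are saved as rollout.yaml, the cluster has the NGINX ingress controller installed, and the kubectl-argo-rollouts plugin is available:

```shell
# 1. Deploy at v1 (nginx:1.25.1) and let the rollout become healthy.
kubectl apply -f rollout.yaml

# 2. Push v2. With setWeight: 90 and dynamicStableScale, the stable (v1)
#    ReplicaSet scales down to 1 replica while v2 scales up.
kubectl argo rollouts set image test-argo-rollout nginx=nginx:1.25.2

# 3. While the rollout is still paused on v2, push v3. The controller
#    immediately shifts 100% of traffic to v1, which has only 1 replica.
kubectl argo rollouts set image test-argo-rollout nginx=nginx:1.25.3

# Watch replica counts and traffic weights while reproducing:
kubectl argo rollouts get rollout test-argo-rollout --watch
```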

Expected behavior

It should behave exactly like an abort of v2: traffic should shift back to v1 gradually, and only then move to v3.

Version

1.6.0

Message from the maintainers:

Impacted by this bug? Give it a 👍. We prioritize the issues with the most 👍.

@MarwanTukhta MarwanTukhta added the bug Something isn't working label Feb 14, 2024
@zachaller (Collaborator)

This might be fixed by #3077; can you test with the latest 1.6.6?

@MarwanTukhta (Author)

> This might possibly be fixed by #3077 can you test with the latest 1.6.6?

I have tried it on 1.6.6; the bug is still there.

@Gabryel8818

I have the same problem.
