When rolling out a new controller version, it should not trigger a rolling update #240

Open · liurupeng opened this issue Oct 24, 2024 · 10 comments · Fixed by #250
@liurupeng (Collaborator)

What happened:
In the rolling update triggering logic, we should store the template somewhere else and compare only specific fields against the original, triggering a rolling update only when those fields have actually changed. This mitigates the issue where adding a new API field triggers a rolling update when a new controller version is rolled out.
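
A minimal sketch of that idea, assuming hypothetical type and field names (this is not the actual lws implementation):

```go
// Package rollout sketches the proposed trigger logic: compare only an
// explicit allow-list of fields, so that adding a new API field cannot
// change the comparison result on a controller upgrade.
package rollout

import "reflect"

// rolloutFields is a hypothetical subset of the LeaderWorkerSet spec
// whose changes should drive a rolling update.
type rolloutFields struct {
	LeaderTemplate string
	WorkerTemplate string
	Size           int32
}

// needsRollingUpdate returns true only when one of the allow-listed
// fields differs from the stored copy; new, unrelated API fields are
// invisible to this check by construction.
func needsRollingUpdate(stored, current rolloutFields) bool {
	return !reflect.DeepEqual(stored, current)
}
```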
What you expected to happen:

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version):
  • LWS version (use git describe --tags --dirty --always):
  • Cloud provider or hardware configuration:
  • OS (e.g.: cat /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools:
  • Others:
@liurupeng liurupeng added the kind/bug Categorizes issue or PR as related to a bug. label Oct 24, 2024
@liurupeng (Collaborator, Author)

/assign

@ahg-g (Contributor) commented Oct 26, 2024

For the short term, let's patch the current release with a special fix for the new field we added, NetworkConfig.

Basically, if NetworkConfig is set to the default values, set it to nil in the controller when calculating the hash; this should avoid changing the hash value when the new controller is rolled out.
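
A sketch of that normalization, assuming the lws v1 API types and SubdomainShared as the defaulted value (the import path and names are assumptions, not the actual fix):

```go
package controller

import (
	v1 "sigs.k8s.io/lws/api/leaderworkerset/v1" // assumed import path
)

// networkConfigForHash treats a NetworkConfig that carries only the
// default value as if it were unset, so the computed template hash
// matches the one written by the pre-NetworkConfig controller version.
func networkConfigForHash(nc *v1.NetworkConfig) *v1.NetworkConfig {
	if nc == nil {
		return nil
	}
	// Assumption: SubdomainShared is what the defaulting webhook sets.
	if nc.SubdomainPolicy != nil && *nc.SubdomainPolicy == v1.SubdomainShared {
		return nil // defaulted-only config: exclude it from the hash input
	}
	return nc
}
```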

@liurupeng (Collaborator, Author)

yep, this could be a short-term fix

@Edwinhr716 (Contributor)

For the short term fix, don't we already exclude NetworkConfig if it is nil https://github.com/kubernetes-sigs/lws/blob/main/pkg/utils/utils.go#L44? @ahg-g

@ahg-g (Contributor) commented Oct 31, 2024

The issue is that the NetworkConfig will not be nil since it will be defaulted, which means the hash for an existing deployment will be different from the one stored as an annotation and generated previously by the older operator version.
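
In other words (a sketch of the comparison; the annotation key and helper name are illustrative, not the exact lws code):

```go
package controller

import appsv1 "k8s.io/api/apps/v1"

// Hypothetical annotation key; the real controller stores the template
// hash under its own key.
const templateHashAnnotation = "leaderworkerset.sigs.k8s.io/template-hash"

// templateChanged shows why the rollout fires: the old controller wrote
// a hash computed without NetworkConfig, and the new controller, after
// defaulting, computes a different one for the same user-visible spec.
func templateChanged(sts *appsv1.StatefulSet, currentHash string) bool {
	return sts.Annotations[templateHashAnnotation] != currentHash
}
```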

@Edwinhr716 (Contributor)

I thought it wasn't defaulted when upgrading the LWS controller #197 (comment), and that's why we added the nil check?

@ahg-g (Contributor) commented Oct 31, 2024

It is getting defaulted:

```go
if lws.Spec.NetworkConfig == nil {
```

But can you test that on a real cluster?
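
For context, the defaulting path being quoted presumably looks something like this (a sketch; the default value and the surrounding code are assumptions):

```go
package webhooks

import v1 "sigs.k8s.io/lws/api/leaderworkerset/v1" // assumed import path

// setNetworkConfigDefaults fills in NetworkConfig when it is unset; once
// this runs on an existing object, the recomputed template hash diverges
// from the one stored by the older controller.
func setNetworkConfigDefaults(lws *v1.LeaderWorkerSet) {
	if lws.Spec.NetworkConfig == nil {
		policy := v1.SubdomainShared // assumed default
		lws.Spec.NetworkConfig = &v1.NetworkConfig{SubdomainPolicy: &policy}
	}
}
```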

@Edwinhr716 (Contributor)

I tested upgrading from 0.2.0 to 0.4.1, and the upgrade did not trigger a rolling update of the StatefulSets (so the template hash was not updated).

@ahg-g (Contributor) commented Oct 31, 2024

Check whether NetworkConfig gets defaulted; if not, try an update to the LWS object that doesn't itself trigger a rolling update, so that the defaulting function gets called.

@Edwinhr716 (Contributor)

Upgraded from 0.2.0 to 0.4.1. NetworkConfig does not get defaulted. Updated the LWS object by removing a metadata label to trigger the defaulting path; NetworkConfig then gets defaulted and a rolling update is triggered.
