Add implementation for NodeUpgradeController #7061

Merged 9 commits into aws:main on Nov 23, 2023

Conversation

@abhinavmpandey08 (Member) commented Nov 19, 2023

Description of changes:
The implementation is based on this design #6893

The goal of this PR is to add the implementation for the NodeUpgradeController.
The purpose of this controller is to handle in-place upgrades of nodes by provisioning upgrader pods on the individual nodes; these pods upgrade the components on the node to the desired version.
These components include:

  • containerd
  • cni-plugins
  • kubeadm
  • kubelet
  • kubectl
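As a rough sketch of what such an upgrader pod could look like (the pod name, namespace, image, and commands below are hypothetical illustrations, not taken from this PR), the per-component steps map naturally to ordered init containers pinned to the target node:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-node-node-upgrader     # hypothetical naming scheme
  namespace: eksa-system
spec:
  nodeName: my-node               # pin the pod to the node being upgraded
  hostPID: true                   # the upgrader needs access to host processes
  initContainers:                 # one step per component, run in order
    - name: containerd-upgrader
      image: example.com/upgrader:v1   # hypothetical image
      command: ["/upgrader", "upgrade", "containerd"]
    - name: kubelet-upgrader
      image: example.com/upgrader:v1
      command: ["/upgrader", "upgrade", "kubelet"]
  containers:
    - name: done
      image: example.com/upgrader:v1
      command: ["echo", "upgrade complete"]
  restartPolicy: Never
```

The controller would then watch the init container statuses on this pod to track progress of the individual steps.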

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@eks-distro-bot eks-distro-bot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Nov 19, 2023

codecov bot commented Nov 19, 2023

Codecov Report

Attention: 70 lines in your changes are missing coverage. Please review.

Comparison is base (a97e239) 71.34% compared to head (b691d5e) 71.56%.
Report is 9 commits behind head on main.

Files Patch % Lines
controllers/nodeupgrade_controller.go 75.09% 51 Missing and 15 partials ⚠️
pkg/api/v1alpha1/nodeupgrade_types.go 0.00% 4 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #7061      +/-   ##
==========================================
+ Coverage   71.34%   71.56%   +0.21%     
==========================================
  Files         544      545       +1     
  Lines       41963    42318     +355     
==========================================
+ Hits        29940    30283     +343     
+ Misses      10345    10343       -2     
- Partials     1678     1692      +14     


@eks-distro-bot eks-distro-bot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Nov 20, 2023
config/rbac/role.yaml (resolved)
controllers/nodeupgrade_controller.go (resolved)
controllers/nodeupgrade_controller.go (resolved)

func getInitContainerStatus(pod *corev1.Pod, containerName string) (*corev1.ContainerStatus, error) {
	for _, status := range pod.Status.InitContainerStatuses {
		if status.Name == containerName {
			return &status, nil
		}
	}
	return nil, fmt.Errorf("status not found for container %q in pod %s", containerName, pod.Name)
}

Member:

We may want a mapping in the future to make it more readable.

Member Author:

yeah I can add that in a follow up

pkg/nodeupgrader/upgrader.go (outdated, resolved)

// UpgradeWorkerPod returns an upgrader pod that can be deployed on worker nodes.
func UpgradeWorkerPod(nodeName, image string) *corev1.Pod {
Member:

These feel like they should be member methods

Member Author:

hmm, let's discuss this offline and see if this is something we can acknowledge in a follow up

@rahulbabu95
Member

/lgtm

@abhinavmpandey08
Member Author

/approve

@eks-distro-bot
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: abhinavmpandey08

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@abhinavmpandey08
Member Author

/hold

@pokearu (Member) left a comment

/lgtm

@rahulbabu95
Member

/lgtm

@abhinavmpandey08
Member Author

/unhold

@eks-distro-bot eks-distro-bot merged commit 7770f9c into aws:main Nov 23, 2023
7 checks passed
- apiGroups:
- ""
resources:
- pods
Member:

we need to move this to a role instead of cluster role
this gives way too many privileges to the controller
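For reference, the suggested narrowing to a namespaced Role might look like the following (a sketch only; the name, namespace, and verb list are assumptions, not the contents of this PR's role.yaml):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role                        # namespaced, unlike ClusterRole
metadata:
  name: nodeupgrade-controller    # hypothetical name
  namespace: eksa-system
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list", "watch", "create", "delete"]
```

A Role scopes the pod permissions to a single namespace, whereas a ClusterRole bound cluster-wide grants them everywhere.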

Member Author:

I don't think this is needed anymore since the controller is building a remote client from the kubeconfig stored on the cluster which has admin on workload clusters. I'll do some testing and remove it

NodeUpgradeKind = "NodeUpgrade"

// UpgraderPodCreated reports whether the upgrader pod has been created for the node upgrade.
UpgraderPodCreated ConditionType = "UpgraderPodCreated"
Member:

I don't know if it's wise making the phases of an upgrade tied in this way to the API. Your phases might change more frequently than you want your API to change. Also, what if some phases become optional? How are clients going to know that?

Member Author:

> I don't know if it's wise making the phases of an upgrade tied in this way to the API. Your phases might change more frequently than you want your API to change.

Can you elaborate a little on this? This is just adding container statuses on the node upgrade object.
The controller will check the container statuses at the end of every reconcile loop and update the node upgrade status based on that. And once all the containers have finished, it will mark the upgrade as completed and end the reconcile loop.
I extrapolated this logic from the EKS-A controller which does something similar where it monitors the KCP/MD/CAPI cluster conditions and uses it to update EKS-A Cluster conditions. And when all conditions are met, the reconcile is marked complete.
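The status aggregation described above can be sketched with plain types (containerOutcome here is a simplified, hypothetical stand-in for corev1.ContainerStatus; this is not the PR's actual code):

```go
package main

import "fmt"

// containerOutcome is a simplified stand-in for corev1.ContainerStatus;
// the real controller would read pod.Status.InitContainerStatuses.
type containerOutcome struct {
	Name     string
	Finished bool  // the container has terminated
	ExitCode int32 // meaningful only when Finished is true
}

// upgradeComplete mirrors the reconcile-loop check described above:
// the upgrade is marked completed only once every upgrader container
// has terminated successfully; otherwise the loop requeues and checks again.
func upgradeComplete(statuses []containerOutcome) (bool, error) {
	for _, s := range statuses {
		if !s.Finished {
			return false, nil // still in progress; re-check next reconcile
		}
		if s.ExitCode != 0 {
			return false, fmt.Errorf("container %s failed with exit code %d", s.Name, s.ExitCode)
		}
	}
	return true, nil
}

func main() {
	statuses := []containerOutcome{
		{Name: "containerd-upgrader", Finished: true, ExitCode: 0},
		{Name: "kubelet-upgrader", Finished: false},
	}
	done, err := upgradeComplete(statuses)
	fmt.Println(done, err)
}
```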

> Also, what if some phases become optional? How are clients going to know that?

This depends on how we decide to implement it. We were thinking we won't make the phases optional from the NodeUpgradeController's perspective. Rather, the pod will check whether the step it's performing actually needs to be done. If it doesn't, the upgrader will simply skip it and return success.
For example, if the containerd version on the node is already the latest one, then the upgrader can just skip copying over the containerd binary and return without errors.
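That idempotency check can be sketched as follows (the helper name and version strings are hypothetical; a real upgrader would read the installed version off the node):

```go
package main

import "fmt"

// needsUpgrade sketches the skip logic described above: each upgrader
// step compares the installed component version with the desired one
// and becomes a no-op when they already match. Versions are compared
// for exact equality here; a real upgrader might parse them as semver.
func needsUpgrade(installed, desired string) bool {
	return installed != desired
}

func main() {
	// Hypothetical containerd versions, for illustration only.
	if !needsUpgrade("1.6.24", "1.6.24") {
		fmt.Println("containerd already at desired version, skipping")
	}
	if needsUpgrade("1.6.19", "1.6.24") {
		fmt.Println("upgrading containerd 1.6.19 -> 1.6.24")
	}
}
```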

@abhinavmpandey08 abhinavmpandey08 deleted the in-place-controllers branch November 27, 2023 18:12