[Roadmap] Improve kubeadm support for declarative approaches/git-ops #2317

fabriziopandini · 2020-09-30T15:57:48Z

Kubeadm, being a CLI, does not play well with declarative approaches/git-ops workflows.

Assuming that kubeadm is divided into two main parts:

1. bootstrapping a node (transforming a machine into a node: init, join)
2. managing an existing node (e.g. upgrades, renew certs, changing a node)

This issue is about collecting ideas and defining a viable path for making 2 possible using declarative approaches, sometimes also referred to as in-place mutations.

For this first iteration, I consider 1 out of scope, mainly because bootstrapping nodes with a declarative approach is already covered by Cluster API and it is clearly out of the scope of kubeadm.

fabriziopandini · 2020-10-01T10:14:30Z

Prior discussion from #1698

@timothysc

As a Kubernetes Operator I would like to be able to declaratively control configuration changes and upgrades in a systematic fashion.

@fabriziopandini

IMO the kubeadm operator should be responsible for two things:

In-place mutations of kubeadm-generated artifacts

Orchestration of such mutations across nodes

Instead, I think that we should consider out of scope everything that falls under the management of infrastructure or is related to the management of "immutable" nodes (where "immutable" = any operation done by deploying a new node and removing the old one).

@neolit123

my other top question, this can end up being not-so-secure.

fabriziopandini · 2020-10-06T13:04:17Z

For the kubeadm operator, I think we should focus on the first use case, "declaratively control configuration changes", given that kubeadm does not support this today and it was a top priority in the recent survey.

fabriziopandini · 2021-03-01T11:06:10Z

Draft KEP https://docs.google.com/document/d/14Cb2fQfRVpPSQuNz0MYbtGmO63xBglT7x7h57ZeK5PI/edit?usp=sharing

jhughes2112 · 2021-03-01T16:50:39Z

Interesting proposal. What I've struggled with over the past year using kubeadm is specifically what I addressed with my own scripts that (procedurally) build clusters (on GitHub: k8smaker). The best practices for configuring a bare metal cluster are pretty complex. Doing so with AWS is too. The preconditioning script depends strongly on the underlying OS involved. But having built it modularly, I can see how the cluster construction (init, join) can be made completely extensible while providing a fully declarative interface to the user.

It requires:

ssh credentials to access any new node with sudo privileges (from whence kubeadm or the proposed operator executes)
a preconditioning script that configures the OS from a clean state
a decommissioning script that resets the OS to an unused state
a configuration CRD that describes the nodes that should be part of the cluster

I offer an opinion: I realize this was specifically stated as out-of-scope for this proposal. I'm suggesting it should be the focus instead of day 2 operations. It seems like a lot of k8s admins have a procedure where upgrading an existing production cluster tends to be (much) more dangerous than building a new one. Automating more of the upgrades adds a "magicalness" to that process, which makes the inevitable breakage more severe rather than less. Whereas automating the construction process drives towards a very desirable workflow for automating and simplifying the process: simply remove nodes from an existing production cluster description and add them to the new cluster.
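
The "remove nodes from one cluster description and add them to another" workflow can be pictured as a diff between a declared node set and the nodes actually present. The following is a minimal sketch only: `ClusterSpec`, `plan_membership`, and the host names are hypothetical and do not come from kubeadm, k8smaker, or any existing CRD.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterSpec:
    """Hypothetical declarative description of one cluster's membership."""
    name: str
    nodes: frozenset = frozenset()  # hostnames that should belong to the cluster


def plan_membership(desired: ClusterSpec, actual_nodes: set) -> dict:
    """Diff the declared node set against the nodes currently in the cluster."""
    return {
        # Declared but not yet members: run the preconditioning script, then join.
        "join": sorted(desired.nodes - actual_nodes),
        # Members that are no longer declared: remove, then run the decommissioning script.
        "decommission": sorted(actual_nodes - desired.nodes),
    }


# Moving host-c from the old cluster description to the new one.
old_spec = ClusterSpec("prod-old", frozenset({"host-a", "host-b"}))
new_spec = ClusterSpec("prod-new", frozenset({"host-c"}))

old_plan = plan_membership(old_spec, {"host-a", "host-b", "host-c"})
new_plan = plan_membership(new_spec, set())
print(old_plan)
print(new_plan)
```

Under these assumptions the old cluster's plan decommissions host-c and the new cluster's plan joins it; the actual SSH access, preconditioning, and decommissioning steps are left out.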

Thanks for the consideration.

neolit123 · 2021-03-01T17:18:27Z

> It seems like a lot of k8s admins have a procedure where upgrading an existing production cluster tends to be (much) more dangerous than building a new one.

i think no matter what we do with kubernetes upgrades we will not be able to fully guarantee zero failures to the users, unless this is fully managed by some high-level tooling that understands everything that the user has and wants - including node host details, infrastructure availability and all caveats of the current and next k8s version.

kubeadm or the operator can encode some details about the next k8s version or the node host, but that's all.

the so-called "blue / green" cluster upgrades may seem like the better option in the eyes of the user, since the user has the control to scrap the old cluster only once the new cluster is fully working. but they also require infrastructure that some users on self-hosted bare metal simply don't have.

> Whereas automating the construction process drives towards a very desirable workflow for automating and simplifying the process: simply remove nodes from an existing production cluster description and add them to the new cluster.

we call these node re-place upgrades and the Cluster API is doing them. your project may have recreated parts of Cluster API, kubespray or kops, which are tools that are higher level than kubeadm.

fejta-bot · 2021-05-30T17:35:02Z

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

neolit123 · 2021-05-31T15:03:39Z

/remove-lifecycle stale

k8s-triage-robot · 2021-08-29T15:52:36Z

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue or PR as fresh with /remove-lifecycle stale
Mark this issue or PR as rotten with /lifecycle rotten
Close this issue or PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

neolit123 · 2021-08-30T12:19:33Z

/remove-lifecycle stale

k8s-triage-robot · 2021-09-29T12:26:42Z

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue or PR as fresh with /remove-lifecycle rotten
Close this issue or PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

fabriziopandini · 2021-09-30T12:52:41Z

/remove-lifecycle stale

neolit123 · 2022-02-14T15:24:11Z

xref related discussion about cert rotation #2652

pacoxu · 2022-06-20T05:18:02Z

Not sure if this is the right place to discuss the kubeadm operator. There are some threads in kubernetes/enhancements#2505.

I wrote a simple kubelet-reloader as a tool for the kubeadm operator.

kubelet-reloader watches /usr/bin/kubelet-new.
Once a different version of kubelet-new appears, the reloader replaces /usr/bin/kubelet and restarts kubelet.
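
The watch-replace-restart loop described above could look roughly like the sketch below. It is not the actual kubelet-reloader code: comparing binaries by content hash, polling instead of using file notifications, and restarting through a systemd unit named `kubelet` are all assumptions made for illustration, as are the function names.

```python
import hashlib
import os
import shutil
import subprocess
import time
from pathlib import Path
from typing import Callable


def file_digest(path: Path) -> str:
    """Content hash, used here to decide whether two binaries differ."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def restart_kubelet() -> None:
    # Assumes the kubelet runs as a systemd unit named "kubelet".
    subprocess.run(["systemctl", "restart", "kubelet"], check=True)


def reload_if_changed(staged: Path, current: Path,
                      restart: Callable[[], None] = restart_kubelet) -> bool:
    """Swap `current` for `staged` and restart if their contents differ."""
    if not staged.exists():
        return False
    if current.exists() and file_digest(staged) == file_digest(current):
        return False
    # Copy beside the target and rename over it, so the swap is atomic
    # and the running binary is never written to in place.
    tmp = current.with_name(current.name + ".tmp")
    shutil.copy2(staged, tmp)
    os.replace(tmp, current)
    restart()
    return True


def watch(staged: Path, current: Path, interval_seconds: float = 10.0) -> None:
    """Poll forever; a real tool might react to file events instead."""
    while True:
        reload_if_changed(staged, current)
        time.sleep(interval_seconds)
```

Run as root, `watch(Path("/usr/bin/kubelet-new"), Path("/usr/bin/kubelet"))` would apply the loop to the paths named above. Passing `restart` as a parameter keeps the swap logic testable without touching systemd.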

Currently, kubeadm-operator v0.1.0 supports upgrades across versions, such as v1.22 to v1.24.
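
Because kubeadm itself does not support skipping minor versions during an upgrade, one plausible way to get from v1.22 to v1.24 is to step through each minor release in turn. The sketch below only illustrates that planning step; `plan_upgrade` and `parse_major_minor` are hypothetical names, and how kubeadm-operator actually sequences the upgrade is not described in this thread.

```python
def parse_major_minor(version: str) -> tuple:
    """Parse "v1.22.3" or "1.22" into (1, 22)."""
    major, minor = version.lstrip("v").split(".")[:2]
    return int(major), int(minor)


def plan_upgrade(current: str, target: str) -> list:
    """List the minor releases to step through, in order, ending at the target."""
    cur_major, cur_minor = parse_major_minor(current)
    tgt_major, tgt_minor = parse_major_minor(target)
    if cur_major != tgt_major:
        raise ValueError("this sketch only handles upgrades within one major version")
    if tgt_minor < cur_minor:
        raise ValueError("downgrades are not planned by this sketch")
    # One hop per minor release, e.g. 22 -> 23 -> 24.
    return [f"v{cur_major}.{m}" for m in range(cur_minor + 1, tgt_minor + 1)]


steps = plan_upgrade("v1.22.0", "v1.24.0")
print(steps)
```

Patch-only upgrades within the same minor release yield an empty plan here; a fuller version would need to choose a concrete patch release for each hop.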

The kubeadm operator downloads kubectl/kubelet/kubeadm and performs the upgrade.
kubelet is placed in /usr/bin/kubelet-new for kubelet-reloader.

See quick-start.

Some thoughts on the next steps

Add a CRD to define the version we want (pacoxu/kubeadm-operator#88): a kubeadm operator CRD with the target version of this cluster. The controller can then create operations for it automatically.
Offline install support (pacoxu/kubeadm-operator#87)
"yum/apt install" instead of downloading binaries (pacoxu/kubeadm-operator#86)

neolit123 · 2024-11-28T16:56:11Z

> 1. bootstrapping a node (transforming a machine into a node: init, join)
> 2. managing an existing node (e.g. upgrades, renew certs, changing a node)

for 2 we decided that this should be part of a kubeadm operator, but at the same time we decided to externalize it and not make SIG CL own the project. this means that there isn't anything actionable.

and yes for 1, tools that wrap kubeadm can create the declarative layer (e.g. like CAPI does) to define the topology of how many nodes etc.

fabriziopandini mentioned this issue Sep 30, 2020

RFE: Document the kubeadm roadmap #2315

Open

fabriziopandini mentioned this issue Oct 1, 2020

RFE: kubeadm operator #1698

Closed

fabriziopandini mentioned this issue Oct 26, 2020

delete go.mod in root directory #2333

Closed

neolit123 added this to the v1.21 milestone Nov 7, 2020

neolit123 added priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. kind/design Categorizes issue or PR as related to design. labels Nov 7, 2020

neolit123 modified the milestones: v1.21, Next Feb 3, 2021

k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 30, 2021

k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 31, 2021

k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 29, 2021

k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Sep 29, 2021

neolit123 added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. labels Sep 29, 2021

neolit123 mentioned this issue Feb 14, 2022

kubeadm: support cert rotation or configurable cert expiration #2652

Closed

neolit123 mentioned this issue Apr 5, 2022

kubeadm: add task page for cluster reconf kubernetes/website#32764

Merged

pacoxu mentioned this issue Jun 30, 2022

Kubeadm operator kubernetes/enhancements#2505

Open

4 tasks

pacoxu mentioned this issue Aug 10, 2022

Gather user cases for kubeadm operator from CAPI side kubernetes-sigs/cluster-api#7044

Closed

neolit123 added priority/backlog Higher priority than priority/awaiting-more-evidence. and removed priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. labels Nov 8, 2023

neolit123 closed this as completed Nov 28, 2024
