Kubeadm HA ( high availability ) checklist #261
Comments
/assign @timothysc @jamiehannaford |
@timothysc In order to do |
@jamiehannaford There are 2 parts.
|
I do hope to have time to work on ComponentConfig for all the remaining components over the next couple of releases. |
@ncdc @timothysc Is there a wider epic issue for the componentconfig stuff? |
@timothysc I've seen the Google doc, I meant a Github issue for tracking work across different components |
Moving milestone to v1.9. We have a rough design doc in v1.8 and are building the groundwork for making HA possible in v1.9 |
Can the docs linked here be made public? All of them show an access-permission request form when clicking through. Is there a current design doc? |
Found it: kubernetes/enhancements#357 |
I deployed this manually today and ran into the following issue when testing master failover: the IP address specified in |
I've been hoping for this feature for a long time... |
Is this feature going to make it into 1.9? |
No, not in its entirety. We have some work in progress, but due to the really tight schedule it's not gonna make alpha in v1.9. Instead we'll focus on documenting how to do HA "manually" #546. There really is a lot of work to make this happen, nobody has done this kind of HA "hands-off" installing flow for k8s yet AFAIK, so we're falling back on what everyone else does for now. |
@luxas FWIW, I think tectonic-installer (but bootkube based) is closest to the goals for kubeadm, worth having a look. |
Here is my stab at kubeadm HA on AWS: https://github.com/itskoko/kubecfn |
@discordianfish wow, that looks like a lot of work -- nice job |
@discordianfish That's really nice work, and awesome, you've worked around all the bugs :-) +1 |
Awesome job @discordianfish. If you had to do any workarounds to get kubeadm working (i.e. to fix kubeadm-specific bugs or shortcomings) would you mind opening an issue so we can document them? |
@jamiehannaford The biggest bug I see going through the repo is #411: effectively you have to rewrite the kubelet config that kubeadm generates so that it points to a DNS name for the masters instead of an IP. The biggest shortcoming being worked around seems to be assuming ownership of etcd cluster management. The rest seems cloud-provider specific (a Lambda to maintain the DNS mapping as master hosts come and go, etc.). [edit] He filed one on kubeadm issues (#609) and referenced it from the kubecfn repo. Notionally this is also related to #411: it is basically the same issue of not respecting the advertise-address CLI parameter as a URL, converting it to an IP early, and then writing that IP to everything kubeadm touches for the master address. |
#411 doesn't affect kubecfn, since I'm not using kubeadm on the workers because I couldn't get the token auth to play well with the multi HA setup. Instead I'm just using the admin.conf, which isn't ideal and is (now) tracked in itskoko/kubecfn#6. #609 is the biggest pain point right now. The workaround is ugly at least (overwriting the configmap after each kubeadm run). Another minor issue is that some paths are hardcoded in kubeadm, making it harder to pre-generate configs. For that I have to run kubeadm in a docker container. In general, for this project I would have preferred if kubeadm had some offline mode where all it does is generate the config, similar to what bootkube does. Everything else is etcd related, which is IMO by far the hardest part to get right in a reliable fashion, even with cloudformation and the signaling. So if the overall experience of setting up an HA cluster should be improved, maybe the etcd bootstrapping process could be made easier. One way would be to ignore SAN/CN completely, which IMO should still be pretty much as secure as checking it. For that I opened etcd-io/etcd#8912. Besides all this, there are some small kubernetes issues I filed which would have saved me tons of time. Things like: |
The biggest pain I found is that if I want to use keepalived as the "load balancer" and run keepalived inside Kubernetes, I first need to bootstrap the Kubernetes "master cluster" with one master IP and then rewrite all configs to point to the keepalived IP. Besides that, my setup is really simple:
So basically I just build a Kubernetes cluster with one master, create the keepalived service, and register two other masters with the keepalived IP; the configmap is rewritten when kubeadm init is called a second/third time. After that I just need to adjust all IPs (kubeadm, kubelet, admin.conf, whatever) on the first master node to point to the keepalived IP (well, I actually used #546 (comment) to bring up my new masters and rewrite all configs/the configmap to point to the keepalived IP). And done. Basically the whole setup is self-contained and does not need any external load balancer or anything else; you just need a re-routable IP in your network. Edit: I also used Ignition/cloud-config to bootstrap CoreOS on VMware. It's really simple. My etcd runs on CoreOS and uses rkt (it will be installed via cloud-config). I actually generated the etcd PKI before creating the nodes. I can write a guide if somebody needs it, and provide all necessary configs. My next try would be to run the kubelet via rkt, but I'm not sure if that plays well with kubeadm. |
@discordianfish FWIW, I think the issue in #411 underlies the configmap issue and also causes the IP to be written out in the master config, which the kubecfn Makefile uses sed on to restore the cluster name. Basically, the DNS name given is overwritten early on by the resolved IP, and that gets written out everywhere kubeadm touches. |
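For readers following the workaround discussed in the comments above (rewriting the generated configs so they point at a stable endpoint such as a DNS name or a keepalived IP instead of the first master's IP), here is a minimal sketch of that kind of rewrite. It is not part of kubeadm or kubecfn; the function name `rewrite_server_endpoint` and the sample addresses are hypothetical, and it only assumes the usual kubeconfig `server: https://host:port` line.

```python
import re


def rewrite_server_endpoint(kubeconfig_text: str, old_host: str, new_host: str) -> str:
    """Swap the host in `server: https://<old_host>[:port]` lines, keeping the port.

    Hypothetical helper for illustration only; it works on plain text and
    does not parse YAML.
    """
    pattern = re.compile(
        r"^([ \t]*server:[ \t]*https://)" + re.escape(old_host) + r"(:\d+)?[ \t]*$",
        re.MULTILINE,
    )
    # Keep the indentation/prefix (group 1) and the optional port (group 2).
    return pattern.sub(
        lambda m: f"{m.group(1)}{new_host}{m.group(2) or ''}",
        kubeconfig_text,
    )


# Example input: an excerpt shaped like a kubeconfig cluster entry (addresses are made up).
sample = (
    "clusters:\n"
    "- cluster:\n"
    "    server: https://10.0.0.11:6443\n"
    "  name: kubernetes\n"
)

rewritten = rewrite_server_endpoint(sample, "10.0.0.11", "k8s-api.example.internal")
print(rewritten)
```

Lines whose host does not match `old_host` are left untouched, so the same call can be run over admin.conf, kubelet.conf, and similar files without affecting unrelated entries. The kube-proxy configmap mentioned above would still need to be updated separately.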
Closing this original parent issue as plans have changed. We will have updated issues and PRs coming in 1.11. /cc @fabriziopandini |
@timothysc Where can I read about the new plan? |
I believe the new issue for the new plan is #751 |
Somewhat surprised kubernetes/community#1707 wasn't mentioned here when that PR was originally opened. |
The following is a checklist for kubeadm support for deploying HA clusters. This is a distillation of action items from:
https://docs.google.com/document/d/1lH9OKkFZMSqXCApmSXemEDuy9qlINdm5MfWWGrK3JYc/edit#heading=h.8hdxw3quu67g
but there may be more.
New Features:
PR: Allow kcm and scheduler to lock on ConfigMaps. kubernetes#45739
https://docs.google.com/document/d/1arP4T9Qkp2SovlJZ_y790sBeiWXDO6SG10pZ_UUU-Lc/edit?ts=59110d75#heading=h.xgjl2srtytjt
PR: Add etcd-operator to kubeadm kubernetes#45665
Contentious:
Extending Support & Documentation:
Cleanup cruft:
/cc @kubernetes/sig-cluster-lifecycle-feature-requests