When a node has failed and cannot be restarted, delete it completely and recreate it by running Terraform:
terraform apply
This works for both worker and controller nodes. Use exactly the same configuration and Lokomotive revision as before; otherwise Terraform may recreate other nodes in addition to the deleted one.
However, when more than half of the controller nodes have failed, etcd has lost its quorum and is in read-only mode. Recreated nodes can no longer join as new members. Read up on how to manually restore etcd by starting a new etcd cluster from a single member with backup data. The alternative is to create an empty new Kubernetes cluster and migrate your applications to it.
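As a rough illustration of the quorum rule: etcd needs a majority of its members, floor(n/2) + 1, to accept writes. The helper names below (`etcd_quorum`, `has_quorum`) are hypothetical and not part of Lokomotive or etcd; this is only a sketch of the arithmetic.

```shell
# Hypothetical helpers, not part of Lokomotive or etcd.

# etcd_quorum TOTAL
# Print the number of healthy members needed for quorum: floor(TOTAL / 2) + 1.
etcd_quorum() {
  echo $(( $1 / 2 + 1 ))
}

# has_quorum TOTAL HEALTHY
# Succeed (exit 0) when HEALTHY members are enough for a cluster of TOTAL members.
has_quorum() {
  [ "$2" -ge "$(etcd_quorum "$1")" ]
}
```

For a three-controller cluster, `has_quorum 3 2` succeeds and `has_quorum 3 1` fails. The second case is the one in which recreated nodes can no longer join.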
- Run multiple Kubernetes clusters. Run across platforms. Plan for regional and cloud outages.
- Require applications be platform agnostic. Moving an application between a Kubernetes AWS cluster and a Kubernetes bare-metal cluster should be normal.
- Strive to make single-cluster outages tolerable. Practice performing failovers.
- Strive to make single-cluster outages a non-event. Load balance applications between multiple clusters, automate failover behaviors, and adjust alerting behaviors.
Lokomotive provides tagged releases to allow clusters to be versioned using ordinary Terraform configs.
module "google-cloud-yavin" {
source = "git::https://github.com/kinvolk/lokomotive-kubernetes//google-cloud/flatcar-linux/kubernetes?ref=<hash>"
...
}
module "bare-metal-mercury" {
source = "git::https://github.com/kinvolk/lokomotive-kubernetes//bare-metal/flatcar-linux/kubernetes?ref=<hash>"
...
}
Master is updated regularly, so it is recommended to pin modules to a release tag or commit hash. Pinning ensures `terraform get --update` only fetches the desired version.
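One way to spot unpinned modules is a small text check over the Terraform files. The sketch below assumes module sources are written on a single line in the `source = "git::..."` form shown above. The function name `unpinned_sources` is hypothetical.

```shell
# Hypothetical helper, not part of Lokomotive or Terraform.

# unpinned_sources FILE...
# Print every git module source line that lacks a ?ref= pin, prefixed with
# the file name and line number.
# Exits 0 when at least one unpinned source is found, non-zero otherwise.
unpinned_sources() {
  grep -Hn 'source *= *"git::' "$@" | grep -Fv '?ref='
}
```

Running `unpinned_sources *.tf` in a Terraform working directory prints nothing when every git source carries a `?ref=`. It only inspects text, so it does not verify that the ref exists.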
Lokomotive recommends upgrading clusters using a blue-green replacement strategy and migrating workloads.
- Launch new (candidate) clusters from tagged releases
- Apply workloads from existing cluster(s)
- Evaluate application health and performance
- Migrate application traffic to the new cluster
- Compare metrics and delete old cluster when ready
Blue-green replacement reduces risk for clusters running critical applications. Candidate clusters allow baseline properties of clusters to be assessed (e.g. pod-to-pod bandwidth). Applying application workloads allows health to be assessed before being subjected to traffic (e.g. detect any changes in Kubernetes behavior between versions). Migration to the new cluster can be controlled according to requirements. Migration may mean updating DNS records to resolve the new cluster's ingress or may involve a load balancer gradually shifting traffic to the new cluster "backend". Retain the old cluster for a time to compare metrics or for fallback if issues arise.
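The "apply workloads, then evaluate health" part of this process could be sketched as follows. The function name `stage_candidate`, the context name, and the manifest directory are placeholders; the 300 second timeout is an arbitrary choice.

```shell
# Hypothetical helper; the context name and manifest directory are placeholders.

# stage_candidate CONTEXT MANIFEST_DIR
# Wait for the candidate cluster's nodes to be Ready, apply workloads, then
# list pods that are neither Running nor Succeeded so they can be reviewed
# before any traffic is shifted.
stage_candidate() {
  ctx=$1
  manifests=$2
  kubectl --context "$ctx" wait --for=condition=Ready nodes --all --timeout=300s || return 1
  kubectl --context "$ctx" apply -f "$manifests" -R || return 1
  kubectl --context "$ctx" get pods --all-namespaces \
    --field-selector=status.phase!=Running,status.phase!=Succeeded
}
```

For example, `stage_candidate green mercury` would stage the `mercury` manifests on a kubeconfig context named `green`. Traffic migration (DNS or load balancer changes) is deliberately left out, since it depends on your environment.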
Blue-green replacement provides some subtler benefits as well:
- Encourages investment in tooling for traffic migration and failovers. When a cluster incident arises, shifting applications to a healthy cluster will be second nature.
- Discourages reliance on in-place opaque state. Retain confidence in your ability to create infrastructure from scratch.
- Allows Lokomotive to make architecture changes between releases and eases the burden on Lokomotive maintainers. By contrast, distros promising in-place upgrades get stuck with their mistakes or require complex and error-prone migrations.
Lokomotive bare-metal clusters are provisioned by a PXE-enabled network boot environment and a Matchbox service. To upgrade, re-provision machines into a new cluster.
Fail over application workloads to another cluster (varies).
kubectl config use-context other-context
kubectl apply -f mercury -R
# DNS or load balancer changes
Power off bare-metal machines and set their next boot device to PXE.
ipmitool -H node1.example.com -U USER -P PASS power off
ipmitool -H node1.example.com -U USER -P PASS chassis bootdev pxe
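With several machines, the two commands above can be wrapped in a loop. This is a minimal sketch: the function name `pxe_reset` and the `IPMI_USER` and `IPMI_PASS` environment variables are assumptions, and it assumes every machine accepts the same credentials.

```shell
# Hypothetical helper; IPMI_USER and IPMI_PASS are assumed environment variables.

# pxe_reset HOST...
# Power off each machine and set its next boot device to PXE.
# Stops at the first ipmitool failure.
pxe_reset() {
  for host in "$@"; do
    ipmitool -H "$host" -U "$IPMI_USER" -P "$IPMI_PASS" power off || return 1
    ipmitool -H "$host" -U "$IPMI_USER" -P "$IPMI_PASS" chassis bootdev pxe || return 1
  done
}
```

Usage might look like `IPMI_USER=USER IPMI_PASS=PASS pxe_reset node1.example.com node2.example.com`.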
Delete or comment out the Terraform config for the cluster.
- module "bare-metal-mercury" {
- source = "git::https://github.com/kinvolk/lokomotive-kubernetes//bare-metal/flatcar-linux/kubernetes"
- ...
-}
Apply to delete old provisioning configs from Matchbox.
$ terraform apply
Apply complete! Resources: 0 added, 0 changed, 55 destroyed.
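Because this apply removes resources, it may be worth confirming beforehand that the plan only destroys. The sketch below checks the summary line of `terraform plan` output. The function name `plan_is_destroy_only` is hypothetical, and the summary format is assumed to be `Plan: X to add, Y to change, Z to destroy.`, which may differ between Terraform versions.

```shell
# Hypothetical helper, not part of Terraform.

# plan_is_destroy_only
# Read `terraform plan -no-color` output on stdin and succeed only if the
# summary line reports 0 to add and 0 to change.
plan_is_destroy_only() {
  grep -Eq '^Plan: 0 to add, 0 to change, [0-9]+ to destroy\.'
}
```

A guarded apply could then be written as `terraform plan -no-color | plan_is_destroy_only && terraform apply`.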
Re-provision a new cluster by following the bare-metal tutorial.
On cloud platforms, create a new cluster following the tutorials. Fail over application workloads to the new cluster (varies).
kubectl config use-context other-context
kubectl apply -f mercury -R
# DNS or load balancer changes
Once you're confident in the new cluster, delete the Terraform config for the old cluster.
- module "google-cloud-yavin" {
- source = "git::https://github.com/kinvolk/lokomotive-kubernetes//google-cloud/flatcar-linux/kubernetes"
- ...
-}
Apply to delete the cluster.
$ terraform apply
Apply complete! Resources: 0 added, 0 changed, 55 destroyed.
Lokomotive uses a self-hosted Kubernetes control plane which allows certain manifest upgrades to be performed in-place. Components like `apiserver`, `controller-manager`, `scheduler`, `flannel`/`calico`, `coredns`, and `kube-proxy` are run on Kubernetes itself and can be edited via `kubectl`. If you're interested, see the bootkube upgrade docs.
In certain scenarios, in-place edits can be useful for quickly rolling out security patches (e.g. bumping `coredns`) or prioritizing speed over the safety of a proper cluster re-provision and transition.
!!! note Rarely, we may test certain security in-place edits and mention them as an option in release notes.
!!! warning Lokomotive does not support or document in-place edits as an upgrade strategy. They involve inherent risks and we choose not to make recommendations or guarantees about the safety of different in-place upgrades. It's explicitly a non-goal.
Use the Terraform 3rd-party plugin directory `~/.terraform.d/plugins` to keep versioned copies of the `terraform-provider-ct` and `terraform-provider-matchbox` plugins. The plugin directory replaces the `~/.terraformrc` file to allow 3rd-party plugins to be defined and versioned independently (rather than globally).
# ~/.terraformrc (DEPRECATED)
providers {
ct = "/usr/local/bin/terraform-provider-ct"
matchbox = "/usr/local/bin/terraform-provider-matchbox"
}
Migrate to using the Terraform plugin directory. Move `~/.terraformrc` to a backup location.
mv ~/.terraformrc ~/.terraform-backup
Add the terraform-provider-ct plugin binary for your system to `~/.terraform.d/plugins/`. Download the same version of `terraform-provider-ct` you were using with `~/.terraformrc`; updating should only be done as a follow-up and is only safe for v1.12.2+ clusters!
wget https://github.com/poseidon/terraform-provider-ct/releases/download/v0.4.0/terraform-provider-ct-v0.4.0-linux-amd64.tar.gz
tar xzf terraform-provider-ct-v0.4.0-linux-amd64.tar.gz
mv terraform-provider-ct-v0.4.0-linux-amd64/terraform-provider-ct ~/.terraform.d/plugins/terraform-provider-ct_v0.4.0
If you use bare-metal, add the terraform-provider-matchbox plugin binary for your system to `~/.terraform.d/plugins/`, noting the versioned name.
wget https://github.com/poseidon/terraform-provider-matchbox/releases/download/v0.2.3/terraform-provider-matchbox-v0.2.3-linux-amd64.tar.gz
tar xzf terraform-provider-matchbox-v0.2.3-linux-amd64.tar.gz
mv terraform-provider-matchbox-v0.2.3-linux-amd64/terraform-provider-matchbox ~/.terraform.d/plugins/terraform-provider-matchbox_v0.2.3
Binary names are versioned. This makes it possible to upgrade plugins independently and to have clusters pin different versions.
$ tree ~/.terraform.d/
/home/user/.terraform.d/
└── plugins
    ├── terraform-provider-ct_v0.4.0
    └── terraform-provider-matchbox_v0.2.3
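The versioned naming convention can be captured in a small helper. This is only a sketch: `install_plugin` is a hypothetical name, and it copies the binary rather than moving it as the commands above do.

```shell
# Hypothetical helper, not part of Terraform.

# install_plugin NAME VERSION BINARY [PLUGIN_DIR]
# Copy BINARY to PLUGIN_DIR/terraform-provider-NAME_vVERSION and print the
# destination path. PLUGIN_DIR defaults to ~/.terraform.d/plugins.
install_plugin() {
  dir=${4:-$HOME/.terraform.d/plugins}
  dest="$dir/terraform-provider-${1}_v${2}"
  mkdir -p "$dir" && cp "$3" "$dest" && chmod +x "$dest" && echo "$dest"
}
```

For example, after extracting the release tarball, `install_plugin ct 0.4.0 terraform-provider-ct-v0.4.0-linux-amd64/terraform-provider-ct` would place the binary at `~/.terraform.d/plugins/terraform-provider-ct_v0.4.0`.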
In each Terraform working directory, set the version of each provider.
# providers.tf
provider "matchbox" {
version = "0.2.3"
...
}
provider "ct" {
version = "0.4.0"
}
Run `terraform init` to ensure plugin version requirements are met. Verify `terraform plan` does not produce a diff, since the plugin versions should be the same as previously.
$ terraform init
$ terraform plan
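The "no diff" check can also be scripted with `terraform plan -detailed-exitcode`, which exits 0 when nothing would change, 1 on error, and 2 when changes are proposed. The wrapper name `plan_has_no_diff` below is hypothetical.

```shell
# Hypothetical wrapper around `terraform plan -detailed-exitcode`.

# plan_has_no_diff [PLAN_ARGS...]
# Return 0 when the plan is empty, 2 when changes are proposed,
# and 1 when terraform itself failed.
plan_has_no_diff() {
  code=0
  terraform plan -detailed-exitcode "$@" >/dev/null || code=$?
  case $code in
    0) return 0 ;;
    2) echo "terraform plan proposes changes" >&2; return 2 ;;
    *) echo "terraform plan failed (exit $code)" >&2; return 1 ;;
  esac
}
```

Something like `terraform init && plan_has_no_diff` then fails loudly if the migrated plugin setup would alter any resource. Run a plain `terraform plan` afterwards to see the details.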
The terraform-provider-ct plugin parses, validates, and converts Container Linux Configs into Ignition user-data for provisioning instances. The plugin can be updated in-place; on apply, only workers will be replaced.
First, migrate to the Terraform 3rd-party plugin directory to allow 3rd-party plugins to be defined and versioned independently (rather than globally).
Add the terraform-provider-ct plugin binary for your system to `~/.terraform.d/plugins/`, noting the final name.
wget https://github.com/poseidon/terraform-provider-ct/releases/download/v0.3.1/terraform-provider-ct-v0.3.1-linux-amd64.tar.gz
tar xzf terraform-provider-ct-v0.3.1-linux-amd64.tar.gz
mv terraform-provider-ct-v0.3.1-linux-amd64/terraform-provider-ct ~/.terraform.d/plugins/terraform-provider-ct_v0.3.1
Binary names are versioned. This makes it possible to upgrade plugins independently and to have clusters pin different versions.
$ tree ~/.terraform.d/
/home/user/.terraform.d/
└── plugins
    ├── terraform-provider-ct_v0.2.1
    ├── terraform-provider-ct_v0.3.0
    ├── terraform-provider-ct_v0.3.1
    └── terraform-provider-matchbox_v0.2.3
Update the version of the `ct` plugin in each Terraform working directory.
# providers.tf
provider "ct" {
version = "0.3.1"
}
Run init and plan to check that no diff is proposed for the controller nodes (a diff would destroy cluster state).
terraform init
terraform plan
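A rough text filter can help spot controller changes in a long plan. The sketch below assumes controller resources have "controller" somewhere in their resource address, which depends on how the modules name their resources, and it skips the "Refreshing state" lines that Terraform prints for every resource. The function name `controller_changes` is hypothetical, and the filter does not replace reading the plan.

```shell
# Hypothetical helper; assumes controller resource addresses contain "controller".

# controller_changes
# Read `terraform plan -no-color` output on stdin and print lines that mention
# a controller, ignoring state refresh messages.
# Exits 0 if any such line is found, non-zero if there are none.
controller_changes() {
  grep -i 'controller' | grep -v 'Refreshing state'
}
```

For example, `terraform plan -no-color | controller_changes` printing nothing suggests the diff is limited to workers. The filter is deliberately broad, so any output should prompt a closer look at the full plan before applying.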
Apply the change. Worker nodes' user-data will be changed and workers will be replaced. Rollout happens slightly differently on each platform:
AWS creates a new worker ASG, then removes the old ASG. New workers join the cluster and old workers disappear. `terraform apply` will hang during this process.
Azure edits the worker scale set in-place instantly. Manually terminate workers to create replacement workers using the new user-data.
On bare-metal, no action is needed. Machines do not re-PXE unless explicitly made to do so.