This file will guide you through setting up more advanced configuration options of a PostgreSQL instance.
To use your instances from outside the K8s cluster they are running in, you can expose them via a load balancer. You can do so by setting the field `spec.expose` to `LoadBalancer`, as in this example:
```yaml
apiVersion: postgresql.anynines.com/v1beta3
kind: Postgresql
metadata:
  name: sample-pg-cluster
spec:
  version: 14
  expose: LoadBalancer
```
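Assuming you save the manifest above to a file (the file name below is only an example), you can create the instance with:

```shell
kubectl apply -f expose-sample-pg-cluster.yaml
```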
The operator will set up a K8s ConfigMap containing the hosts and ports you can use to connect to your instance. The name of the ConfigMap will be `<instance-name>-connection`, and its namespace will be the same as the instance's. You can find it and inspect its contents by running:

```shell
kubectl get configmap <instance-name>-connection -o yaml
```
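One common way to consume this ConfigMap is to inject its entries as environment variables into an application pod via `envFrom`. The sketch below assumes the `sample-pg-cluster` instance from the example above and uses a placeholder image; the resulting variable names depend on the keys the operator writes, so inspect the real ConfigMap first.

```yaml
# Minimal sketch: expose the connection ConfigMap's entries as environment
# variables inside an application container. Pod name and image are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: pg-client-example
spec:
  containers:
  - name: app
    image: alpine:3.19                            # placeholder; use your application image
    command: ["sh", "-c", "env && sleep 3600"]    # prints the injected variables
    envFrom:
    - configMapRef:
        name: sample-pg-cluster-connection        # <instance-name>-connection
```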
Note 1: Exposing the instance will always create at least one dedicated load balancer, which might cause additional cost, depending on your infrastructure.
Note 2: If your instance also has a read-only service (`spec.enableReadOnlyService: true`), exposing it outside the cluster will create two load balancers: one for the read-only service and one for the read-write service. This might cost even more (the actual amount depends on your infrastructure).
Note 3: Some infrastructure providers might not support load balancers. If you run a8s on such a provider, the instance won't get a load balancer even if you specify `spec.expose: LoadBalancer`.
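For reference, combining the fields from Note 2, a manifest that exposes both the read-write and the read-only service could look like the following sketch (based only on the fields shown in this guide):

```yaml
apiVersion: postgresql.anynines.com/v1beta3
kind: Postgresql
metadata:
  name: sample-pg-cluster
spec:
  version: 14
  enableReadOnlyService: true   # adds a read-only service (see Note 2)
  expose: LoadBalancer          # each exposed service gets its own load balancer
```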
With the help of scheduling constraints you can make better use of your cluster's resources and/or make PostgreSQL instances more resilient against failures by setting up highly available instances. In general, these settings are exposed through the `spec.schedulingConstraints` field, for example in the `Postgresql` objects (see the API Documentation).
Subfields of `schedulingConstraints` allow you to configure tolerations, node affinity and pod (anti-)affinity for the pods of the Data Service Instances (DSI). They are directly copied, unmodified, to the corresponding fields of the DSI pods.
Thus the a8s framework fully relies on Kubernetes mechanisms and inherits their limitations; it is therefore highly recommended to go through the Kubernetes documentation on these topics.
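As a rough sketch of where these subfields live (the empty values below are only placeholders; the next sections contain complete, working examples):

```yaml
spec:
  schedulingConstraints:
    tolerations: []         # standard Kubernetes tolerations
    affinity:
      nodeAffinity: {}      # standard Kubernetes node affinity
      podAffinity: {}       # standard Kubernetes inter-pod affinity
      podAntiAffinity: {}   # standard Kubernetes inter-pod anti-affinity
```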
The next sections will guide you through the configuration process.
Note: Be careful when assigning scheduling constraints: they can lead to pods never being scheduled, and when modifying nodes (for example with taints) it is possible to evict all running pods!
Affinity and anti-affinity are used to attract or repel pods to/from K8s cluster nodes at scheduling time or runtime, based on the nodes' labels or on the labels of other pods running on the nodes. The former case is called node affinity and the latter inter-pod (anti-)affinity.
You can read more about what constraints are possible in the Kubernetes documentation; as mentioned before, the a8s framework does not place any restrictions on the constraints you can apply.
We will demonstrate how to specify anti-affinity in two simple examples here; affinity can be expressed analogously.
In this section we will assume that:
- you are using a cluster that has nodes in at least 3 availability zones (AZ).
- you want to use a 3 replica PostgreSQL instance.
- the AZ of a node is indicated by the node's label `topology.kubernetes.io/zone`.
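You can check which AZ each node belongs to with (assuming the well-known zone label is used in your cluster):

```shell
kubectl get nodes -L topology.kubernetes.io/zone
```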
In this case, for a highly available PostgreSQL instance, the replicas have to be distributed among those AZs. Here is an example `Postgresql` CustomResource (CR) object that will achieve this:
```yaml
apiVersion: postgresql.anynines.com/v1beta3
kind: Postgresql
metadata:
  name: ha-1-sample-pg-cluster
spec:
  replicas: 3
  version: 14
  resources:
    requests:
      cpu: 100m
    limits:
      memory: 200Mi
  schedulingConstraints:
    affinity:
      podAntiAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
        - podAffinityTerm:
            labelSelector:
              matchExpressions:
              - key: a8s.a9s/dsi-name
                operator: In
                values:
                - ha-1-sample-pg-cluster
              - key: a8s.a9s/dsi-kind
                operator: In
                values:
                - Postgresql
            topologyKey: topology.kubernetes.io/zone
```
Let's go through the specs in detail:
```yaml
podAntiAffinity:
  requiredDuringSchedulingIgnoredDuringExecution:
```
Since we want PostgreSQL pods to repel each other, we use anti-affinity here; the `requiredDuringScheduling` part indicates which conditions must be met before a pod gets scheduled. The `IgnoredDuringExecution` part implies that, in case we are modifying an already running instance, pods will not get evicted to enforce this policy; it will only take effect when pods restart. Therefore, if you want to try out the examples, make sure to always create a new instance.
The goal of our constraints is to express that no other pod of the same DSI should be in the same zone, which is done through:
```yaml
- podAffinityTerm:
    labelSelector:
      matchExpressions:
      - key: a8s.a9s/dsi-name
        operator: In
        values:
        - ha-1-sample-pg-cluster
      - key: a8s.a9s/dsi-kind
        operator: In
        values:
        - Postgresql
    topologyKey: topology.kubernetes.io/zone
```
Here we only use one `podAffinityTerm` (multiple ones are possible), which matches pods with the value `ha-1-sample-pg-cluster` in the label `a8s.a9s/dsi-name` and the value `Postgresql` in the label `a8s.a9s/dsi-kind`. You can find out how pods are labeled in the reference section.
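You can also inspect the labels directly on a running instance's pods, for example:

```shell
kubectl get pods -l a8s.a9s/dsi-name=ha-1-sample-pg-cluster --show-labels
```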
Since we are specifying an anti-affinity term, a pod won't be scheduled on a node in an AZ in which there are already pods that match the match expressions (which are ANDed). The constraint works at the AZ level because we used `topology.kubernetes.io/zone` as `topologyKey`. Using `kubernetes.io/hostname` instead would, for example, enforce that there are no other matching pods running on the same node rather than in the same AZ. For other possible values, see the well-known labels and annotations section.
Note: Although it is best practice to use the well-known labels and annotations such as `topology.kubernetes.io/zone`, and most providers use them, Kubernetes does not enforce them. Thus, you will have to make sure that your cluster uses them, or replace them with the ones that are used in your cluster. You can find this out by asking your admin or by inspecting your cluster nodes.
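For example, listing all node labels shows which topology labels your cluster actually uses:

```shell
kubectl get nodes --show-labels
```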
You can test this example using:
```shell
kubectl apply -f examples/postgresql-ha-1-instance.yaml
```
Then get the nodes where the individual DSI replicas are running:

```shell
kubectl get pods -l a8s.a9s/dsi-name=ha-1-sample-pg-cluster -o go-template='{{range .items}}{{printf "%s : %s\n" .metadata.name .spec.nodeName }}{{end}}'
```
And verify that the nodes are part of different AZs by inspecting the output of:
```shell
kubectl get nodes -o go-template='{{range .items}}{{printf "%s : %s\n" .metadata.name (index .metadata.labels "topology.kubernetes.io/zone") }}{{end}}'
```
Note: The Kubernetes documentation warns that specifying pod (anti-)affinity constraints can result in a significantly increased amount of processing at scheduling time, which can slow down your cluster.
For a more detailed and complete guide, please refer to the Kubernetes documentation; everything mentioned there is directly applicable to the a8s framework.
While the above example was easy to set up, in a production-grade cluster you might want to have more DSI replicas than AZs, especially to ensure upscaling is possible without having to worry about the scheduling fields.
We are now going to assume you have a cluster with 3 AZs that all contain multiple nodes, and that you want to run a 5-replica DSI on it. We want to ensure that pods are distributed among zones and also that no two pods of the same DSI run on the same node. This can be achieved using:
```yaml
apiVersion: postgresql.anynines.com/v1beta3
kind: Postgresql
metadata:
  name: ha-2-sample-pg-cluster
spec:
  replicas: 5
  version: 14
  resources:
    requests:
      cpu: 100m
    limits:
      memory: 200Mi
  schedulingConstraints:
    affinity:
      podAntiAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
        - podAffinityTerm:
            labelSelector:
              matchExpressions:
              - key: a8s.a9s/dsi-name
                operator: In
                values:
                - ha-2-sample-pg-cluster
              - key: a8s.a9s/dsi-kind
                operator: In
                values:
                - Postgresql
            topologyKey: kubernetes.io/hostname
        preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 10
          podAffinityTerm:
            labelSelector:
              matchExpressions:
              - key: a8s.a9s/dsi-name
                operator: In
                values:
                - ha-2-sample-pg-cluster
              - key: a8s.a9s/dsi-kind
                operator: In
                values:
                - Postgresql
            topologyKey: topology.kubernetes.io/zone
```
Here we chose a 5-replica DSI (instead of 3 as in Example 1). If we used the same `schedulingConstraints` as in the previous example, 2 of the pods would not be scheduled (we are still assuming 3 AZs), since the `required` constraint wouldn't be satisfiable. Instead, we have now made this constraint preferred:
```yaml
- weight: 10
  podAffinityTerm:
    labelSelector:
      matchExpressions:
      - key: a8s.a9s/dsi-name
        operator: In
        values:
        - ha-2-sample-pg-cluster
      - key: a8s.a9s/dsi-kind
        operator: In
        values:
        - Postgresql
    topologyKey: topology.kubernetes.io/zone
```
This will not prevent scheduling of pods in the same availability zone, but will minimize the likelihood of having them in the same AZ.
Additionally, when using `preferredDuringSchedulingIgnoredDuringExecution`, one has to give each constraint a weight. This weight conveys to the scheduler how important the constraint is with respect to other constraints. This is needed because you can specify multiple constraints, and not all of them might be satisfiable at the same time.
To prevent scheduling on the same node, we modified the `required` constraint to use the aforementioned `topologyKey` for nodes, i.e. `kubernetes.io/hostname`.
Now scaling up is only limited by the number of Kubernetes nodes. In this case you would need 5 nodes; otherwise our `requiredDuringScheduling` constraint would again prevent 2 pods from being scheduled.
If you want to avoid that, you can move the constraint from `required` to `preferred` and, for example, give it a weight of 100.
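Following the pattern above, such a preferred hostname constraint could look like this sketch (it would sit alongside the existing zone term; the weight of 100 is just the example value mentioned):

```yaml
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 100
  podAffinityTerm:
    labelSelector:
      matchExpressions:
      - key: a8s.a9s/dsi-name
        operator: In
        values:
        - ha-2-sample-pg-cluster
      - key: a8s.a9s/dsi-kind
        operator: In
        values:
        - Postgresql
    topologyKey: kubernetes.io/hostname
```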
You can apply the example by using:
```shell
kubectl apply -f examples/postgresql-ha-2-instance.yaml
```
Then get the nodes where the individual DSI replicas are running to verify that they are all different:

```shell
kubectl get pods -l a8s.a9s/dsi-name=ha-2-sample-pg-cluster -o go-template='{{range .items}}{{printf "%s : %s\n" .metadata.name .spec.nodeName }}{{end}}'
```
And additionally verify that the nodes are not all running in a single AZ by inspecting the output of:

```shell
kubectl get nodes -o go-template='{{range .items}}{{printf "%s : %s\n" .metadata.name (index .metadata.labels "topology.kubernetes.io/zone") }}{{end}}'
```
Whereas affinity and anti-affinity are used to specify preferences of pods to be scheduled on specific nodes, taints and tolerations can be used to prevent pods from being scheduled on a specific node.
A node can be tainted with a certain taint, causing only pods with a matching toleration to be scheduled or executed on that node. Other pods will no longer be scheduled on the node in case of a `NoSchedule` taint, or will even be evicted from it in case of a `NoExecute` taint. For example, a commonly used taint is `node-role.kubernetes.io/control-plane`, which is typically placed on the nodes reserved for the Kubernetes control plane components.
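You can check which taints are currently set on your nodes with, for example:

```shell
kubectl get nodes -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints
```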
Since taints can stop pods, be careful when tainting your nodes; you could end up leaving the cluster in a broken state. Please read the Kubernetes documentation before using taints and tolerations.
First we have to apply a taint to a node; here we will assign the taint `pg-node`:

```shell
kubectl taint nodes <node_name> pg-node=true:NoSchedule
```
To additionally be able to express affinity to that node, you also have to label the node using:

```shell
kubectl label nodes <node_name> pg-node=true
```
After that, we can specify a PostgreSQL instance whose pods are able to schedule on that node and are attracted to it:
```yaml
apiVersion: postgresql.anynines.com/v1beta3
kind: Postgresql
metadata:
  name: toleration-sample-pg-cluster
spec:
  replicas: 3
  version: 14
  resources:
    requests:
      cpu: 100m
    limits:
      memory: 200Mi
  schedulingConstraints:
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          - matchExpressions:
            - key: pg-node
              operator: In
              values:
              - "true"
    tolerations:
    - key: "pg-node"
      operator: "Equal"
      value: "true"
      effect: NoSchedule
```
The toleration for our taint is specified in:

```yaml
tolerations:
- key: "pg-node"
  operator: "Equal"
  value: "true"
  effect: NoSchedule
```
and by using the `nodeAffinity` term:

```yaml
nodeAffinity:
  requiredDuringSchedulingIgnoredDuringExecution:
    nodeSelectorTerms:
    - matchExpressions:
      - key: pg-node
        operator: In
        values:
        - "true"
```

we ensure that the pods can only be scheduled on our tainted and labeled node.
You can apply an instance with that spec by using:
```shell
kubectl apply -f examples/postgresql-toleration-instance.yaml
```
If you now add another DSI, for example our `sample-pg-cluster` from the usage overview, by using:

```shell
kubectl apply -f examples/postgresql-instance.yaml
```

you will see that none of its replicas will be scheduled on the tainted node.
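You can verify this by listing the nodes of both instances with the go-template used earlier and comparing them with the tainted node:

```shell
kubectl get pods -l a8s.a9s/dsi-name=toleration-sample-pg-cluster -o go-template='{{range .items}}{{printf "%s : %s\n" .metadata.name .spec.nodeName }}{{end}}'
kubectl get pods -l a8s.a9s/dsi-name=sample-pg-cluster -o go-template='{{range .items}}{{printf "%s : %s\n" .metadata.name .spec.nodeName }}{{end}}'
```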
This section points out some of the current limitations and caveats you might experience when working with scheduling constraints; the indicated Kubernetes documentation pages provide a more complete overview.
- Using scheduling constraints can evict pods and cause some pods not to be scheduled, regardless of the resources available in the cluster. This is especially true when tainting: all pods will either be evicted from a tainted node or will not be able to be rescheduled there. Also, adding `requiredDuringScheduling` constraints can prevent scheduling, so be especially careful when using them.
- Kubernetes uses some well-known taints defined here; those should not be used outside their described use case. Otherwise, other workloads or constraints that depend on them might show unexpected behavior.
- For DSIs that are backed by a StatefulSet (e.g. PostgreSQL), updating scheduling constraints from a value that prevents scheduling to a satisfiable value won't have an effect. This is due to the StatefulSet controller not being able to update the `schedulingConstraints` while waiting for pods to schedule (see Issue). In this case, delete the instance first before reapplying the valid manifest.
- The Kubernetes scheduler does not only take into account your specified constraints, but also, for example, the resources available on a node. This can in some cases overrule some of your `preferred` affinity constraints, or cause some pods with `required` constraints to be stuck in pending without being scheduled for a long time.
- Specifying scheduling constraints, and in particular `podAffinity`, will increase the processing needs of the scheduler, possibly slowing down your cluster.
- For some storage classes, the PersistentVolumeClaims can cause a pod to stick to a specific node. For example, if a pod was already scheduled on a node and, after a change in scheduling constraints, can no longer run on it, the pod can get stuck in pending. The reason is that the PersistentVolumeClaim of the pod is bound to the node and therefore this node becomes the only eligible node for scheduling the pod, but the constraints forbid scheduling there. This behavior will be addressed in future releases of Kubernetes.