Implement per node Affinity & Tolerations #102

glebiller · 2022-03-10T15:49:12Z

Currently, the Affinity & Tolerations are set for all the StatefulSets uniformly.
This prevents forcing StatefulSets to a particular region or particular node.

This PR adds a new field spec.nodeSets that is mutually exclusive with spec.replicas.
This new field allows setting an Affinity or Tolerations for a particular node while keeping a default in spec.scheduler.
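
To make the shape of the change concrete, here is a minimal sketch of how the two fields could relate. It is not the code from this PR: the type and function names (`Scheduler`, `NodeSet`, `Spec`, `ValidateSpec`, `EffectiveScheduler`) are hypothetical, plain strings stand in for the Kubernetes Affinity and Toleration types, and the field-by-field fallback to spec.scheduler is an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// Scheduler holds scheduling rules. Plain strings stand in for the
// Kubernetes Affinity and Toleration types to keep the sketch dependency-free.
type Scheduler struct {
	Affinity    string
	Tolerations []string
}

// NodeSet is a hypothetical per-instance entry of spec.nodeSets.
type NodeSet struct {
	Name      string
	Scheduler Scheduler // empty fields mean "use the default"
}

// Spec mirrors only the fields discussed in this PR description.
type Spec struct {
	Replicas  *int32
	NodeSets  []NodeSet
	Scheduler Scheduler // default rules (spec.scheduler)
}

// ValidateSpec rejects a spec that sets both replicas and nodeSets.
func ValidateSpec(s Spec) error {
	if s.Replicas != nil && len(s.NodeSets) > 0 {
		return errors.New("spec.replicas and spec.nodeSets are mutually exclusive")
	}
	return nil
}

// EffectiveScheduler returns the node's own rules where set and falls back
// to the default in spec.scheduler otherwise (assumed behaviour).
func EffectiveScheduler(s Spec, n NodeSet) Scheduler {
	out := s.Scheduler
	if n.Scheduler.Affinity != "" {
		out.Affinity = n.Scheduler.Affinity
	}
	if len(n.Scheduler.Tolerations) > 0 {
		out.Tolerations = n.Scheduler.Tolerations
	}
	return out
}

func main() {
	spec := Spec{
		Scheduler: Scheduler{Affinity: "any-zone"},
		NodeSets: []NodeSet{
			{Name: "primary", Scheduler: Scheduler{Affinity: "zone-a"}},
			{Name: "replica"},
		},
	}
	fmt.Println("validation error:", ValidateSpec(spec))
	for _, n := range spec.NodeSets {
		fmt.Println(n.Name, "->", EffectiveScheduler(spec, n).Affinity)
	}
}
```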

Added the required tests to cover the use cases.

NB: I am considering simplifying the logic so that the replicas field generates a matching number of empty nodes in nodeSets. This would simplify the reconciliation.
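
The simplification mentioned in the note could look roughly like the sketch below. The `NodeSet` type, the `ExpandReplicas` helper and the choice of "1", "2", ... as generated names are all assumptions made for illustration; the PR does not specify how the generated entries would be named.

```go
package main

import (
	"fmt"
	"strconv"
)

// NodeSet is a hypothetical, otherwise empty entry of spec.nodeSets.
type NodeSet struct {
	Name string
}

// ExpandReplicas turns a replica count into that many empty node sets,
// so the reconciliation only ever has to deal with nodeSets.
// The numeric names are an assumption of this sketch.
func ExpandReplicas(replicas int32) []NodeSet {
	if replicas <= 0 {
		return nil
	}
	nodeSets := make([]NodeSet, 0, replicas)
	for i := int32(1); i <= replicas; i++ {
		nodeSets = append(nodeSets, NodeSet{Name: strconv.Itoa(int(i))})
	}
	return nodeSets
}

func main() {
	for _, n := range ExpandReplicas(3) {
		fmt.Println("node set:", n.Name)
	}
}
```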

alex-arica · 2022-03-12T10:18:49Z

@glebiller Thank you for the PR.

I will review it this weekend and I will let you know on Monday morning about it. Thanks for your contribution.

alex-arica · 2022-03-16T19:09:44Z

@glebiller apologies that I could not reply earlier.

I understand that you would like to have the option to apply specific Affinity & Tolerations rules to specific StatefulSets.

Kubegres can replace a Primary Postgres StatefulSet with a Replica StatefulSet, and it can also remove a Replica Postgres StatefulSet that is identified as unavailable, replacing it with a new Replica StatefulSet. As a result of this failover behaviour, the last instance index is incremented each time a name is assigned to a newly created StatefulSet.

For example, let's say we have a cluster of Postgres with 3 StatefulSets, with the names: mypostgres-1, mypostgres-2 and mypostgres-3.

The name "[postgres clustername]-[integer]" has [integer] as its instance index. In mypostgres-1, the instance index is 1.

If the Primary StatefulSet mypostgres-1 is unavailable, it is replaced by the Replica StatefulSet mypostgres-2, which is promoted to be the new Primary, and a new Replica StatefulSet is created. After the failover, the cluster will contain: mypostgres-2, mypostgres-3 and mypostgres-4.

The instance index 1 no longer exists once the failover is completed.

If a Replica StatefulSet is unavailable, let's say mypostgres-3, then it will be replaced by a new Replica StatefulSet. After the failover, the cluster will contain: mypostgres-2, mypostgres-4 and mypostgres-5.

The instance index 3 no longer exists once the failover is completed.

Correct me if I am wrong, but it seems that spec.nodeSets uses its array index to identify the instance index on which to apply the Affinity & Tolerations rules. If the failover use cases above happen, how do you keep track of the configuration, since the instance indexes would change?
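
The two failover cases above can be replayed with a small simulation. This is only a model of the naming behaviour described in this comment, not Kubegres code; the `Cluster` type and its methods are hypothetical.

```go
package main

import "fmt"

// Cluster models only the naming side of failover: the live instance
// indexes and the last index that was ever handed out.
type Cluster struct {
	Name      string
	Indexes   []int
	LastIndex int
}

// Replace drops the failed instance and adds a new one whose index is
// LastIndex+1, so indexes are never reused.
func (c *Cluster) Replace(failed int) {
	kept := make([]int, 0, len(c.Indexes))
	for _, idx := range c.Indexes {
		if idx != failed {
			kept = append(kept, idx)
		}
	}
	if len(kept) == len(c.Indexes) {
		return // nothing was removed, so nothing is created
	}
	c.LastIndex++
	c.Indexes = append(kept, c.LastIndex)
}

// Names renders the StatefulSet names as "[cluster name]-[index]".
func (c *Cluster) Names() []string {
	names := make([]string, 0, len(c.Indexes))
	for _, idx := range c.Indexes {
		names = append(names, fmt.Sprintf("%s-%d", c.Name, idx))
	}
	return names
}

func main() {
	c := &Cluster{Name: "mypostgres", Indexes: []int{1, 2, 3}, LastIndex: 3}

	c.Replace(1) // Primary mypostgres-1 becomes unavailable
	fmt.Println(c.Names()) // [mypostgres-2 mypostgres-3 mypostgres-4]

	c.Replace(3) // Replica mypostgres-3 becomes unavailable
	fmt.Println(c.Names()) // [mypostgres-2 mypostgres-4 mypostgres-5]
}
```

After two failovers the live indexes are 2, 4 and 5, so a configuration keyed by array position 0, 1, 2 no longer has an obvious instance to attach to.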

glebiller · 2022-03-17T10:35:07Z

@alex-arica No worries for the delay :)

I understood the behavior you described before, and the Failover unit test was working because it was re-creating the missing Instance instead of incrementing the instance index to create a new StatefulSet.

I just committed the second part of the changes, which does the following:

- replace the Instance type int32 with string
- replace the "index" label with "app.kubernetes.io/instance", which tracks the instance name instead of the index
- update the reconcile loop to re-create StatefulSets that are missing and remove those that are extra. This will not create any StatefulSet with an instance name different from the ones defined in nodeSets (like the 4 and 5 you mentioned).
- use the instance name to match the deployed Pod / StatefulSet to the Spec

That change should also help with #88, since it keeps the names of the StatefulSets organized, in addition to making it possible to change each node's Tolerations & Affinities separately.
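
As a rough illustration of the name-based matching listed above, the sketch below compares the instance names declared in nodeSets with the "app.kubernetes.io/instance" label of the deployed StatefulSets. The `StatefulSet` struct and the `Diff` function are simplified stand-ins written for this example, not the types or logic from the commit.

```go
package main

import "fmt"

// InstanceLabel is the Kubernetes recommended label used to carry the
// instance name.
const InstanceLabel = "app.kubernetes.io/instance"

// StatefulSet is a simplified stand-in that keeps only the labels.
type StatefulSet struct {
	Labels map[string]string
}

// Diff returns the instance names that must be created (declared in
// nodeSets but not deployed) and the ones that are extra (deployed but
// not declared).
func Diff(desired []string, deployed []StatefulSet) (missing, extra []string) {
	desiredSet := make(map[string]bool, len(desired))
	for _, name := range desired {
		desiredSet[name] = true
	}
	deployedSet := make(map[string]bool, len(deployed))
	for _, sts := range deployed {
		name := sts.Labels[InstanceLabel]
		deployedSet[name] = true
		if !desiredSet[name] {
			extra = append(extra, name)
		}
	}
	for _, name := range desired {
		if !deployedSet[name] {
			missing = append(missing, name)
		}
	}
	return missing, extra
}

func main() {
	desired := []string{"eu-west", "eu-east", "us-east"}
	deployed := []StatefulSet{
		{Labels: map[string]string{InstanceLabel: "eu-west"}},
		{Labels: map[string]string{InstanceLabel: "old-node"}},
	}
	missing, extra := Diff(desired, deployed)
	fmt.Println("to create:", missing)
	fmt.Println("to delete:", extra)
}
```

Because the comparison is keyed by name, a failed instance is re-created under the same name and keeps its own Affinity and Tolerations.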

alex-arica · 2022-03-18T11:51:51Z

Thank you for the update @glebiller

I will review it as soon as I can and let you know.

kmiszta · 2023-07-31T11:01:26Z

Hi, will this be merged somewhere in the future?

Implement per node Affinity & Tolerations

9356602

glebiller force-pushed the node-sets branch from 903fb29 to 9356602 · March 10, 2022 16:23

glebiller force-pushed the node-sets branch from 11b8a2c to 1f47de0 · March 17, 2022 10:25

Use instance instead of index for StatefulSets.

d87a32d

Use the Kubernetes recommended label "app.kubernetes.io/instance" to store the StatefulSet instance. Replace the int32 index variable with the name of the StatefulSet instance defined in `nodeSets`.

glebiller force-pushed the node-sets branch from 1f47de0 to d87a32d · March 30, 2022 11:18
