
Unique node selector and toleration per replica #223

Open
2 of 3 tasks
Tracked by #221
ahg-g opened this issue Sep 14, 2024 · 4 comments
ahg-g (Contributor) commented Sep 14, 2024

What would you like to be added:

Allow injecting a unique nodeSelector and toleration for each LWS replica to trigger the cluster autoscaler to create a dedicated placement group for each replica.

In the API, the user sets the key they would like to use; the value will be the name of the replica (i.e., the leader pod name):

ReplicaUniqueNodeSelector: 
   - compact-placement-group

The result is a nodeSelector injected as follows:
compact-placement-group: <lws-leader-name>

Similarly for tolerations:

ReplicaUniqueToleration:
      - key: compact-placement-group
        effect: NoSchedule

The result is a toleration injected on the pods of a group as follows:

      - key: compact-placement-group
        operator: Equal
        value: <lws-leader-name>
        effect: NoSchedule
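
Putting the two together, a sketch of the constraints that would end up on every pod of one group (the values follow the proposal above; the leader name leaderworkerset-sample-0 is only illustrative):

# injected into each pod of the group whose leader is leaderworkerset-sample-0
nodeSelector:
  compact-placement-group: leaderworkerset-sample-0
tolerations:
- key: compact-placement-group
  operator: Equal
  value: leaderworkerset-sample-0
  effect: NoSchedule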

Why is this needed:
To force the cluster autoscaler to create a node group per replica. This can be necessary to obtain compactly placed nodes (on the same rack, for example), which improves network performance and hence multi-host GPU inference.

Completion requirements:

This enhancement requires the following artifacts:

  • Design doc
  • API change
  • Docs update

The artifacts should be linked in subsequent comments.

ahg-g added the kind/feature label on Sep 14, 2024
googs1025 (Member) commented:
I'm willing to give it a try. :)
/assign

Would it be better to provide a Google Doc first?

googs1025 (Member) commented Sep 17, 2024

Sorry, I don't quite understand how compact-placement-group is defined.
Does compact-placement-group mean the name of a leader pod or a user-defined field name?

apiVersion: leaderworkerset.x-k8s.io/v1
kind: LeaderWorkerSet
metadata:
  name: leaderworkerset-multi-template
spec:
  replicas: 3
  leaderWorkerTemplate:
    ReplicaSpecificNodeSelector: compact-placement-group
    ReplicaSpecificToleration:
      - key: compact-placement-group
    leaderTemplate:
      spec:
        containers:
          - name: nginx2
...

ahg-g (Contributor, Author) commented Sep 18, 2024

compact-placement-group is a string that the user sets; we use it as the key of a nodeSelector entry whose value is the leader pod name.

So the snippet you have for the API is correct. The outcome is that we inject a nodeSelector for each group as follows:

nodeSelector:
  compact-placement-group: <leader-pod-name>
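
For the pods to actually schedule, the nodes brought up for that group then need to carry a matching label. A minimal sketch of such a node, assuming the autoscaler (or whoever provisions the dedicated node group) applies the label; the node name is hypothetical:

apiVersion: v1
kind: Node
metadata:
  name: placement-group-node-a   # hypothetical node provisioned for this group
  labels:
    compact-placement-group: <leader-pod-name>   # must match the injected nodeSelector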

googs1025 (Member) commented:
Thanks for the explanation.
If you have time, please help me check whether this is what we want. @ahg-g

Update:

Currently I envision injecting the values of these fields into the per-group StatefulSets, as shown in the following example:

apiVersion: leaderworkerset.x-k8s.io/v1
kind: LeaderWorkerSet
metadata:
  name: leaderworkerset-multi-template
spec:
  replicas: 3
  leaderWorkerTemplate:
    replicaSpecificNodeSelector:
      - test
    replicaSpecificToleration:
      - key: test
        effect: NoSchedule
    leaderTemplate:
      spec:
        containers:
        - name: nginx2
          image: nginx:1.14.2
          resources:
            limits:
              cpu: "100m"
            requests:
              cpu: "50m"
          ports:
          - containerPort: 8080
    size: 4
    workerTemplate:
      spec:
        containers:
        - name: nginx
          image: nginx:1.14.2
          resources:
            limits:
              cpu: "100m"
            requests:
              cpu: "50m"
          ports:
          - containerPort: 8080

The Node-Selectors and Tolerations fields are injected into each StatefulSet's pod template:

root@VM-0-16-ubuntu:/home/ubuntu# kubectl get sts
NAME                               READY   AGE
leaderworkerset-multi-template     3/3     145m
leaderworkerset-multi-template-0   3/3     145m
leaderworkerset-multi-template-1   3/3     145m
leaderworkerset-multi-template-2   0/3     145m
root@VM-0-16-ubuntu:/home/ubuntu# kubectl describe sts leaderworkerset-multi-template-0
Name:               leaderworkerset-multi-template-0
Namespace:          default
CreationTimestamp:  Tue, 22 Oct 2024 13:33:17 +0800
Selector:           leaderworkerset.sigs.k8s.io/group-index=0,leaderworkerset.sigs.k8s.io/group-key=689ce1b52864f5b6433d403de39845ba1ab94b07,leaderworkerset.sigs.k8s.io/name=leaderworkerset-multi-template
Labels:             leaderworkerset.sigs.k8s.io/group-index=0
                    leaderworkerset.sigs.k8s.io/group-key=689ce1b52864f5b6433d403de39845ba1ab94b07
                    leaderworkerset.sigs.k8s.io/name=leaderworkerset-multi-template
                    leaderworkerset.sigs.k8s.io/template-revision-hash=0f4e30acf40aef19cbbc2456d8652c7fc6d62705
Annotations:        <none>
Replicas:           3 desired | 3 total
Update Strategy:    RollingUpdate
  Partition:        0
Pods Status:        3 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
  Labels:       leaderworkerset.sigs.k8s.io/group-index=0
                leaderworkerset.sigs.k8s.io/group-key=689ce1b52864f5b6433d403de39845ba1ab94b07
                leaderworkerset.sigs.k8s.io/name=leaderworkerset-multi-template
                leaderworkerset.sigs.k8s.io/template-revision-hash=0f4e30acf40aef19cbbc2456d8652c7fc6d62705
  Annotations:  leaderworkerset.sigs.k8s.io/leader-name: leaderworkerset-multi-template-0
                leaderworkerset.sigs.k8s.io/size: 4
  Containers:
   nginx:
    Image:      nginx:1.14.2
    Port:       8080/TCP
    Host Port:  0/TCP
    Limits:
      cpu:  100m
    Requests:
      cpu:         50m
    Environment:   <none>
    Mounts:        <none>
  Volumes:         <none>
  Node-Selectors:  test=leaderworkerset-multi-template-0
  Tolerations:     test=leaderworkerset-multi-template-0:NoSchedule
Volume Claims:     <none>
Events:            <none>

When no node carries the label test: <leader-pod-name>, the pods of that StatefulSet will not be scheduled, as shown for replica 2 below (see the Node sketch after the pod listing).

root@VM-0-16-ubuntu:/home/ubuntu# kubectl get pods
NAME                                 READY   STATUS    RESTARTS   AGE
leaderworkerset-multi-template-0     1/1     Running   0          152m
leaderworkerset-multi-template-0-1   1/1     Running   0          152m
leaderworkerset-multi-template-0-2   1/1     Running   0          152m
leaderworkerset-multi-template-0-3   1/1     Running   0          152m
leaderworkerset-multi-template-1     1/1     Running   0          152m
leaderworkerset-multi-template-1-1   1/1     Running   0          152m
leaderworkerset-multi-template-1-2   1/1     Running   0          152m
leaderworkerset-multi-template-1-3   1/1     Running   0          152m
leaderworkerset-multi-template-2     1/1     Running   0          152m
leaderworkerset-multi-template-2-1   0/1     Pending   0          152m
leaderworkerset-multi-template-2-2   0/1     Pending   0          152m
leaderworkerset-multi-template-2-3   0/1     Pending   0          152m
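
For reference, a sketch of a Node that would let the Pending pods of replica 2 schedule, assuming such a node is added (or labeled and tainted) manually or by the autoscaler; the node name is hypothetical:

apiVersion: v1
kind: Node
metadata:
  name: test-node-2   # hypothetical node for replica 2
  labels:
    test: leaderworkerset-multi-template-2   # matches the injected nodeSelector
spec:
  taints:
  - key: test
    value: leaderworkerset-multi-template-2
    effect: NoSchedule   # matched by the injected toleration

Once such a node joins the cluster, the Pending pods of replica 2 should schedule onto it.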
