support leader being in its own subgroup #257

Open
2 of 3 tasks
avrittrohwer opened this issue Nov 13, 2024 · 3 comments
Labels
kind/feature: Categorizes issue or PR as related to a new feature.

Comments

avrittrohwer commented Nov 13, 2024

What would you like to be added:

Add the ability for the leader Pod to be in its own affinity group when using the subgroup feature. For example, when deploying a leader Pod that should be scheduled on a CPU-only VM and worker Pods that should be scheduled on multiple TPU slices:

apiVersion: leaderworkerset.x-k8s.io/v1
kind: LeaderWorkerSet
metadata:
  name: my-lws
  annotations:
    leaderworkerset.sigs.k8s.io/subgroup-exclusive-topology: cloud.google.com/gke-nodepool
spec:
  replicas: 1
  leaderWorkerTemplate:
    subGroupPolicy:
      subGroupSize: 2
    size: 5
    leaderTemplate:
      spec:
        nodeSelector:
          cloud.google.com/machine-family: n2
          node.kubernetes.io/instance-type: n2-standard-8
        containers:
        - name: leader
          ...
    workerTemplate:
      spec:
        nodeSelector:
          cloud.google.com/gke-tpu-accelerator: tpu-v5p-slice
          cloud.google.com/gke-tpu-topology: 2x2x2
        containers:
        - name: worker
          ...
          resources:
            limits:
              google.com/tpu: "4"

Currently the leader Pod is put in subgroup 0, which gives it the same affinity key as the workers in subgroup 0: https://github.com/kubernetes-sigs/lws/blob/main/pkg/webhooks/pod_webhook.go#L132. This makes the leader Pod in my example unschedulable because of its CPU instance-type node selectors.
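
For illustration, here is a simplified sketch of how the shared key comes about (not the exact pod_webhook.go code; the index math is an assumption based on the linked line):

// Simplified sketch: the leader carries worker index 0, so it falls into
// subgroup 0 and computes the same subgroup hash as the first workers.
subGroupIndex := workerIndex / subGroupSize // 0 for the leader and for worker index 1 when subGroupSize is 2
subGroupUniqueKey := genGroupUniqueKey(pod.Name, strconv.Itoa(subGroupIndex))
pod.Labels[leaderworkerset.SubGroupUniqueHashLabelKey] = subGroupUniqueKey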

Why is this needed:

To support deploying leader-worker architectures where the leader should be scheduled in separate topologies from the worker groups.

Completion requirements:

An option in subGroupPolicy that causes the leader to have its own affinity key.
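
One possible API shape (purely illustrative; the field name and values below are assumptions, not a settled design):

// Hypothetical extension of the SubGroupPolicy type (illustrative only).
type SubGroupPolicy struct {
	// SubGroupSize is the existing field: the number of pods per subgroup.
	SubGroupSize *int32 `json:"subGroupSize,omitempty"`

	// SubGroupPolicyType (assumed name) selects how subgroups are formed.
	SubGroupPolicyType *SubGroupPolicyType `json:"subGroupPolicyType,omitempty"`
}

type SubGroupPolicyType string

const (
	// SubGroupPolicyTypeLeaderWorker keeps today's behavior: the leader
	// shares subgroup 0 with the first workers.
	SubGroupPolicyTypeLeaderWorker SubGroupPolicyType = "LeaderWorker"

	// SubGroupPolicyTypeLeaderExcluded (assumed value) gives the leader its
	// own affinity key so it can be scheduled on a different topology.
	SubGroupPolicyTypeLeaderExcluded SubGroupPolicyType = "LeaderExcluded"
)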

This enhancement requires the following artifacts:

  • Design doc
  • API change
  • Docs update

The artifacts should be linked in subsequent comments.

avrittrohwer added the kind/feature label on Nov 13, 2024

avrittrohwer (Author) commented

@ahg-g

ahg-g (Contributor) commented Nov 14, 2024

@Edwinhr716

Edwinhr716 (Contributor) commented Nov 26, 2024

From a high-level view, we can add a new annotation that is set when a leader-only subgroup is wanted. If it is set, then

subGroupUniqueKey := genGroupUniqueKey(pod.Name, "0")
pod.Labels[leaderworkerset.SubGroupUniqueHashLabelKey] = subGroupUniqueKey 

in the leader, and

subGroupUniqueKey := genGroupUniqueKey(pod.Name, "1")
pod.Labels[leaderworkerset.SubGroupUniqueHashLabelKey] = subGroupUniqueKey 

in the workers.

We just have to watch out for the TPU env variable injection portion; not setting SubGroupSize at all should work, since that is how we determine how to add them: https://github.com/kubernetes-sigs/lws/blob/main/pkg/utils/accelerators/tpu.go#L171
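
A rough sketch of how that branch could look (a sketch under assumptions: the function name, the leaderOnlySubGroup flag, and the worker-index shift generalize the snippets above to multiple worker subgroups; genGroupUniqueKey and the label key are the existing identifiers referenced in this thread):

package webhooks

import (
	"strconv"

	corev1 "k8s.io/api/core/v1"
	leaderworkerset "sigs.k8s.io/lws/api/leaderworkerset/v1"
)

// setSubGroupKey assigns the subgroup hash label, optionally giving the leader
// its own subgroup. The key is derived from the leader pod's name (assumption)
// so that every pod in the same subgroup computes the same value.
func setSubGroupKey(pod *corev1.Pod, leaderName string, workerIndex, subGroupSize int, leaderOnlySubGroup bool) {
	var subGroupIndex int
	switch {
	case !leaderOnlySubGroup:
		// Current behavior: the leader (workerIndex 0) shares subgroup 0.
		subGroupIndex = workerIndex / subGroupSize
	case workerIndex == 0:
		// Leader-only subgroup: the leader keeps key "0" to itself.
		subGroupIndex = 0
	default:
		// Workers start at subgroup 1, so none of them share the leader's key.
		subGroupIndex = (workerIndex-1)/subGroupSize + 1
	}
	subGroupUniqueKey := genGroupUniqueKey(leaderName, strconv.Itoa(subGroupIndex))
	pod.Labels[leaderworkerset.SubGroupUniqueHashLabelKey] = subGroupUniqueKey
}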
