
Introduce WorkflowSet and HardwareRuleSet CRDs #40

Open
jacobweinstock opened this issue Aug 5, 2024 · 3 comments
Labels
status/discussion The scope and kind of work is still in discussion

Comments

@jacobweinstock
Member

jacobweinstock commented Aug 5, 2024

Currently, Workflows must be created with a 1:1 mapping between Hardware and Workflow, and this has been the case since the beginning. Workflow creation is left entirely up to the user, which can be challenging for large deployments. I propose we build on top of the existing Workflow object and add the capability for the Stack to do a 1:many mapping between Hardware and Workflow. This opens up many new possibilities, including integration with automation capabilities.

The idea is that a user defines a WorkflowSet object and the Tink controller (or something else) uses that object to create one or more Workflow objects. This significantly improves the user experience around creating Workflows in large batches.

Some of the technical details aren't fully formed yet. You'll see that in the comments below. I will update this issue as the details become more fully formed.

New CRDs

WorkflowSet

For each Hardware object, create a Workflow object if a matching Workflow does not already exist (exact match? Hardware ref already exists?). Use the pause annotation to pause the creation of Workflow objects.

Tink worker matching: the Hardware object must provide a unique identifier. The namespace/name of a Hardware object is unique but might not be usable as the Tink worker ID. It could be the "first" MAC address, or there could be a field in the Hardware object that defines the unique identifier. This identifier needs to be coordinated with the Tink worker and Smee (Smee sets the ID in kernel parameters).

---
apiVersion: tinkerbell.org/v1alpha1
kind: WorkflowSet
metadata:
  annotations:
    tinkerbell.org/pause: "false"
  name: set1
  namespace: tink
spec:
  hardwareRuleSetRefs:
    - name: ruleset1
      namespace: tink
  templateRef:
    name: template1
    namespace: tink
  maxWorkflows: 5
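
To illustrate the 1:many expansion, the controller would create one Workflow per matched Hardware object. A sketch of what a generated Workflow might look like for a matched Hardware object named hw-1 (the generated name, owner reference shape, and field names are illustrative, not a committed design; uid omitted for brevity):

---
apiVersion: tinkerbell.org/v1alpha1
kind: Workflow
metadata:
  name: set1-hw-1          # generated name, illustrative
  namespace: tink
  ownerReferences:         # ties the generated Workflow back to its WorkflowSet
    - apiVersion: tinkerbell.org/v1alpha1
      kind: WorkflowSet
      name: set1
spec:
  templateRef: template1
  hardwareRef: hw-1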

HardwareRuleSet

Matching Hardware objects against the rule set yields a list of Hardware objects.

---
apiVersion: tinkerbell.org/v1alpha1
kind: HardwareRuleSet
metadata:
  name: ruleset1
  namespace: tink
spec:
  operation: AND # OR
  rules:
    - label: kubernetes.io/arch
      value: amd64
      type: string # int, bool, float
      matchExpression: "=="
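
A minimal Go sketch of how a controller might evaluate a HardwareRuleSet against a Hardware object's labels. The Rule/RuleSet types and the Matches method are illustrative, not a committed API; only string comparison is shown (the typed int/bool/float comparisons hinted at by the type field are omitted).

```go
package main

import "fmt"

// Rule mirrors one entry of the proposed spec.rules list (illustrative).
type Rule struct {
	Label           string
	Value           string
	MatchExpression string // "==" or "!="
}

// RuleSet mirrors the proposed HardwareRuleSet spec (illustrative).
type RuleSet struct {
	Operation string // "AND" or "OR"
	Rules     []Rule
}

// Matches reports whether a Hardware object's labels satisfy the rule set.
func (rs RuleSet) Matches(labels map[string]string) bool {
	for _, r := range rs.Rules {
		got, ok := labels[r.Label]
		var pass bool
		switch r.MatchExpression {
		case "==":
			pass = ok && got == r.Value
		case "!=":
			pass = !ok || got != r.Value
		}
		switch rs.Operation {
		case "AND":
			if !pass {
				return false // AND: one failing rule rejects the Hardware
			}
		case "OR":
			if pass {
				return true // OR: one passing rule accepts the Hardware
			}
		}
	}
	// AND with no failures matches; OR with no passes does not.
	return rs.Operation == "AND"
}

func main() {
	rs := RuleSet{
		Operation: "AND",
		Rules: []Rule{
			{Label: "kubernetes.io/arch", Value: "amd64", MatchExpression: "=="},
		},
	}
	fmt.Println(rs.Matches(map[string]string{"kubernetes.io/arch": "amd64"}))
	fmt.Println(rs.Matches(map[string]string{"kubernetes.io/arch": "arm64"}))
}
```

The controller would run every Hardware object in scope through Matches and create Workflows only for those that pass, up to maxWorkflows.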
@jacobweinstock added the status/discussion label on Aug 5, 2024
@chrisdoherty4
Member

Whoop. Great to see this.

Use the pause annotation to pause creating workflow objects

What motivated the annotation ahead of a field?

HardwareRuleSet

What's the rationale for not embedding this as part of the WorkflowSet?

@jacobweinstock
Member Author

Hey @chrisdoherty4

What motivated the annotation ahead of a field?

It's just what I've seen from other controllers (some CAPI controllers, for example). I actually haven't dug into the trade-offs around this much. Definitely open to other approaches, like a field. If you have any experience, preference, etc., please do share :)

What's the rationale for not embedding this as part of the WorkflowSet?

The idea is to make HardwareRuleSets reusable across WorkflowSets. For example, there could be a HardwareRuleSet for x86_64 machines, one for machines with a certain type of hardware, and one for machines in a specific datacenter, rack, etc. Multiple WorkflowSets could then reuse these to target machines in different ways. That was the idea; I'm open to the alternative of embedding, as I know it would mean one less CRD and less work on the backend.
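
As a sketch of that reuse (all names are illustrative, and field names are written in lowerCamelCase per Kubernetes convention), two WorkflowSets could reference the same HardwareRuleSet while applying different Templates:

---
apiVersion: tinkerbell.org/v1alpha1
kind: WorkflowSet
metadata:
  name: provision-x86
  namespace: tink
spec:
  hardwareRuleSetRefs:
    - name: x86-machines   # shared rule set
      namespace: tink
  templateRef:
    name: ubuntu-install
    namespace: tink
---
apiVersion: tinkerbell.org/v1alpha1
kind: WorkflowSet
metadata:
  name: wipe-x86
  namespace: tink
spec:
  hardwareRuleSetRefs:
    - name: x86-machines   # same rule set, different Template
      namespace: tink
  templateRef:
    name: disk-wipe
    namespace: tink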

@jacobweinstock
Member Author

Another thing to possibly add here is the ability to specify some kind of anti-affinity rules. That way, if we want 5 machines that are each in their own failure domain, we could express that; for example, rack or datacenter anti-affinity. I'll be thinking about how this might look and about adding it.
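
One hypothetical shape for that, borrowing the topologyKey idea from pod anti-affinity (the antiAffinity field and the topology label key are invented for illustration, not part of the proposal):

---
apiVersion: tinkerbell.org/v1alpha1
kind: WorkflowSet
metadata:
  name: set1
  namespace: tink
spec:
  hardwareRuleSetRefs:
    - name: ruleset1
      namespace: tink
  templateRef:
    name: template1
    namespace: tink
  maxWorkflows: 5
  antiAffinity:                      # hypothetical field
    topologyKey: topology.tinkerbell.org/rack  # spread matched Hardware across racks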
