
design: tinkerbell v1alpha2 API #27

Merged
14 commits merged into tinkerbell:main from feat/crd-refactor on Mar 15, 2023

Conversation

@chrisdoherty4 (Member) commented Feb 21, 2023

Implements #14.

@chrisdoherty4 changed the title from "Add tinkerbell CRD refactor design" to "Tinkerbell CRD Refactor Design" on Feb 21, 2023
@chrisdoherty4 marked this pull request as draft on February 21, 2023 17:18
@chrisdoherty4 changed the title from "Tinkerbell CRD Refactor Design" to "design: Tinkerbell CRD Refactor" on Feb 21, 2023
@chrisdoherty4 force-pushed the feat/crd-refactor branch 14 times, most recently from 9e067f7 to 4307489, on February 21, 2023 23:51
@chrisdoherty4 marked this pull request as ready for review on February 21, 2023 23:51
@chrisdoherty4 force-pushed the feat/crd-refactor branch 5 times, most recently from 663d6b2 to 08d6e52, on February 22, 2023 00:07
**Non-goals**

- To change the existing relationship between Tink and Rufio.
- To introduce additional object status data typically found on the `.Status` field.
@moadqassem (Member) commented Feb 22, 2023

Is this, however, coming in a different proposal on how to handle the status of the CRs?

@chrisdoherty4 (Member, author) commented Feb 22, 2023

Would you mind elaborating on 'how to handle status'? This proposal outlines the status aspects that are relevant to the existing Tinkerbell capability but not any additional status tracking.

Perhaps this non-goal is unclear or unnecessary.

@moadqassem (Member) commented Feb 22, 2023

I guess this was my bad as I didn't elaborate on what I meant 😅. What I meant is: naturally, each Kubernetes object has a status tied to it that specifies the state of that object and how it was reconciled during its lifecycle. For the Hardware CRD there is no Status field, whereas the Workflow has one. So I wasn't sure whether a Status field for the Hardware CRD will be added in further work, or not added at all.

Even though one might think that we don't need a Status field/object for Hardware (due to its functionality), I think having a status is still valid, so we can track things like last updated, error messages while reconciling, conditions, phases, etc.

@chrisdoherty4 (Member, author) commented Feb 22, 2023

If I'm not mistaken, we can add it with relative ease in the future for the Hardware type?

I think we've struggled to get reliable status information for Hardware, hence why it doesn't exist here. This stems from the fact that the physical hardware isn't really inspectable until we get the OSIE onto it, at which point we have a client we can play with. It might be useful to have status information about reaching that point in the future. Again, I suspect we can add the status field when that time comes.

Is there a good reason why it couldn't be added later?

A Member commented

TL;DR: Yeah, it shouldn't be any problem.

Actually, one important thing about Hardware status is integration with other tooling like Netbox/Nautobot ;) (but that's a tale for another day 😂). Regarding this:

> I think we've struggled to get reliable status information for Hardware, hence why it doesn't exist here. This stems from the fact that the physical hardware isn't really inspectable until we get the OSIE onto it, at which point we have a client we can play with.

In general that's true; however, the status isn't only about the underlying resource (in this case the server), it also adds some details about how the object itself has been, and is being, reconciled. In the Status object you might have something called conditions, and those would reflect what is happening to the underlying object. And if the object is being reconciled by different controllers, the status object can give some insight into what happened. Example: someone added a wrong MAC address, or something went wrong during initialization of that hardware; the Status object should give insight into why this Hardware wasn't reconciled correctly.

@chrisdoherty4 (Member, author) commented

Yes, absolutely. Principally, I like the thought of having some sort of hardware status information, and it could prove useful for everything up to launching the OSIE, although that's difficult to do in control loops. There isn't a Hardware reconciler today, though, and this proposal won't introduce one (perhaps something to call out explicitly).


```go
// Hardware is a logical representation of a machine that can execute Workflows.
type Hardware struct {
```
A Member commented

Are these representations only for the sake of the proposal? Meaning, would the API look exactly the same as this struct? The reason I'm asking is that I believe it's really important to comply with Kubernetes semantics and standards, for example naming, composition, pointers, etc.

@chrisdoherty4 (Member, author) commented

Structurally, I expect these data structures to map exactly to code. Could you elaborate/link to the standards you're referencing?

A Member commented

Yeah, sure. For example, metav1.TypeMeta and metav1.ObjectMeta for all CRDs. The Spec declaration should be like:

Spec HardwareSpec `json:"spec,omitempty"`

More of these conventions can be found here: https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md
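
For illustration, a minimal sketch of the conventional object shape being referenced, assuming the Hardware kind from this proposal; everything beyond the embedded metadata and the Spec wrapper is a placeholder rather than the proposal's final API.

```go
package v1alpha2

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// Hardware is a logical representation of a machine that can execute Workflows,
// shaped per the Kubernetes API conventions linked above.
type Hardware struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec HardwareSpec `json:"spec,omitempty"`
}

// HardwareSpec holds the user-declared description of the machine.
// The single field here is a placeholder for the sketch.
type HardwareSpec struct {
	NetworkInterfaces map[string]NetworkInterface `json:"networkInterfaces,omitempty"`
}

// NetworkInterface is a placeholder type for the sketch.
type NetworkInterface struct{}
```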

@chrisdoherty4 (Member, author) commented Feb 22, 2023

I see. I was wrong when I said "map exactly to code", then, because I intended to add that. I'll add it to the proposal for clarity.

@chrisdoherty4 (Member, author) commented

Resolved.

design/14_tinkerbell_crd_refactor/proposal.md (outdated, resolved)
type NetworkInterfaces map[MAC]NetworkInterface

// MAC is a Media Access Control address.
type MAC string
A Member commented

Why should this field and the other fields below (StorageDevice, KernelParam) be distinct types?

@chrisdoherty4 (Member, author) commented Feb 22, 2023

A few reasons.

  1. So we can use CEL type validation (as sketched below).
  2. To be explicit about expectations. Data structures such as map[string]string are difficult to establish expectations with, which has been a contributor to poor user experience.
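
For illustration, a minimal sketch of the kind of per-type validation a distinct type enables; the markers and patterns below are assumptions for the example, not the proposal's final rules.

```go
// MAC is a Media Access Control address. Declaring a named type lets a
// validation rule be attached once and reused everywhere the type appears.
// The pattern is illustrative only.
// +kubebuilder:validation:Pattern=`^([0-9a-fA-F]{2}:){5}[0-9a-fA-F]{2}$`
type MAC string

// StorageDevice is a storage device path, e.g. /dev/sda (illustrative rule).
// +kubebuilder:validation:Pattern=`^/dev/.+$`
type StorageDevice string
```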

// shared_volume:/data
//
// See https://docs.docker.com/storage/volumes/ for additional details
type Volume string
A Member commented

For Volume, I would assume that file-system-based volumes are not enough, especially in a Kubernetes context. What about making Volume a struct type where we can have different volume targets? E.g. a volume from a file path, ConfigMap, Secret, or source URL.

@chrisdoherty4 (Member, author) commented

The volume doesn't relate to Kubernetes volumes; it pertains to Actions run on the OSIE.

A Member commented

I see, so this is just passed to the Docker container and only applies when running on the node (OSIE) directly.

type Volume string

// ActionName is unique name within the context of a workflow.
type ActionName string
A Member commented

Does this really need to be a type? Wouldn't a plain string be enough where it is used?

@chrisdoherty4 (Member, author) commented

It exists to facilitate the key in the RenderedActions field of the WorkflowStatus. It helps to establish a clear relationship between the two, although that's arguably an implementation detail of WorkflowStatus.

I'm leaning toward removing it; what do you think?

A Member commented

Understood. I mean, documenting the RenderedActions field and mentioning that the key represents the action name should be enough IMO, so we can remove it.

type RenderedActions map[ActionName]ActionStatus

// ActionStatus describes status information about an action.
type ActionStatus struct {
@moadqassem (Member) commented Feb 22, 2023

I would strongly recommend adding the conditions framework here instead of using explicit fields like State. Something similar to this https://github.com/kubernetes-sigs/cluster-api/blob/main/api/v1beta1/condition_types.go

@chrisdoherty4 (Member, author) commented Feb 22, 2023

I flip-flopped on whether to use conditions. Here's my train of thought:

  1. Conditions are meant to be observations, not state machines in and of themselves. I think this is a common misinterpretation and misuse. Cluster API has a good practical example with its ControlPlaneReady condition. ControlPlaneReady transitions to True when the first control plane node is up, but it can also transition back to False (at least that's my understanding); it is an observation of the control plane, not a transition from one predefined state to another. Conditions can be monotonic but, again, that doesn't mean they should represent state machines. The guidance is that they shouldn't represent state machines.

  2. Adding conditions to actions creates a set of nested conditions, because the Workflow would have conditions which are a summation of ActionStatus conditions. I don't think I'm totally opposed to this approach, but I fear it's confusing given how established conditions are in Kubernetes and the fact that they generally refer to the Kind, not to nested data structures of a Kind. We could add conditions to the WorkflowStatus and derive them from ActionStatus information; that feels more appropriate given all the other guidance, but they would be complementary to state machine fields as opposed to the actual representation of the state machine.

  3. Conditions should generally be derivable from information already available on the resource. That is by no means a hard and fast rule, but encoding state into conditions that need to be populated at reconcile time means state expectations are hard to rationalize when looking at the API only (documentation may clear this up; I haven't given it much thought). For example, we can clearly see the predefined states via the State type; conditions mask that, and users are dependent on documentation to understand workflow state semantics.

  4. Conditions have a predefined recommended format which doesn't necessarily complement what we're doing. We could break from this in the same way Cluster API has (they dropped the ObservedGeneration field), and that's probably fine, but it contributes to the turmoil already shrouding conditions. We would need to very clearly, as Cluster API has, define our condition expectations.


What does it look like if we used conditions? I would probably forgo most of the 'states' we have in favor of the following condition types.

  1. Started: A monotonic condition indicating the workflow has begun.
  2. Succeeded: A monotonic condition indicating the workflow succeeded. Unknown indicates it hasn't finished and False indicates it failed. On False I would expect a Reason and Message elaborating on why it didn't succeed.

I'm not sure this sufficiently represents what we want.
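
For illustration, a rough sketch of what those two condition types could look like on a workflow whose action failed, assuming the standard metav1.Condition type; the reason and message values are invented for the example.

```go
package v1alpha2

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

// exampleConditions shows a hypothetical Workflow that started but did not
// succeed. Condition types, reasons, and messages are illustrative only.
var exampleConditions = []metav1.Condition{
	{
		Type:               "Started",
		Status:             metav1.ConditionTrue,
		Reason:             "WorkflowScheduled",
		Message:            "the workflow was dispatched to the agent",
		LastTransitionTime: metav1.Now(),
	},
	{
		Type:               "Succeeded",
		Status:             metav1.ConditionFalse,
		Reason:             "ActionFailed",
		Message:            "the stream-image action exited non-zero",
		LastTransitionTime: metav1.Now(),
	},
}
```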

@chrisdoherty4 (Member, author) commented

I think this might be useful to document in the proposal's Rationale section, whatever we decide.

A Member commented

These are quite interesting thoughts, I would say. OK, maybe we can move the usage of conditions into another PR/proposal? One thing to keep in mind, though, is that we should distinguish between what is relevant to the CR itself and how it is being reconciled, and how the underlying resource (machine) eventually turned out.

I have to say, though, that I have seen multiple cases where the status of an object is derived from the status of another object. I mean, for now I guess we can roll without it, and then we can iterate in the future on what needs to be done here :)

@chrisdoherty4 (Member, author) commented

I've added the detail to the proposal's rationale section and said we will tackle conditions as a separate proposal.

@chrisdoherty4 (Member, author) commented Feb 27, 2023

@moadqassem I thought some more on conditions, here's what I've come up with.

  1. We'll have the Reason and Message fields on ActionStatus as detailed in the proposal already (probably changing Reason to a string so users know it's something they can define).
  2. As part of this proposal, we'll define how action developers can propagate a Reason and Message from within actions such that it populates the ActionStatus fields.
  3. We'll remove Reason and Message from WorkflowStatus and replace it with a conditions data structure (the conditions will include a Reason and Message).
  4. We'll define a WorkflowSucceeded condition type. The values of Reason and Message will be propagated from the failed ActionStatus, providing the summarized observation of the workflow.

This does mean we'll need to define conditions as part of this proposal. Given we know we want to provide ways for users to propagate information from within actions on error, and that we want to trend toward conditions anyway, it feels appropriate to just go ahead and define our conditions setup as part of this proposal.

Using conditions on the WorkflowStatus to summarize action/workflow state feels more extensible and avoids duplicating the information field for field between ActionStatus and WorkflowStatus.

Conditions will likely take the same form as Cluster API's.

The TL;DR on Reason and Message propagation from within actions: developers can write to paths such as /tinkerbell/reason and /tinkerbell/message, which we'll read and propagate from client to server. This is at least the initial proposal, inspired by the Kubernetes mechanism for containers/pods to indicate why they exited non-zero.

There are probably other observations such as WorkflowStarted that might be useful to have as conditions.


Side note, I'm also proposing the Reason type be dropped in favor of a string. This better complements the idea that action developers can define its value.

I'm going to update the proposal with this change either way because it doesn't seem to sit right currently.
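
For illustration, a rough sketch of the shape this plan implies, assuming metav1.Condition is used for conditions; only the fields under discussion are shown, and the names are a guess at intent rather than the proposal's final API.

```go
package v1alpha2

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

// ActionStatus describes status information about an action. Reason and
// Message could be populated from files the action writes (e.g. the
// /tinkerbell/reason and /tinkerbell/message paths described above).
type ActionStatus struct {
	Reason  string `json:"reason,omitempty"`
	Message string `json:"message,omitempty"`
}

// WorkflowStatus summarizes the workflow. Per-action Reason/Message values
// surface through conditions such as WorkflowSucceeded rather than being
// duplicated field for field on the workflow itself.
type WorkflowStatus struct {
	Conditions []metav1.Condition `json:"conditions,omitempty"`
}
```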

@chrisdoherty4 (Member, author) commented

After discussing with Jacob, I decided to shoot for this ahead of tomorrow's community call. The proposal has been updated.


### Webhooks

We will introduce a set of basic webhooks for validating each CRD. The webhooks will only validate the `v1alpha2` API version. They will be implemented as part of the `tink-controller` component.
A Member commented

I know this might be controversial 😅, but can we please forget about webhooks and use CEL instead? Webhooks have been a huge pain to work with and debug, and I would say this is a nice opportunity to integrate CEL instead since this is still the beginning of the project. For more details about CEL, PTAL at https://kubernetes.io/blog/2022/09/23/crd-validation-rules-beta/

@chrisdoherty4 (Member, author) commented Feb 22, 2023

We will use CEL (I know it's not clear in the proposal and I'd be happy to add specific annotations to clarify), but it doesn't capture all validation. For example, CEL won't be able to verify that all Hardware submitted have unique MAC fields.
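
For illustration, a minimal sketch of the per-object CEL validation being referenced, expressed as a kubebuilder XValidation marker; the specific rule is an assumption for the example. Cross-object checks, such as MAC uniqueness across all Hardware, are exactly what still requires a webhook.

```go
package v1alpha2

// NetworkInterface is a placeholder type for this sketch.
type NetworkInterface struct{}

// HardwareSpec sketch: CEL can enforce invariants within a single object,
// e.g. requiring at least one network interface. It cannot see other
// Hardware objects, so uniqueness of MACs across objects needs a webhook.
// +kubebuilder:validation:XValidation:rule="size(self.networkInterfaces) > 0",message="at least one network interface is required"
type HardwareSpec struct {
	NetworkInterfaces map[string]NetworkInterface `json:"networkInterfaces"`
}
```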

```go
// Hardware is a logical representation of a machine that can execute Workflows.
type Hardware struct {
	HardwareSpec
```
@chrisdoherty4 (Member, author) commented

There is nothing in the Kubernetes guidelines that says we must have a Spec or Status field. I'm wondering if we should just move all the HardwareSpec fields up to the Hardware object.

Another thread is discussing the possibility of adding a Status field which, if that happened, would necessitate the HardwareSpec, but I don't think we have any intention of adding hardware status information any time soon, so I suspect this would be fine.

A Member commented

I am not quite sure that there is anything written saying a CRD must have a field called Spec or Status; however, if you look at Kubernetes CRDs you will find that they do have such fields, which makes other Kubernetes users familiar with our CRDs and lets them know where to look whenever they need to adjust something.

With that being said, we can explore this path in the future and see how those fields can be used during the lifecycle of those CRs.

@chrisdoherty4 force-pushed the feat/crd-refactor branch 2 times, most recently from 5d64826 to 7ff2ad4, on February 24, 2023 23:17
@moadqassem (Member) commented

@jacobweinstock not sure if someone else needs to take a look at this to approve 😅 also not sure if I can approve/merge things


## Migrating from v1alpha1 to v1alpha2

Given the relative immaturity of the Tinkerbell API we will not provide any migration tooling from `v1alpha1` to `v1alpha2`. Users will be required to manually convert `Hardware` and `Template` resources. `Workflow` resources are considered ephemeral and can generally be deleted.
A Member commented

I think we should write some very rudimentary script or similar to help with this process. I would recommend this only for hardware and templates, not workflows.
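
For illustration, a minimal sketch of what such a rudimentary helper could start from, assuming the tinkerbell.org API group and the v1alpha1 hardware resource name; it only dumps existing objects as YAML for manual conversion and makes no attempt at field mapping.

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/tools/clientcmd"
	"sigs.k8s.io/yaml"
)

func main() {
	// Load the local kubeconfig (assumed to live at the default location).
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	dyn, err := dynamic.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	// Hypothetical GVR for the old API; adjust if the group or resource differs.
	gvr := schema.GroupVersionResource{Group: "tinkerbell.org", Version: "v1alpha1", Resource: "hardware"}

	list, err := dyn.Resource(gvr).Namespace(metav1.NamespaceAll).List(context.Background(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}

	// Print each object as YAML so it can be hand-edited into the v1alpha2 shape.
	for i := range list.Items {
		b, err := yaml.Marshal(list.Items[i].Object)
		if err != nil {
			panic(err)
		}
		fmt.Printf("---\n%s", b)
	}
}
```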

@chrisdoherty4 (Member, author) commented

Sounds good. That should be simple enough.

@chrisdoherty4 (Member, author) commented

Should be resolved.

@chrisdoherty4 changed the title from "design: Tinkerbell CRD Refactor" to "design: tinkerbell crd refactor" on Mar 10, 2023
@chrisdoherty4 merged commit 1e6c92a into tinkerbell:main on Mar 15, 2023
@chrisdoherty4 changed the title from "design: tinkerbell crd refactor" to "design: tinkerbell v1alpha2 API" on Mar 17, 2023