cleanup PR: reflect SELinux policy in RTE DS and report Progressing condition message #1058

shajmakh · 2024-10-25T12:12:22Z

Cleanup PR to improve troubleshooting. Please see the commits for additional info.

openshift-ci · 2024-10-25T12:12:31Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: shajmakh

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [shajmakh]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

ffromani

Good idea to expose the SELinux policy in use, but:

- logging unconditionally will spam the logs;
- logs are only useful to developers, and we can infer the policy anyway by checking the DaemonSet: there's a 1:1 mapping between the security context constraint and the SELinux policy in use.

What about adding an annotation to the RTE daemonset, which the NROP owns anyway?

shajmakh · 2024-10-29T08:54:10Z

/hold
under testing

ffromani · 2024-10-29T09:06:05Z

pkg/objectupdate/rte/rte.go

@@ -129,6 +129,13 @@ func DaemonSetHashAnnotation(ds *appsv1.DaemonSet, cmHash string) {
 	template.Annotations[hash.ConfigMapAnnotation] = cmHash
 }

+func DaemonSetAnnotation(ds *appsv1.DaemonSet, annotKey string, annotValue string) {


This may be overkill, but let's roll with it for now.

We can add it under the spec annotations, or emit a controller event (shown in the operator CR events) stating which policy is in use. If it doesn't make a difference, why do you think this is overkill?

Adding a new function in pkg/objectupdate/rte seems like overkill to me, but not enough to change course just yet.

We tend to use annotations to reflect useful information that is not officially exposed by the API.

In this case, the SELinux context policy is exposed explicitly in the DaemonSet/pod spec, so it's part of the official API; hence an annotation is not needed.

I might have missed that this is already reflected. Can you point to where this is set in the DS/pod spec, please?
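For illustration, a minimal sketch of what a helper with the `DaemonSetAnnotation` signature from the diff might do. The diff does not show the body, so this assumes it sets the key on the DaemonSet's own metadata (the thread later deletes the key from `r.RTEManifests.DaemonSet.Annotations`). To stay self-contained, `daemonSet` and `objectMeta` are hypothetical stand-ins for `appsv1.DaemonSet` and `metav1.ObjectMeta`, and the annotation key is a placeholder.

```go
package main

import "fmt"

// objectMeta is a hypothetical, trimmed-down stand-in for metav1.ObjectMeta.
type objectMeta struct {
	Name        string
	Annotations map[string]string
}

// daemonSet is a hypothetical stand-in for appsv1.DaemonSet; only the
// object metadata matters here.
type daemonSet struct {
	objectMeta
}

// daemonSetAnnotation sets annotKey=annotValue on the DaemonSet metadata.
// The map is allocated first because writing to a nil map panics in Go.
func daemonSetAnnotation(ds *daemonSet, annotKey, annotValue string) {
	if ds.Annotations == nil {
		ds.Annotations = make(map[string]string)
	}
	ds.Annotations[annotKey] = annotValue
}

func main() {
	ds := &daemonSet{objectMeta{Name: "rte"}}
	// Placeholder key: not the operator's real annotation name.
	daemonSetAnnotation(ds, "example.io/selinux-policy", "custom")
	fmt.Println(ds.Name, ds.Annotations)
}
```

The helper is only a few lines, which is part of why it can feel like overkill as a separate exported function.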

ffromani · 2024-10-29T09:07:30Z

controllers/numaresourcesoperator_controller.go

@@ -206,80 +210,81 @@ func (r *NUMAResourcesOperatorReconciler) reconcileResourceAPI(ctx context.Conte
 	return false, ctrl.Result{}, "", nil
 }

-func (r *NUMAResourcesOperatorReconciler) reconcileResourceMachineConfig(ctx context.Context, instance *nropv1.NUMAResourcesOperator, trees []nodegroupv1.Tree) (bool, ctrl.Result, string, error) {
+func (r *NUMAResourcesOperatorReconciler) reconcileResourceMachineConfig(ctx context.Context, instance *nropv1.NUMAResourcesOperator, trees []nodegroupv1.Tree) (bool, ctrl.Result, string, string, error) {


can we abuse the error instead of adding a new return value?

We can; the Progressing condition never produces a real error anyway. But I don't think it is right to abuse it, because it'll create confusion about the reconciliation result.
My take is that we can define a struct to group the common values returned by the substeps of the reconciliation procedure. Or, if we want to stay closer to the current state, we can create another type to store the additional info (an error and a string for the message) and replace the returned error with it.

Returning a new struct seems the best approach, but this needs a larger refactoring. I'll attempt something in my #1032

shajmakh · 2024-10-29T09:26:22Z

note: the RTE annotation addition/removal is still buggy

When using the legacy SELinux context, set that annotation in the RTE DS annotations to make it easy to identify which policy is in use. Signed-off-by: Shereen Haj <[email protected]>

So far, when a controller update is in progress, it's either because the RTEs are not yet ready or because the MCPs are not yet updated, and one can learn which by examining the controller logs. This PR gives another option by reporting this extra info in the condition message, both in the controller logs and in the operator `Status`. Signed-off-by: Shereen Haj <[email protected]>
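A sketch of how that extra info could be composed into the Progressing condition message, assuming the controller already knows which RTE DaemonSets are not ready and which MCPs are not updated. The `progressingMessage` function, its inputs, and the message wording are hypothetical, not the PR's code.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// progressingMessage explains why the update is still in progress.
// An empty result means nothing is pending.
func progressingMessage(notReadyRTEs, notUpdatedMCPs []string) string {
	var parts []string
	if len(notReadyRTEs) > 0 {
		parts = append(parts, "RTE daemonsets not ready: "+joinSorted(notReadyRTEs))
	}
	if len(notUpdatedMCPs) > 0 {
		parts = append(parts, "MCPs not updated: "+joinSorted(notUpdatedMCPs))
	}
	return strings.Join(parts, "; ")
}

// joinSorted sorts a copy of the names so the message does not change
// between reconciliations just because the input order changed.
func joinSorted(names []string) string {
	s := append([]string(nil), names...)
	sort.Strings(s)
	return strings.Join(s, ",")
}

func main() {
	fmt.Println(progressingMessage([]string{"worker-b", "worker-a"}, nil))
	fmt.Println(progressingMessage(nil, []string{"mcp-test"}))
}
```

Sorting keeps the message stable, which avoids rewriting the status on every reconciliation only because lists came back in a different order.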

Even if an MCP was not updated, the controller still reports it under the operator status. Due to previous adjustments, we now see all MCPs under the status. This is wrong, however, and should change in the next commit. Signed-off-by: Shereen Haj <[email protected]>

So far, the operator status listed the MCPs even if they were not updated. Fix this to reflect only the updated MCPs, while waiting for the rest to be reconciled. Signed-off-by: Shereen Haj <[email protected]>

Add controller tests to verify that when the annotation is set in the NROP CR, it is also reflected in the RTE DS annotations, and when it's deleted, it also vanishes from the DS annotations. Signed-off-by: Shereen Haj <[email protected]>

Labels and annotations are rarely set, but the problem with the old code was that new sets were added on top of the current sets, either updating a value or adding a new key:value pair. However, a key:value pair missing from the new set makes no difference to the resulting map: if the key is found in the current map it will still exist in the updated one, while the required is to omit it. Apart from the API validations, the controller validates neither the labels nor the annotations. Signed-off-by: Shereen Haj <[email protected]>
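The difference can be shown with two minimal merge functions over plain maps. `mergeAdditive` approximates the old behavior described above (desired keys are added or overwrite, but keys absent from the desired set survive), while `mergeReplace` makes the desired set authoritative. Both names are hypothetical illustrations, not the code in `pkg/objectstate/merge`.

```go
package main

import "fmt"

// mergeAdditive approximates the old behavior: start from the current map and
// apply the desired entries on top. Keys missing from desired are kept.
func mergeAdditive(current, desired map[string]string) map[string]string {
	out := make(map[string]string, len(current)+len(desired))
	for k, v := range current {
		out[k] = v
	}
	for k, v := range desired {
		out[k] = v
	}
	return out
}

// mergeReplace treats desired as the complete state: a key absent from
// desired is dropped even if it exists on the live object.
func mergeReplace(current, desired map[string]string) map[string]string {
	out := make(map[string]string, len(desired))
	for k, v := range desired {
		out[k] = v
	}
	return out
}

func main() {
	current := map[string]string{"policy": "custom", "other": "x"}
	desired := map[string]string{"other": "y"} // "policy" intentionally omitted

	fmt.Println(mergeAdditive(current, desired)) // "policy" survives
	fmt.Println(mergeReplace(current, desired))  // "policy" is gone
}
```

The trade-off discussed in the review follows directly: replace semantics make deletion possible, but any annotation not rendered by the operator is dropped as well.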

shajmakh · 2024-10-31T07:32:21Z

/hold
rebase on top of #1032 when merged

Tal-or · 2024-11-07T12:23:32Z

pkg/objectupdate/rte/rte.go

@@ -129,6 +129,13 @@ func DaemonSetHashAnnotation(ds *appsv1.DaemonSet, cmHash string) {
 	template.Annotations[hash.ConfigMapAnnotation] = cmHash
 }

+func DaemonSetAnnotation(ds *appsv1.DaemonSet, annotKey string, annotValue string) {


We tend to use annotations to reflect useful information that is not officially exposed by the API.

In this case, the SELinux context policy is exposed explicitly in the DaemonSet/pod spec, so it's part of the official API; hence an annotation is not needed.

Tal-or · 2024-11-07T12:30:41Z

pkg/objectstate/merge/merge.go

@@ -55,44 +55,5 @@ func MetadataForUpdate(current, updated client.Object) (client.Object, error) {
 	updated.SetManagedFields(current.GetManagedFields())
 	updated.SetFinalizers(current.GetFinalizers())

-	_ = Annotations(current, updated)


This involves an actual change in behavior: it won't keep the current labels, only the updated one.
What's the rationale behind this change?
According to the commit message:

if the key is found in the current map it will still exist in the updated one, while the required is to omit it.

But that means the user can override the labels, because the operator is no longer maintaining the existing ones.

it won't keep the current labels, only the updated one.

That's exactly the issue. The problem with the current flow of updating the annotations is that when the user intentionally doesn't add the annotation (equivalent to deleting it, if it exists), the code won't reflect that in the result: the annotation won't be removed and will still exist. Then how is the user supposed to delete that specific annotation?
The annotations are built at every reconciliation iteration, and the RTE ObjectState carries this data until it gets ignored in MetadataForUpdate.

As far as I have checked, the annotations were never updated before, which is why this problem wasn't visible until this PR.

I see your point, but this is how we check for the annotation:

numaresources-operator/internal/api/annotations/annotations.go, line 29 in 27a91fd:

if v, ok := annot[SELinuxPolicyConfigAnnotation]; ok && v == SELinuxPolicyCustom {

So if we want to disable it, we can change the value to something other than custom.
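The quoted check only enables the custom policy when the value equals the custom constant, so any other value (or no annotation at all) disables it. A sketch of that logic, presumably what `IsCustomPolicyEnabled` wraps; the values of `SELinuxPolicyConfigAnnotation` and `SELinuxPolicyCustom` below are placeholders, since the real constants are not shown in this thread.

```go
package main

import "fmt"

// Placeholder values: the real constants are defined in the operator's
// internal/api/annotations package and may differ.
const (
	SELinuxPolicyConfigAnnotation = "example.io/selinux-policy"
	SELinuxPolicyCustom           = "custom"
)

// IsCustomPolicyEnabled mirrors the quoted check: only the exact custom value
// enables the custom policy. Reading from a nil map is safe in Go.
func IsCustomPolicyEnabled(annot map[string]string) bool {
	if v, ok := annot[SELinuxPolicyConfigAnnotation]; ok && v == SELinuxPolicyCustom {
		return true
	}
	return false
}

func main() {
	fmt.Println(IsCustomPolicyEnabled(map[string]string{SELinuxPolicyConfigAnnotation: "custom"}))
	// Any non-custom value disables it; "disabled" is just an example value.
	fmt.Println(IsCustomPolicyEnabled(map[string]string{SELinuxPolicyConfigAnnotation: "disabled"}))
	fmt.Println(IsCustomPolicyEnabled(nil))
}
```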

Second, assume we stop keeping the existing annotations, as this PR suggests, and one day we add another annotation (to enable some other functionality).
If we want to change one of the annotations, we'll have to send all the other annotations as well, so that the operator won't delete the existing ones.

@ffromani Could you please provide your POV here?

Originally the idea here was to explicitly log the RTE policy in use. That works, but it's easy to lose and hard(er) for users to access.
Moving to annotations was my (too quick?) suggestion, and @Tal-or raises good points here.

The desire here is to make troubleshooting easier by reporting the actual RTE policy in use in an easy-to-access way (both for humans and programs), instead of having to navigate through objects and infer it, which is and will always be possible (and supported, of course).

I fundamentally agree with the aforementioned goal; reporting the value is a nice-to-have.
We will need to figure out the lowest-friction option. If there is no such option, we will have to put this enhancement on the back burner.

Another random thought: we can add a condition, or shove the value into the existing Available condition.
Since we are augmenting information, adding to the Message (which is free-form) is not a big deal.
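A sketch of the last idea: appending the policy in use to the free-form message of the existing Available condition. The `condition` type is a hypothetical, trimmed-down stand-in for `metav1.Condition`; the `withSELinuxPolicy` helper, the `default` label for the non-custom policy, and the message wording are all assumptions.

```go
package main

import "fmt"

// condition is a hypothetical, trimmed-down stand-in for metav1.Condition.
type condition struct {
	Type    string
	Status  string
	Message string
}

// withSELinuxPolicy returns a copy of conds in which the Available condition's
// free-form message also states the RTE SELinux policy in use. It assumes the
// conditions are rebuilt on every reconciliation: calling it again on its own
// output would append the note twice.
func withSELinuxPolicy(conds []condition, customPolicy bool) []condition {
	policy := "default" // hypothetical label for the non-custom policy
	if customPolicy {
		policy = "custom"
	}
	note := "RTE SELinux policy: " + policy

	out := make([]condition, len(conds))
	copy(out, conds)
	for i := range out {
		if out[i].Type != "Available" {
			continue
		}
		if out[i].Message == "" {
			out[i].Message = note
		} else {
			out[i].Message += "; " + note
		}
	}
	return out
}

func main() {
	conds := []condition{
		{Type: "Available", Status: "True"},
		{Type: "Progressing", Status: "False"},
	}
	for _, c := range withSELinuxPolicy(conds, true) {
		fmt.Printf("%s=%s %q\n", c.Type, c.Status, c.Message)
	}
}
```

Because only the message changes, no new condition type or API field is needed, which fits the "lowest friction" goal.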

ffromani · 2024-10-30T18:53:30Z

controllers/numaresourcesoperator_controller.go

+
+	customPolicyEnabled := annotations.IsCustomPolicyEnabled(instance.Annotations)
+
+	delete(r.RTEManifests.DaemonSet.Annotations, annotations.SELinuxPolicyConfigAnnotation)


Why do we need this?

The plan was to reset the annotation so it can be discarded from the DS if required; in other words, to set the annotation in the updated DS only when a value actually needs to be set.
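The reset-then-set approach can be sketched over a plain map: the operator-owned key is always deleted from the rendered manifest, then set again only when the custom policy is requested. `renderPolicyAnnotation` is a hypothetical helper, and the key/value constants are placeholders for the operator's real ones.

```go
package main

import "fmt"

// Placeholder key/value; the operator's real constants are not shown here.
const (
	policyAnnotation = "example.io/selinux-policy"
	policyCustom     = "custom"
)

// renderPolicyAnnotation resets the operator-owned key on the rendered
// DaemonSet annotations and sets it again only when the custom policy is
// enabled, so a value from a previous reconciliation cannot linger.
func renderPolicyAnnotation(dsAnnotations map[string]string, customPolicyEnabled bool) map[string]string {
	delete(dsAnnotations, policyAnnotation) // no-op on a nil map or missing key
	if !customPolicyEnabled {
		return dsAnnotations
	}
	if dsAnnotations == nil {
		dsAnnotations = make(map[string]string)
	}
	dsAnnotations[policyAnnotation] = policyCustom
	return dsAnnotations
}

func main() {
	annots := renderPolicyAnnotation(nil, true)
	fmt.Println(annots) // set on the first pass
	annots = renderPolicyAnnotation(annots, false)
	fmt.Println(annots) // removed once the custom policy is disabled
}
```

This only fixes the rendered manifest; whether the removal reaches the live DaemonSet still depends on how `MetadataForUpdate` merges annotations, which is the open question in the merge.go thread.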

ffromani · 2024-11-20T14:40:55Z

pkg/objectstate/merge/merge.go

@@ -55,44 +55,5 @@ func MetadataForUpdate(current, updated client.Object) (client.Object, error) {
 	updated.SetManagedFields(current.GetManagedFields())
 	updated.SetFinalizers(current.GetFinalizers())

-	_ = Annotations(current, updated)


Originally the idea here was to explicitly log the RTE policy in use. That works, but it's easy to lose and hard(er) for users to access.
Moving to annotations was my (too quick?) suggestion, and @Tal-or raises good points here.

The desire here is to make troubleshooting easier by reporting the actual RTE policy in use in an easy-to-access way (both for humans and programs), instead of having to navigate through objects and infer it, which is and will always be possible (and supported, of course).

I fundamentally agree with the aforementioned goal; reporting the value is a nice-to-have.
We will need to figure out the lowest-friction option. If there is no such option, we will have to put this enhancement on the back burner.

Another random thought: we can add a condition, or shove the value into the existing Available condition.
Since we are augmenting information, adding to the Message (which is free-form) is not a big deal.

openshift-ci bot requested review from ffromani and swatisehgal October 25, 2024 12:12

openshift-ci bot added the approved label Oct 25, 2024 (indicates a PR has been approved by an approver from all required OWNERS files)

ffromani reviewed Oct 25, 2024

shajmakh force-pushed the log-selinux-annotation-use branch from 5f678b6 to 03f9660 October 29, 2024 08:51

shajmakh changed the title ~~rte: log if custom SELinux context is used~~ cleanup PR: reflect SELinux policy in RTE DS and report Progressing condition message Oct 29, 2024

openshift-ci bot added the do-not-merge/hold label Oct 29, 2024 (indicates that a PR should not merge because someone has issued a /hold command)

ffromani reviewed Oct 29, 2024

shajmakh force-pushed the log-selinux-annotation-use branch from 03f9660 to abcb65b October 30, 2024 11:20

shajmakh changed the title ~~cleanup PR: reflect SELinux policy in RTE DS and report Progressing condition message~~ WIP: cleanup PR: reflect SELinux policy in RTE DS and report Progressing condition message Oct 30, 2024

openshift-ci bot added the do-not-merge/work-in-progress label Oct 30, 2024 (indicates that a PR should not merge because it is a work in progress)

shajmakh force-pushed the log-selinux-annotation-use branch from abcb65b to 0245017 October 30, 2024 11:27

shajmakh added 6 commits October 30, 2024 15:36

rte: set custom SELinux policy annotation in RTE DS

0bbf0c1

When using the legacy SELinux context, set that annotation in the RTE DS annotations to make it easy to identify which policy is in use. Signed-off-by: Shereen Haj <[email protected]>

ctrl: reflect MCPs under operator status only if updated

b6977ac

So far, the operator status listed the MCPs even if they were not updated. Fix this to reflect only the updated MCPs, while waiting for the rest to be reconciled. Signed-off-by: Shereen Haj <[email protected]>
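A sketch of the filtering this commit describes: report only the updated pools under the operator status and keep the others pending until a later reconciliation. `mcpState` is a hypothetical summary type, not the `machineconfiguration.openshift.io` API type, and `splitMCPs` is an illustrative name.

```go
package main

import "fmt"

// mcpState is a hypothetical summary of one MachineConfigPool as seen by the
// controller; it is not the machineconfiguration.openshift.io API type.
type mcpState struct {
	Name    string
	Updated bool
}

// splitMCPs returns the pools that are updated (to be reported under the
// operator status) and the names of the pools still pending reconciliation.
func splitMCPs(pools []mcpState) (updated []mcpState, pending []string) {
	for _, p := range pools {
		if p.Updated {
			updated = append(updated, p)
		} else {
			pending = append(pending, p.Name)
		}
	}
	return updated, pending
}

func main() {
	updated, pending := splitMCPs([]mcpState{
		{Name: "worker-cnf", Updated: true},
		{Name: "worker-rt", Updated: false},
	})
	fmt.Println(len(updated), updated[0].Name, pending)
}
```

The pending names are also a natural input for the Progressing condition message introduced in the same PR.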

shajmakh force-pushed the log-selinux-annotation-use branch from 0245017 to 3dc13bc October 30, 2024 13:36

shajmakh changed the title ~~WIP: cleanup PR: reflect SELinux policy in RTE DS and report Progressing condition message~~ cleanup PR: reflect SELinux policy in RTE DS and report Progressing condition message Oct 30, 2024

openshift-ci bot removed the do-not-merge/work-in-progress and do-not-merge/hold labels Oct 30, 2024

openshift-ci bot added the do-not-merge/hold label Oct 31, 2024

Tal-or reviewed Nov 7, 2024

ffromani reviewed Nov 20, 2024
