Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update DSC status with the related components status" #1431

Merged

Conversation

lburgazzoli
Copy link
Contributor

@lburgazzoli lburgazzoli commented Dec 8, 2024

Description

This commit includes:

  • exposing common components status to the DSC as part of
    status.components field
  • introduce a new Ready conditions that becomes true only when all the
    components are also Ready
  • reduces DSC chattering, by reducing the number of updates per reconcile

Notes:

https://issues.redhat.com/browse/RHOAIENG-15042

How Has This Been Tested?

e2e tests in progress

Screenshot or short clip

Merge criteria

  • You have read the contributors guide.
  • Commit messages are meaningful - have a clear and concise summary and detailed explanation of what was changed and why.
  • Pull Request contains a description of the solution, a link to the JIRA issue, and to any dependent or related Pull Request.
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • The developer has manually tested the changes and verified that the changes work

Copy link

openshift-ci bot commented Dec 8, 2024

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

Copy link
Contributor

github-actions bot commented Dec 8, 2024

This PR can't be merged just yet 😢

Please run make generate manifests api-docs and commit the changes.

For more info: https://github.com/opendatahub-io/opendatahub-operator/actions/runs/12221051844

@lburgazzoli lburgazzoli force-pushed the RHOAIENG-15042-dsc branch 2 times, most recently from 4c47d64 to 3eb275f Compare December 8, 2024 10:36
Copy link
Contributor

github-actions bot commented Dec 8, 2024

This PR can't be merged just yet 😢

Please run make generate manifests api-docs and commit the changes.

For more info: https://github.com/opendatahub-io/opendatahub-operator/actions/runs/12221061488

Copy link
Contributor

github-actions bot commented Dec 8, 2024

This PR can't be merged just yet 😢

Please run make generate manifests api-docs and commit the changes.

For more info: https://github.com/opendatahub-io/opendatahub-operator/actions/runs/12221069570

Copy link
Contributor

github-actions bot commented Dec 8, 2024

This PR can't be merged just yet 😢

Please run make generate manifests api-docs and commit the changes.

For more info: https://github.com/opendatahub-io/opendatahub-operator/actions/runs/12221076816

Copy link

codecov bot commented Dec 8, 2024

Codecov Report

Attention: Patch coverage is 0.13986% with 714 lines in your changes missing coverage. Please review.

Please upload report for BASE (main@5f456ad). Learn more about missing BASE report.

Files with missing lines Patch % Lines
...atasciencecluster/datasciencecluster_controller.go 0.00% 191 Missing ⚠️
controllers/components/codeflare/codeflare.go 0.00% 43 Missing ⚠️
controllers/components/dashboard/dashboard.go 0.00% 42 Missing ⚠️
...nents/datasciencepipelines/datasciencepipelines.go 0.00% 42 Missing ⚠️
controllers/components/kserve/kserve.go 0.00% 42 Missing ⚠️
controllers/components/kueue/kueue.go 0.00% 42 Missing ⚠️
...rs/components/modelmeshserving/modelmeshserving.go 0.00% 42 Missing ⚠️
...trollers/components/modelregistry/modelregistry.go 0.00% 42 Missing ⚠️
controllers/components/ray/ray.go 0.00% 42 Missing ⚠️
...rs/components/trainingoperator/trainingoperator.go 0.00% 42 Missing ⚠️
... and 8 more
Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1431   +/-   ##
=======================================
  Coverage        ?   19.12%           
=======================================
  Files           ?      149           
  Lines           ?    10353           
  Branches        ?        0           
=======================================
  Hits            ?     1980           
  Misses          ?     8149           
  Partials        ?      224           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

github-actions bot commented Dec 8, 2024

This PR can't be merged just yet 😢

Please run make generate manifests api-docs and commit the changes.

For more info: https://github.com/opendatahub-io/opendatahub-operator/actions/runs/12223711116

@VaishnaviHire
Copy link
Member

Adding @grdryn, since this change will be helpful for setting Kserve status

Copy link

@lphiri lphiri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor nit on naming , but otherwise looks good to me.

Copy link

openshift-ci bot commented Dec 9, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: lphiri

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved label Dec 9, 2024
Copy link
Member

@grdryn grdryn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally LGTM, with some minor suggestions and questions 👍

)

// TODO: all the logic about the deletion configmap should be moved to another controller
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a TODO for this PR, or for the future? If future, would it make sense to have a Jira issue for it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added jira link

if err != nil {
return instance, fmt.Errorf("failed to update DataScienceCluster status after reconciling %s: %w", componentName, err)
default:
return nil, fmt.Errorf("unsuported management state: %s", ms)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return nil, fmt.Errorf("unsuported management state: %s", ms)
return nil, fmt.Errorf("unsupported management state: %s", ms)

/me hides 😶‍🌫️

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

if err != nil {
func (r *DataScienceClusterReconciler) watchDeleteConfigMap(ctx context.Context, a client.Object) []reconcile.Request {
cm, ok := a.(*corev1.ConfigMap)
if !ok {
return nil
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would make sense as an alternative to pull out these conditions to a predicate to be used on the Watches method on L319, and just use handlers.Fn(r.watchDataScienceClusters) as the event handler? I'm not sure if I'm thinking about this correctly or not

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, let me know what you think

NewComponentReconciler(ctx context.Context, mgr ctrl.Manager) error
// UpdateDSCStatus update the component specific status part od the DSC
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// UpdateDSCStatus update the component specific status part od the DSC
// UpdateDSCStatus updates the component specific status part of the DSC

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment on lines 85 to 99
/*
predicate.Funcs{
UpdateFunc: func(event event.UpdateEvent) bool {
fmt.Println(
">>>",
"gvk", event.ObjectNew.GetObjectKind().GroupVersionKind(),
"namespace", event.ObjectNew.GetNamespace(),
"name", event.ObjectNew.GetName(),
"diff", cmp.Diff(event.ObjectOld, event.ObjectNew),
)

return true
},
}),
*/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this comment expected to be here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed

})

if err != nil {
return fmt.Errorf("error waiting Ready state for CodeFlare componetn in DSC: %w", err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return fmt.Errorf("error waiting Ready state for CodeFlare componetn in DSC: %w", err)
return fmt.Errorf("error waiting on Ready state for CodeFlare component in DSC: %w", err)

and equivalent suggestion for the other tests that have this check

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

// ModelMeshServing component status.
ModelMeshServing componentApi.DSCModelMeshServingStatus `json:"modelmeshserving,omitempty"`

// DataServicePipeline component status.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// DataServicePipeline component status.
// DataSciencePipeline component status.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

ModelMeshServing componentApi.DSCModelMeshServingStatus `json:"modelmeshserving,omitempty"`

// DataServicePipeline component status.
// Require OpenShift Pipelines Operator to be installed before enable component
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Require OpenShift Pipelines Operator to be installed before enable component
// Requires OpenShift Pipelines Operator to be installed before enable component

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

DataSciencePipelines componentApi.DSCDataSciencePipelinesStatus `json:"datasciencepipelines,omitempty"`

// Kserve component status.
// Require OpenShift Serverless and OpenShift Service Mesh Operators to be installed before enable component
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Require OpenShift Serverless and OpenShift Service Mesh Operators to be installed before enable component
// Requires OpenShift Serverless and OpenShift Service Mesh Operators to be installed before enable component

I wonder if this is still true in all cases actually...I guess they shouldn't be absolutely necessary for RawDeployment mode? 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is something I don't know, the comment was there alrady (just copy pasted, and now removed since it does not make any sense on the status subresource)

Kueue componentApi.DSCKueueStatus `json:"kueue,omitempty"`

// CodeFlare component status.
// If CodeFlare Operator has been installed in the cluster, it should be uninstalled first before enabled component.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// If CodeFlare Operator has been installed in the cluster, it should be uninstalled first before enabled component.
// If CodeFlare Operator has been installed in the cluster, it should be uninstalled first before enabling component.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@openshift-ci openshift-ci bot removed the lgtm label Dec 9, 2024
Copy link
Contributor

github-actions bot commented Dec 9, 2024

This PR can't be merged just yet 😢

Please run make generate manifests api-docs and commit the changes.

For more info: https://github.com/opendatahub-io/opendatahub-operator/actions/runs/12245986753

Copy link
Contributor

github-actions bot commented Dec 9, 2024

This PR can't be merged just yet 😢

Please run make generate manifests api-docs and commit the changes.

For more info: https://github.com/opendatahub-io/opendatahub-operator/actions/runs/12246154643

@lburgazzoli
Copy link
Contributor Author

lburgazzoli commented Dec 10, 2024

@grdryn @VaishnaviHire e2e tests are a little bit of a mess, I'll refactor them out once we get this and #1437 merged if that's fine with you

@VaishnaviHire
Copy link
Member

/lgtm

@lburgazzoli
Copy link
Contributor Author

@VaishnaviHire @grdryn mind another review ? rebased on top of main

}

readyCondition := meta.FindStatusCondition(ci.GetStatus().Conditions, status.ConditionTypeReady)
if readyCondition != nil && readyCondition.Status != metav1.ConditionTrue {
Copy link
Contributor

@ykaliuta ykaliuta Dec 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it ready if there is no ready condition?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good spot, let me fix it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pushed a commit, now the logic is that:

  • if the component is not managed, skip checking its readiness
  • if the component is managed, then it is ready if the ready condition is present with status True
  • the message DSC's Ready condition message has been made generic as multiple factors could contribute to set it to ready (i.e. a cluster without managed components is still Ready)

@VaishnaviHire
Copy link
Member

@lburgazzoli Branch needs a rebase

This commit includes:
- exposing common components status to the DSC as part of
  status.components field
- introduce a new Ready conditions that becomes true only when all the
  components are also Ready
- reduces DSC chattering, by reducing the number of updates per
  reconcile

Signed-off-by: Luca Burgazzoli <[email protected]>
@lburgazzoli
Copy link
Contributor Author

@lburgazzoli Branch needs a rebase

done

@VaishnaviHire
Copy link
Member

/lgtm

@openshift-ci openshift-ci bot added the lgtm label Dec 11, 2024
@openshift-merge-bot openshift-merge-bot bot merged commit 817b465 into opendatahub-io:main Dec 11, 2024
10 checks passed
@lburgazzoli lburgazzoli deleted the RHOAIENG-15042-dsc branch December 11, 2024 13:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

6 participants