fix(katib): Increase experiment batch-size #93

Merged
merged 1 commit into from
Jul 17, 2024


orfeas-k
Contributor

@orfeas-k orfeas-k commented Jul 17, 2024

Increase the experiment's batch size so that the experiment performs less training and therefore completes earlier.
The UAT's role is to confirm that the controller works, not to run a full training experiment. (More details about the debugging are in the issue.)
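
As a rough illustration of why a larger batch size shortens the run, the sketch below counts optimizer steps per epoch for two batch sizes. The dataset size and both batch sizes are hypothetical values chosen for the example, not the ones used in this experiment, and `steps_per_epoch` is an illustrative helper rather than anything from the UATs. Fewer steps usually means a shorter run, although the actual wall-clock gain depends on the hardware.

```python
import math


def steps_per_epoch(num_samples: int, batch_size: int) -> int:
    """Return the number of optimizer steps needed to see every sample once."""
    if num_samples <= 0 or batch_size <= 0:
        raise ValueError("num_samples and batch_size must be positive")
    # The last batch may be partial, so round up.
    return math.ceil(num_samples / batch_size)


# Hypothetical dataset size, for illustration only.
NUM_SAMPLES = 60_000

# Hypothetical "before" and "after" batch sizes.
for batch_size in (64, 512):
    print(f"batch_size={batch_size}: {steps_per_epoch(NUM_SAMPLES, batch_size)} steps per epoch")
```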

Fixes canonical/katib-operators#211

Testing

To test, deploy CKF 1.9/beta to MicroK8s 1.29 with Juju 3.4. For a proper test, however, you'll need to rebase this branch (or cherry-pick its one commit) onto the PR #92 branch so that the new requirements are included as well. Once that is done, run:

tox -e kubeflow-local -- --filter "katib"

Increase experiment's batch-size in order for the experiment to perform
less training, since the UAT's role is to confirm that the controller
works, rather than run a full experiment training.
@misohu
Member

misohu commented Jul 17, 2024

Can't we run it on AKS and EKS in the CI?

@orfeas-k
Contributor Author

Member

@misohu misohu left a comment


LGTM. Although the AKS and EKS UAT runs are failing, the Katib UATs in those runs are passing. Good job.

@orfeas-k orfeas-k merged commit 6a3edd6 into main Jul 17, 2024
1 check passed
@orfeas-k orfeas-k deleted the kf-5975-fix-katib branch July 17, 2024 12:22
Development

Successfully merging this pull request may close these issues.

katib-integration UAT is failing for 1.9/beta