Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RHOAIENG-1051] - update opendatahub folder to kustomize #280

Merged
merged 2 commits into from
Apr 9, 2024

Conversation

spolti
Copy link
Member

@spolti spolti commented Feb 21, 2024

chore: opendatahub folder have
- quickstart
- docs
- manifests/scripts to support fvt After transition manifests are merged, the way to deploy modelmesh-serving is changed so we should update all related one.

How to test:

PR checklist

Checklist items below are applicable for development targeted to both fast and stable branches/tags

  • Unit tests pass locally
  • FVT tests pass locally
  • If the PR adds a new container image or updates the tag of an existing image (not build within cpaas), is the corresponding change made in live-builder and cpaas-midstream to add/update the image tag in the operator CSV? Link the PRs if applicable

Checklist items below are applicable for development targeted to both fast and stable branches/tags

  • Tested modelmesh serving deployment with odh-manifests and ran odh-manifests-e2e tests locally

@spolti spolti marked this pull request as draft February 21, 2024 14:09
@openshift-ci openshift-ci bot requested a review from terrytangyuan February 21, 2024 14:09
@spolti spolti marked this pull request as ready for review March 14, 2024 20:04
@Jooho
Copy link

Jooho commented Mar 19, 2024

@spolti I followed the description and hit errors:

./deploy.sh: line 4: /tmp/modelmesh-serving/opendatahub/quickstart/scripts/utils.sh: No such file or directory
make: *** No rule to make target 'deploy-mm-for-odh'.  Stop.
Warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.
No resources found
Error from server (NotFound): namespaces "model-serving" not found
Now using project "model-serving" on server "https://api.jooho.n1ai.p3.openshiftapps.com:443".

You can add applications to this project with the 'new-app' command. For example, try:

    oc new-app rails-postgresql-example

to build a new example application in Ruby. Or use kubectl to deploy a simple Kubernetes application:

    kubectl create deployment hello-node --image=registry.k8s.io/e2e-test-images/agnhost:2.43 -- /agnhost serve-hostname

error: the path "/tmp/modelmesh-serving/opendatahub/quickstart/basic/common_manifests/openvino-serving-runtime.yaml" does not exist
error: the path "/tmp/modelmesh-serving/opendatahub/quickstart/basic/common_manifests/openvino-inference-service.yaml" does not exist
./deploy.sh: line 28: wait_for_pods_ready: command not found
./deploy.sh: line 30: success: command not found

@Jooho
Copy link

Jooho commented Mar 19, 2024

@spolti Basic folder is only one to test or are there any other folders?

@spolti
Copy link
Member Author

spolti commented Mar 19, 2024

@Jooho it should be working now, not sure why it happened, but was able to run previously locally.

@Jooho
Copy link

Jooho commented Mar 19, 2024

now, it removed the modelmesh-serving folder itself.

  [RHOAIENG-1051](https://issues.redhat.com//browse/RHOAIENG-1051)  /tmp/modelmesh-serving/opendatahub/quickstart/basic                                                                   15:57:06  jooho 
❯ ./deploy.sh 
./opendatahub/scripts/install_odh.sh 
.. Downloading binaries
.. Creating a bin folder
yq already installed.
Installing kustomize.
curl: (22) The requested URL returned error: 404
tar: /tmp/kustomize.tar.gz: Cannot open: No such file or directory
tar: Error is not recoverable: exiting now
mv: cannot stat '/tmp/kustomize': No such file or directory
rm: cannot remove '/tmp/kustomize.tar.gz': No such file or directory
installed kustomize version v5.3.0
Delete the exising /tmp/modelmesh-e2e folder
Creating a /tmp/modelmesh-e2e folder
Already on project "opendatahub" on server "https://api.jooho.n1ai.p3.openshiftapps.com:443".
.. Archiving odh-manifests
cp: cannot stat '/tmp/modelmesh-serving/opendatahub/scripts/../../config': No such file or directory
cp: cannot stat '/tmp/modelmesh-serving/opendatahub/scripts/manifests': No such file or directory
Latest Manifest will be used, fast tag
.. Deploying ModelMesh with kustomize
./opendatahub/scripts/install_odh.sh: line 158: /tmp/modelmesh-serving/config/overlays/odh/params.env: No such file or directory
params.env:
cat: /tmp/modelmesh-serving/config/overlays/odh/params.env: No such file or directory

installing namespaced rbac
2024/03/19 15:57:09 unable to make loader at '.'; not a valid directory: abs path error on '.' : getwd: no such file or directory
error: no objects passed to apply
cat: /tmp/modelmesh-serving/config/namespace-runtimes/kustomization.yaml: No such file or directory
2024/03/19 15:57:09 unable to make loader at '.'; not a valid directory: abs path error on '.' : getwd: no such file or directory
^Cmake: *** [Makefile:297: deploy-mm-for-odh] Interrupt

cd: error retrieving current directory: getcwd: cannot access parent directories: No such file or directory
cd: error retrieving current directory: getcwd: cannot access parent directories: No such file or directory
cd: error retrieving current directory: getcwd: cannot access parent directories: No such file or directory
cd: error retrieving current directory: getcwd: cannot access parent directories: No such file or directory
cd: error retrieving current directory: getcwd: cannot access parent directories: No such file or directory
cd: error retrieving current directory: getcwd: cannot access parent directories: No such file or directory

please double check rm command.

chore: opendatahub folder have
	- quickstart
	- docs
	- manifests/scripts to support fvt
       After transition manifests are merged, the way to deploy modelmesh-serving is changed so we should update all related one.

Signed-off-by: Spolti <[email protected]>
@Jooho
Copy link

Jooho commented Apr 2, 2024

/retest

@spolti With this pr, what should I test?
3 quickstarts ?

  • basic
  • hpa
  • PVC

anything else?

@spolti
Copy link
Member Author

spolti commented Apr 2, 2024

only these 3 should be fine.

kustomize build ${MANIFESTS_DIR}/runtimes/ |oc delete -f -
oc delete pvc,pod --all --force -n modelmesh-serving
oc delete ns $namespace
kustomize build "${MANIFESTS_DIR}"/runtimes/ |oc delete -f -
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this one should be moved after line number 35 because of crd creation.

@Jooho
Copy link

Jooho commented Apr 2, 2024

basic/hpa worked but pvc failed.

modelmesh-controller failed

{"level":"info","ts":"2024-04-02T20:12:53Z","logger":"setup","msg":"MMesh Configuration","serviceName":"modelmesh-serving","port":8033,"mmeshEndpoint":""}
{"level":"error","ts":"2024-04-02T20:12:53Z","logger":"setup","msg":"In cluster scope mode but controller does not have cluster scope permissions, exiting","stacktrace":"main.main\n\t/go/src/github.com/opendatahub-io/modelmesh-serving/main.go:210\nruntime.main\n\t/usr/lib/golang/src/runtime/proc.go:250"}

Latest uses go1.22 while we use 1.20

Signed-off-by: Spolti <[email protected]>
@VedantMahabaleshwarkar
Copy link

Is this quickstart still deploying model-mesh?
The current issue with these set of quickstarts, in my opinion, is that the deploy.sh script looks to install model-mesh, which is not what I have found I needed whenever I looked to these quickstarts. Most of the time I find myself looking at these quickstarts, I already have an ODH installation, and what I'm looking for is a 1 line command to deploy a sample inferenceservice.
IMO, these quickstarts should operate under the assumption that ODH is already installed, and only focus on deploying the models and performing inference. Or, at least there should be an option for this in addition to what already exists.

WDYT @Jooho @israel-hdez @spolti @danielezonca

@spolti
Copy link
Member Author

spolti commented Apr 5, 2024

FVT is working fine, just one test is failing, seems to be a issue with it:

Scaling of runtime deployments with HPA Autoscaler when there are no predictors Scale all runtimes down after a created test predictor is deleted
/Users/fspolti/data/dev/sources/modelmesh-serving/fvt/hpa/hpa_test.go:149
  2024-04-05T17:45:14-03:00	INFO	Delete all predictors ...
  2024-04-05T17:45:17-03:00	INFO	Watcher got event with object	{"name": "modelmesh-serving-mlserver-1.x", "replicas": 0, "available": 0, "updated": 0}
  2024-04-05T17:45:17-03:00	INFO	deployStatusesReady: map[modelmesh-serving-mlserver-1.x:true modelmesh-serving-ovms-1.x:false modelmesh-serving-torchserve-0.x:false modelmesh-serving-triton-2.x:false]
  2024-04-05T17:45:17-03:00	INFO	Watcher got event with object	{"name": "modelmesh-serving-ovms-1.x", "replicas": 0, "available": 0, "updated": 0}
  2024-04-05T17:45:17-03:00	INFO	deployStatusesReady: map[modelmesh-serving-mlserver-1.x:true modelmesh-serving-ovms-1.x:true modelmesh-serving-torchserve-0.x:false modelmesh-serving-triton-2.x:false]
  2024-04-05T17:45:17-03:00	INFO	Watcher got event with object	{"name": "modelmesh-serving-torchserve-0.x", "replicas": 0, "available": 0, "updated": 0}
  2024-04-05T17:45:17-03:00	INFO	deployStatusesReady: map[modelmesh-serving-mlserver-1.x:true modelmesh-serving-ovms-1.x:true modelmesh-serving-torchserve-0.x:true modelmesh-serving-triton-2.x:false]
  2024-04-05T17:45:17-03:00	INFO	Watcher got event with object	{"name": "modelmesh-serving-triton-2.x", "replicas": 0, "available": 0, "updated": 0}
  2024-04-05T17:45:17-03:00	INFO	deployStatusesReady: map[modelmesh-serving-mlserver-1.x:true modelmesh-serving-ovms-1.x:true modelmesh-serving-torchserve-0.x:true modelmesh-serving-triton-2.x:true]
  2024-04-05T17:45:17-03:00	INFO	All deployments are ready: map[modelmesh-serving-mlserver-1.x:true modelmesh-serving-ovms-1.x:true modelmesh-serving-torchserve-0.x:true modelmesh-serving-triton-2.x:true]
  2024-04-05T17:45:27-03:00	INFO	Timed out after 10s without events
  STEP: Creating a test predictor for one Runtime @ 04/05/24 17:45:27.361
  STEP: Creating predictor mlserver-sklearn-mnist-svm-gpqgm @ 04/05/24 17:45:27.361
  STEP: Waiting for predictor mlserver-sklearn-mnist-svm-gpqgm to be 'Loaded' @ 04/05/24 17:45:27.665
  2024-04-05T17:45:27-03:00	INFO	Watcher got event with object	{"name": "mlserver-sklearn-mnist-svm-gpqgm", "status.available": false, "status.activeModelState": "Pending", "status.targetModelState": "", "status.transitionStatus": "UpToDate", "status.lastFailureInfo": null}
  2024-04-05T17:45:27-03:00	INFO	Watcher got event with object	{"name": "mlserver-sklearn-mnist-svm-gpqgm", "status.available": false, "status.activeModelState": "Pending", "status.targetModelState": "", "status.transitionStatus": "UpToDate", "status.lastFailureInfo": {"message":"Waiting for runtime Pod to become available","modelId":"mlserver-sklearn-mnist-svm-gpqgm__ksp-b20a0c5aca","reason":"RuntimeUnhealthy"}}

  [FAILED] in [It] - /Users/fspolti/data/dev/sources/modelmesh-serving/fvt/helpers.go:355 @ 04/05/24 17:47:27.664
  2024-04-05T17:47:27-03:00	INFO	Running command	{"args": "kubectl get predictors -n model-serving"}
  =====================================================================================================================================
  NAME                               TYPE      AVAILABLE   ACTIVEMODEL   TARGETMODEL   TRANSITION   AGE
  mlserver-sklearn-mnist-svm-gpqgm   sklearn   false       Pending                     UpToDate     2m3s

@israel-hdez
Copy link

Is this quickstart still deploying model-mesh? The current issue with these set of quickstarts, in my opinion, is that the deploy.sh script looks to install model-mesh, which is not what I have found I needed whenever I looked to these quickstarts. Most of the time I find myself looking at these quickstarts, I already have an ODH installation, and what I'm looking for is a 1 line command to deploy a sample inferenceservice. IMO, these quickstarts should operate under the assumption that ODH is already installed, and only focus on deploying the models and performing inference. Or, at least there should be an option for this in addition to what already exists.

WDYT @Jooho @israel-hdez @spolti @danielezonca

Note that I haven't reviewed the code and I think I don't have full context. So, just trying to answer...

In most projects I have played with, quickstarts assume a clean environment and install everything to quickly give you a working setup. Also, usually quickstarts are for trying the project (i.e. non-production, demos). Because of this, a quickstart don't let you customize the setup (that's left to the official installer).

I would agree that deploying a sample model should be left to a different script than the quickstart setup, although it could be part of the same doc page (probably arranged like a tutorial).

That said, I also didn't use too much the quickstarts because what I personally remember is that, rather than providing you with a setup that you can play with, I think it prepared the env more like a demo that is also suited for running FVTs/CI... and I had to spend time "cleaning" the env. This is different from your case: you just want to deploy a sample model on an existent setup, while I wanted to quickly have a base setup (without additional stuff) to try my own models. ... but don't trust me about this (I may be remembering incorrectly why I didn't use the quickstarts that much).

Copy link

@Jooho Jooho left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

I did e2e test and it works

CONTROLLERNAMESPACE=opendatahub NAMESPACE=modelmesh-serving NAMESPACESCOPEMODE=true make e2e-test-for-odh
...
Ginkgo ran 1 suite in 3m56.85221035s
Test Suite Passed
Passed fvt/hpa. Move on the next test
[SUCCESS] FVT Test Passed!

@openshift-ci openshift-ci bot added the lgtm label Apr 9, 2024
Copy link

openshift-ci bot commented Apr 9, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Jooho, spolti

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-bot openshift-merge-bot bot merged commit 0917661 into opendatahub-io:main Apr 9, 2024
8 checks passed
VedantMahabaleshwarkar pushed a commit to VedantMahabaleshwarkar/modelmesh-serving that referenced this pull request Apr 26, 2024
#### Motivation
Support for TorchServe was added in opendatahub-io#250 and kserve/modelmesh-runtime-adapter#34. A test should be added for it as well.

#### Modifications
- Adds basic FVT for load/inference with a TorchServe MAR model using the native TorchServe gRPC API
- Disables OVMS runtime and tests to allow TorchServe to be tested due to resource constraints

#### Result
Closes opendatahub-io#280 

Signed-off-by: Rafael Vasquez <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants