
UPSTREAM: <carry>: feat: mount EmptyDir volumes for launcher write locations #19

Merged

Conversation

gregsheremeta

The launcher writes input artifacts to the root-level paths /gcs, /minio, and /s3. These paths are not writable by non-root users by default, which is problematic in locked-down Kubernetes installations and on OpenShift. /gcs is currently part of the contract for KFP v2 Python component wrappers, so the path cannot be changed.

Mount an EmptyDir scratch volume at each of these paths to work around this.

Additionally, pip writes to /.local and /.cache, so add EmptyDir mounts for those as well.
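
For illustration, the effect is that each executor pod carries per-pod emptyDir volumes mounted at those paths. A minimal sketch of the relevant pod spec fragment, with illustrative volume names that are assumptions here rather than taken from this PR, looks roughly like this:

spec:
  containers:
    - name: main
      volumeMounts:
        - name: gcs-scratch        # assumed name; backs /gcs
          mountPath: /gcs
        - name: s3-scratch         # assumed name; backs /s3
          mountPath: /s3
        - name: minio-scratch      # assumed name; backs /minio
          mountPath: /minio
        - name: dot-local-scratch  # assumed name; pip writes to /.local
          mountPath: /.local
        - name: dot-cache-scratch  # assumed name; pip writes to /.cache
          mountPath: /.cache
  volumes:
    - name: gcs-scratch
      emptyDir: {}
    - name: s3-scratch
      emptyDir: {}
    - name: minio-scratch
      emptyDir: {}
    - name: dot-local-scratch
      emptyDir: {}
    - name: dot-cache-scratch
      emptyDir: {}

The idea is that a per-pod scratch volume at these paths is writable without root, unlike the image's root filesystem.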

Fixes: https://issues.redhat.com/browse/RHOAIENG-1889

Ref: kubeflow#5673
Ref: kubeflow#7345

Testing

Using the kfp 2.6.0+ SDK, compile the test pipeline at https://github.com/gregsheremeta/gists/blob/main/test-iris-pipeline.py (or use the pre-compiled pipeline at https://github.com/gregsheremeta/gists/blob/main/test-iris-pipeline.yaml).

Install DSP from master, upload the pipeline, and run it. It should fail when the pipeline tries to write to the /s3 scratch location before performing its S3 upload, with an error message like the following:

F0208 23:34:31.768124 35 main.go:49] failed to execute component: unable to create directory "/s3/mlpipeline/iris-training-pipeline/9ca69cde-b7c0-437e-8260-36731be8e2b9/create-dataset" for output artifact "iris_dataset": mkdir /s3: permission denied

Then install DSP from this PR, upload the pipeline, and run it again. It should now succeed. Note that the test pipeline does not use any special image; it uses docker.io/python:3.9.17.

@gregsheremeta
Author

/hold

why is it auto-approved? lol

@dsp-developers

A set of new images has been built to help with testing out this PR:
API Server: quay.io/opendatahub/ds-pipelines-api-server:pr-19
DSP DRIVER: quay.io/opendatahub/ds-pipelines-driver:pr-19
DSP LAUNCHER: quay.io/opendatahub/ds-pipelines-launcher:pr-19
Persistence Agent: quay.io/opendatahub/ds-pipelines-persistenceagent:pr-19
Scheduled Workflow Manager: quay.io/opendatahub/ds-pipelines-scheduledworkflow:pr-19
MLMD Server: quay.io/opendatahub/ds-pipelines-metadata-grpc:pr-19
MLMD Envoy Proxy: quay.io/opendatahub/ds-pipelines-metadata-envoy:pr-19
UI: quay.io/opendatahub/ds-pipelines-frontend:pr-19

@dsp-developers

An OCP cluster where you are logged in as cluster admin is required.

The Data Science Pipelines team recommends testing this using the Data Science Pipelines Operator. Check here for more information on using the DSPO.

To use and deploy a DSP stack with these images (assuming the DSPO is deployed), first save the following YAML to a file named dspa.pr-19.yaml:

apiVersion: datasciencepipelinesapplications.opendatahub.io/v1alpha1
kind: DataSciencePipelinesApplication
metadata:
  name: pr-19
spec:
  dspVersion: v2
  apiServer:
    image: "quay.io/opendatahub/ds-pipelines-api-server:pr-19"
    argoDriverImage: "quay.io/opendatahub/ds-pipelines-driver:pr-19"
    argoLauncherImage: "quay.io/opendatahub/ds-pipelines-launcher:pr-19"
  persistenceAgent:
    image: "quay.io/opendatahub/ds-pipelines-persistenceagent:pr-19"
  scheduledWorkflow:
    image: "quay.io/opendatahub/ds-pipelines-scheduledworkflow:pr-19"
  mlmd:  
    deploy: true  # Optional component
    grpc:
      image: "quay.io/opendatahub/ds-pipelines-metadata-grpc:pr-19"
    envoy:
      image: "quay.io/opendatahub/ds-pipelines-metadata-envoy:pr-19"
  mlpipelineUI:
    deploy: true  # Optional component 
    image: "quay.io/opendatahub/ds-pipelines-frontend:pr-19"
  objectStorage:
    minio:
      deploy: true
      image: 'quay.io/opendatahub/minio:RELEASE.2019-08-14T20-37-41Z-license-compliance'

Then run the following:

cd $(mktemp -d)
git clone git@github.com:opendatahub-io/data-science-pipelines.git
cd data-science-pipelines/
git fetch origin pull/19/head
git checkout -b pullrequest dfe6edc7323cf3b441e4baf2b88aaa5d70a019b4
oc apply -f dspa.pr-19.yaml

More instructions on how to deploy and test a Data Science Pipelines Application are available here.

@HumairAK HumairAK removed the approved label Feb 21, 2024
@HumairAK

/lgtm
/approve

@HumairAK HumairAK merged commit 57830bf into opendatahub-io:master Feb 22, 2024
1 of 2 checks passed
@openshift-ci openshift-ci bot added the lgtm label Feb 22, 2024

openshift-ci bot commented Feb 22, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: gregsheremeta, HumairAK

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [HumairAK,gregsheremeta]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment
