This repository makes it easy to schedule existing Vertex pipelines.
It uploads Vertex pipeline templates to an Artifact Registry repository and schedules the pipelines using Cloud Scheduler and Cloud Functions.
It creates the required service accounts, configures the required permissions, and creates the necessary cloud resources for you.
- Unix-like environment (Linux, macOS, WSL, etc.). Tested on macOS Monterey (M1 chip) and GNU/Linux 10.
- Google Cloud SDK (gcloud) (instructions here)
- Terraform (tested with version v1.5.6) (instructions here)
- wget installed (instructions here for Linux; on macOS: brew install wget)
- jq installed (instructions here for Linux; on macOS: brew install jq). Tested with version jq-1.6.
- yq installed (instructions here for Linux; on macOS: brew install yq). Tested with version 4.35.2.
- A compiled Vertex pipeline (instructions here), or use the hello_world_pipeline.yaml file in the pipelines directory to test the scheduling quickly.
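The command-line tools listed above can be checked in one pass before going further. The following is a minimal sketch and not part of the repository: the function name missing_tools is hypothetical, and it only checks that each command is on the PATH, not that its version matches the tested ones.

```shell
# Print every command from the argument list that is not found on the PATH.
# missing_tools is a hypothetical helper, not something this repository ships.
missing_tools() {
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
    fi
  done
}

# Usage: prints nothing when all prerequisites are installed.
#   missing_tools gcloud terraform wget jq yq
```

An empty output means every listed tool was found; any printed name still needs to be installed.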
First, execute these Google Cloud commands:
export GCP_PROJECT_ID=<gcp_project_id>
gcloud config set project $GCP_PROJECT_ID
gcloud auth login
gcloud auth application-default login
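After these commands, it can be worth confirming that gcloud points at the intended project. The sketch below is an illustration under assumptions: check_active_project is a hypothetical helper name, and it relies only on gcloud config get-value project.

```shell
# Succeed only if gcloud's active project matches the expected project ID.
# check_active_project is a hypothetical helper, not part of this repository.
check_active_project() {
  expected="$1"
  active="$(gcloud config get-value project 2>/dev/null)"
  if [ "$active" = "$expected" ]; then
    echo "Active project is $active"
  else
    echo "Active project is '$active', expected '$expected'" >&2
    return 1
  fi
}

# Usage:
#   check_active_project "$GCP_PROJECT_ID"
```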
From a working directory of your choice, download this repository:
wget -O scheduled-pipelines.zip https://github.com/artefactory/scheduled-pipelines/archive/main.zip
Then unzip it:
unzip scheduled-pipelines.zip \
&& rm scheduled-pipelines.zip \
&& mv scheduled-pipelines-main scheduled-pipelines \
&& cd scheduled-pipelines
To use this repository, you need to:
- Put your compiled pipeline(s) (YAML files) in the pipelines directory, or directly use the dummy pipeline hello_world_pipeline.yaml (already in the pipelines directory).
- Replace the values in your configuration file scheduled_pipelines_config.yaml with the values corresponding to your project.
- Enable the required APIs:
gcloud services enable \
cloudscheduler.googleapis.com \
cloudfunctions.googleapis.com \
cloudbuild.googleapis.com \
artifactregistry.googleapis.com \
storage-component.googleapis.com \
aiplatform.googleapis.com \
--project=$GCP_PROJECT_ID
- Deploy the scheduled pipeline(s) and the associated infrastructure:
make deploy_scheduled_pipelines
This command will:
- Create the service accounts used to run the scheduled pipelines and schedule them.
- Create the necessary cloud resources (Cloud Scheduler, Cloud Functions, Artifact Registry repository).
- Give the appropriate permissions to the service accounts.
- Upload the pipeline templates to the Artifact Registry repository.
- (Optional) If you modify the compiled pipeline(s) or the configuration file, run make deploy_scheduled_pipelines again to apply the changes.
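Before deploying or redeploying, it can be useful to confirm that the six APIs from the enable step are active. The sketch below is an assumption-laden illustration: missing_apis and REQUIRED_APIS are hypothetical names, and the function simply compares a list of enabled service names read from standard input against the APIs listed in the steps above.

```shell
# APIs taken from the "Enable the required APIs" step.
REQUIRED_APIS="cloudscheduler.googleapis.com
cloudfunctions.googleapis.com
cloudbuild.googleapis.com
artifactregistry.googleapis.com
storage-component.googleapis.com
aiplatform.googleapis.com"

# Read enabled service names on stdin (one per line) and print the
# required APIs that are absent. missing_apis is a hypothetical helper.
missing_apis() {
  enabled="$(cat)"
  for api in $REQUIRED_APIS; do
    if ! printf '%s\n' "$enabled" | grep -qxF "$api"; then
      echo "$api"
    fi
  done
}

# Usage (prints nothing when every required API is enabled):
#   gcloud services list --enabled --project="$GCP_PROJECT_ID" \
#     --format="value(config.name)" | missing_apis
```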
Go to the Cloud Scheduler page in the Google Cloud console and make sure the expected scheduler jobs are present.
Trigger a force run of a scheduler job from that page, then look at its "Status of last execution":
If the "Status of last execution" indicates a failure, check the troubleshooting section below.
If the "Status of last execution" is "Success", check that the Vertex pipeline is running as expected (make sure you selected the right region). If it is not running, check the logs of the cloud function.
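The same check can be done from the command line instead of the console. This is a minimal sketch, assuming you know the scheduler job name and its location; force_run_scheduler is a hypothetical helper, and the job name and location in the usage line are placeholders rather than values created by this repository.

```shell
# Trigger an immediate run of one Cloud Scheduler job, then print its
# state and last attempt time. The describe call may run before the new
# attempt is recorded, so rerun it if the timestamp looks stale.
# force_run_scheduler is a hypothetical helper, not part of this repository.
force_run_scheduler() {
  job="$1"
  location="$2"
  gcloud scheduler jobs run "$job" --location="$location" || return 1
  gcloud scheduler jobs describe "$job" --location="$location" \
    --format="value(state,lastAttemptTime)"
}

# Usage (placeholder job name and location):
#   gcloud scheduler jobs list --location=europe-west1
#   force_run_scheduler my-scheduler-job europe-west1
```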
- Check that the "Status of last execution" of the scheduler is "Success". If this is not the case:
First check the logs of the cloud scheduler to see whether the error is coming from the scheduler (permission denied) or the cloud function (internal error).
If the error is a permission-denied error, check that the service account of the cloud scheduler has the right permissions on the cloud function. If the error is an internal error, check the cloud function logs (go to the Cloud Functions page, click on the cloud function name, and open the "LOGS" tab).
- If the "Status of last execution" is "Success" but the pipeline is not running, check the logs of the cloud function to debug.
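The logs mentioned in these troubleshooting steps can also be read with gcloud. The sketch below is illustrative only: scheduler_logs and function_logs are hypothetical helpers, the limits are arbitrary, and depending on how the function was deployed (for example as a 2nd gen function) additional flags may be needed.

```shell
# Show recent Cloud Scheduler log entries for a project (last day, 20 entries).
# scheduler_logs is a hypothetical helper, not part of this repository.
scheduler_logs() {
  project="$1"
  gcloud logging read 'resource.type="cloud_scheduler_job"' \
    --project="$project" --limit=20 --freshness=1d
}

# Show the most recent log lines of one Cloud Function.
# function_logs is a hypothetical helper, not part of this repository.
function_logs() {
  name="$1"
  region="$2"
  gcloud functions logs read "$name" --region="$region" --limit=50
}

# Usage (placeholder function name and region):
#   scheduler_logs "$GCP_PROJECT_ID"
#   function_logs my-cloud-function europe-west1
```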
The permissions required to execute the make deploy_scheduled_pipelines command are:
Resource creation | Permission(s) required |
---|---|
Create service account | iam.serviceAccounts.create |
Create artifact registry | artifactregistry.repositories.create |
Create cloud function | cloudfunctions.functions.create, cloudbuild.builds.create |
Create cloud scheduler | cloudscheduler.jobs.create |

The command also sets IAM policy bindings, which requires the following permissions:
Resource to give permission on (iam-policy-binding) | Permission required |
---|---|
Project | resourcemanager.projects.setIamPolicy |
Cloud storage | storage.buckets.setIamPolicy |
Cloud function | cloudfunctions.functions.setIamPolicy |
Artifact registry | artifactregistry.repositories.setIamPolicy |