What does this sample demonstrate:
- Use "Appendable Template" when building Azure ML pipelines.
What doesn't this sample demonstrate:
- E2E MLOps lifecycle.
In Azure ML, a ML model
comes from a training experiment
which consists of multiple runs
. One type of run
is an Azure ML pipeline
run which also consists of multiple steps
or child runs.
The way provided by the Azure ML SDK for Python to build/publish Azure ML pipelines is a Python script invoking Azure ML SDK methods. This is a simple example
) of building an Azure ML pipeline with Azure ML SDK for Python (taken from diabetes_regression_build_train_pipeline.py
in MLOpsPython
repo).
train_step = PythonScriptStep(
name="Train Model",
script_name=e.train_script_path,
compute_target=aml_compute,
source_directory=e.sources_directory_train,
outputs=[pipeline_data],
arguments=[
"--model_name",
model_name_param,
"--step_output",
pipeline_data,
"--dataset_version",
dataset_version_param,
"--data_file_path",
data_file_path_param,
"--caller_run_id",
caller_run_id_param,
"--dataset_name",
dataset_name,
],
runconfig=run_config,
allow_reuse=True,
)
register_step = PythonScriptStep(
name="Register Model ",
script_name=e.register_script_path,
compute_target=aml_compute,
source_directory=e.sources_directory_train,
inputs=[pipeline_data],
arguments=["--model_name", model_name_param, "--step_input", pipeline_data, ], # NOQA: E501
runconfig=run_config,
allow_reuse=False,
)
register_step.run_after(train_step)
steps = [train_step, register_step]
train_pipeline = Pipeline(workspace=aml_workspace, steps=steps)
train_pipeline._set_experiment_name
train_pipeline.validate()
published_pipeline = train_pipeline.publish(
name=e.pipeline_name,
description="Model training/retraining pipeline",
version=e.build_id,
)
"Appendable Template" is a best practice way of streamline or code style these Azure ML pipeline build scripts. With "Appendable Template", you can:
- Encapsulate Azure ML pipeline steps as classes
- Easily append them to an Azure ML pipeline in a structured way
In "Appendable Template", you need to create classes for pipeline steps, and use them to build an Azure ML pipeline. This is an example of building an Azure ML pipeline with "Appendable Template".
train_step = TrainStep(workspace=aml_workspace, env=e,
compute=aml_compute, config=run_config,
pipeline_parameters=pipeline_parameters,
output_pipelinedata=pipeline_data)
train_step.append_step(steps)
register_step = RegisterStep(workspace=aml_workspace, env=e,
compute=aml_compute, config=run_config,
pipeline_parameters=pipeline_parameters,
input_pipelinedata=pipeline_data)
register_step.append_step(steps)
train_pipeline = Pipeline(workspace=aml_workspace, steps=steps)
train_pipeline._set_experiment_name
train_pipeline.validate()
published_pipeline = train_pipeline.publish(
name=e.pipeline_name,
description="Model training/retraining pipeline",
version=e.build_id,
)
This is an example of a class for a pipeline step. This class encapsulates details of a pipeline step. When multiple developers work on multiple pipeline steps, each developer can work on a specific class for a pipeline step they are working on independently, instead of working on a single script (which leads to merge conflicts).
from azureml.pipeline.steps import PythonScriptStep
class TrainStep:
def __init__(self, workspace, env, compute, config, pipeline_parameters,
output_pipelinedata):
self.workspace = workspace
self.env = env
self.compute = compute
self.config = config
self.pipeline_parameters = pipeline_parameters
self.output_pipelinedata = output_pipelinedata
def append_step(self, step_list):
train_step = PythonScriptStep(
name="Train Model",
compute_target=self.compute,
source_directory=self.env.sources_directory_train,
script_name=self.env.train_script_path,
outputs=[self.output_pipelinedata],
arguments=[
"--model_name",
self.pipeline_parameters["model_name"],
"--step_output",
self.output_pipelinedata,
"--dataset_version",
self.pipeline_parameters["dataset_version"],
"--data_file_path",
self.pipeline_parameters["data_file_path"],
"--caller_run_id",
self.pipeline_parameters["caller_run_id"],
"--dataset_name",
self.env.dataset_name,
],
runconfig=self.config,
allow_reuse=True,
)
if len(step_list) > 0:
previous_step = step_list[-1]
train_step.run_after(previous_step)
step_list.append(train_step)
This sample is based on diabetes_regression
sample in MLOpsPython
repo.
This sample contains minimum files required to run diabetes_regression_build_train_pipeline.py
locally to build an Azure ML pipeline.
diabetes_regression_build_train_pipeline.py
in this repo is a modified version of the original version in MLOpsPython
repo. It is adapted to support the "Appendable Template".
Three new classes for pipeline steps are added to ml_service/pipelines/steps
directory.
diabetes_regression_build_evaluate_steps.py
diabetes_regression_build_register_steps.py
diabetes_regression_build_train_steps.py
- Follow "Development environment setup" in
MLOpsPython
repo.
-
Create an Azure ML workspace in your Azure subscription.
-
Make a copy of .env.example, place it in the root of this sample, configure the variables, and rename the file to
.env
. You need to configureSUBSCRIPTION_ID
,RESOURCE_GROUP
andWORKSPACE_NAME
. -
Run
diabetes_regression_build_train_pipeline.py
fromsamples/appendable-template
directory.
export BUILD_BUILDID=$(uuidgen)
python ml_service/pipelines/diabetes_regression_build_train_pipeline.py
-
In Azure ML studio, click "Pipelines" menu, click "Pipeline endpoints" tab, click the pipeline endpoint you published, and click "Submit" button.
-
In "Set up pipeline run" dialog, select an existing experiment, or create a new experiment, and click "Submit" button.
-
In the pipeline endpoint page, click "Runs" tab, and click the pipeline run you submitted.
-
Clone
MLOpsPython
repo -
Overwrite the
ml_service/pipelines
directory ofMLOpsPython
repo withsamples/appendable-template/ml_service/pipelines
directory of this repo (Overwritediabetes_regression_build_train_pipeline.py
and copy threediabetes_regression_build_*_steps.py
files).