Today, we'll be discussing and developing an Azure DevOps pipeline wrapped around an Azure MLOps pipeline to proactively scale a Kubernetes cluster based on Machine Learning predictions.
At the end of the workshop, your Azure account will contain a working instance of this configuration for reference and experimentation.
We have included three Azure DevOps "challenges" over the course of the workshop. We will update this documentation with solutions to these challenges as we go.
NOTE: This workshop is going to involve provisioning and configuring Azure resources such as ML Pipelines, Kubernetes Clusters, Azure Active Directory App Registrations, and Azure DevOps projects. If you already have a corporate Azure account, there's a good chance that you do not have permission to take these actions. If that's the case, we recommend that you sign up for a fresh Azure Free Account.
- Create a new Azure Free Account (if necessary)
- Navigate to https://azure.microsoft.com/en-us/free/
- Select the "Start free" button
We are going to need to create Azure resources that need globally unique identifiers. To do this, we're going to use a "base name prefix" throughout the workshop. This prefix should be 7-8 characters and only contain numbers and lowercase letters. Pick an identifier of the following format and write it down:
<your-initials><4 random digits>
For instance, I might pick mms4721. You will use this wherever you see the <baseName> token in scripts and variables.
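To catch a mistyped prefix before it ends up in resource names, the rule above can be checked in Cloud Shell. This is a minimal sketch and not part of the workshop scripts: `is_valid_base_name` and `make_base_name` are hypothetical helper names, and the check only encodes the rule stated above (7-8 characters, lowercase letters and digits).

```shell
# Hypothetical helpers (not part of the workshop repo).

# Succeeds only if $1 is 7-8 characters of lowercase letters and digits.
is_valid_base_name() {
    printf '%s\n' "$1" | LC_ALL=C grep -Eq '^[a-z0-9]{7,8}$'
}

# Builds "<initials><4 random digits>" from the initials passed as $1.
make_base_name() {
    initials=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    digits=$(awk 'BEGIN { srand(); printf "%04d", int(rand() * 10000) }')
    printf '%s%s\n' "$initials" "$digits"
}
```

For example, `make_base_name mms` prints a candidate such as mms4721, and `is_valid_base_name mms4721 && echo ok` prints `ok`.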
Provisioning the Kubernetes cluster can take a few minutes. Let's crack open Azure Cloud Shell and get that started in the background.
- Navigate to the Azure Cloud Shell at https://shell.azure.com
- Use the following script to create an Azure Resource Group, Service Principal, and AKS Cluster (docs)
    # az login # Not required in Azure Cloud Shell

    # If you've already run this script, you'll need to remove cached service principal info in Azure
    # rm .azure/aksServicePrincipal.json

    az group create --name <baseName>-AML-RG --location eastus

    az provider register --namespace Microsoft.Network
    az provider register --namespace Microsoft.Compute
    az provider register --namespace Microsoft.Storage

    az ad sp create-for-rbac --name atmsdevdayapp

    # `aks create` will take a while!
    # substitute values from `create-for-rbac` above!
    az aks create --resource-group <baseName>-AML-RG \
      --name atDevDayCluster \
      --service-principal <appId from create-for-rbac> \
      --client-secret <password from create-for-rbac> \
      --node-count 1 \
      --vm-set-type VirtualMachineScaleSets \
      --enable-cluster-autoscaler \
      --generate-ssh-keys \
      --node-vm-size Standard_D2_v3 \
      --min-count 1 \
      --max-count 2

    # disable auto-scaling so we can proactively scale!
    az aks update --resource-group <baseName>-AML-RG --name atDevDayCluster --disable-cluster-autoscaler
- If you get Service Principal errors, review the following article: Service principals with Azure Kubernetes Service (AKS)
- This script will take a few minutes to provision the cluster
- Verify that the AKS cluster has been created correctly
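The verification step above can also be done from Cloud Shell rather than the Portal. The following is a minimal sketch: the function names are hypothetical, and it assumes the resource group and cluster names used in the script above. It reads the cluster's `provisioningState` with `az aks show`.

```shell
# Hypothetical helpers (not part of the workshop repo).

# Prints the provisioning state of an AKS cluster.
# $1 = resource group, $2 = cluster name
aks_provisioning_state() {
    az aks show --resource-group "$1" --name "$2" \
        --query provisioningState --output tsv
}

# Succeeds only when the cluster reports "Succeeded".
aks_is_ready() {
    [ "$(aks_provisioning_state "$1" "$2")" = "Succeeded" ]
}
```

A possible invocation is `aks_is_ready "<baseName>-AML-RG" atDevDayCluster && echo "cluster is ready"`, with <baseName> replaced by your prefix.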
Congratulations! You have created an AKS Cluster. We will use the ML models to proactively scale the cluster later in the workshop.
- Record Azure Service Principal Information and Create a Client Secret
Portal -> Azure Active Directory -> App registrations -> atmsdevdayapp
- Record the following:
- Application (client) ID
- Directory (tenant) ID
- Create a Client Secret
- Create a new Azure DevOps Organization (if necessary) (docs)
- Navigate to https://azure.microsoft.com/en-us/pricing/details/devops/azure-devops-services/
- Select the Basic Plan column's "Start free >" button
- Create a new Azure DevOps Project (docs)
Azure DevOps Organization Screen -> + New project (top right)
- Clone the source repo into your project
- Capture Configuration Data in an Azure DevOps Variable Group
Azure DevOps Project -> Sidebar -> Pipelines -> Library -> Variable Groups -> + Variable group
- Variable group name: devopsforai-aml-vg
- Add the following variables:

| Variable Name | Suggested Value |
| --- | --- |
| AML_COMPUTE_CLUSTER_NAME | train-cluster |
| BASE_NAME | <baseName> |
| EXPERIMENT_NAME | mlopspython |
| LOCATION | eastus |
| MODEL_NAME | sklearn_regression_model.pkl |
| SOURCES_DIR_TRAIN | python_scripts |
| SP_APP_ID | <Application (client) ID from above> |
| SP_APP_SECRET | <Client Secret Value from above> |
| SUBSCRIPTION_ID | <Azure Subscription ID> |
| TENANT_ID | <Directory (tenant) ID from above> |
| TRAIN_SCRIPT_PATH | train.py |
| TRAINING_PIPELINE_NAME | training-pipeline |

- Create an Azure Resource Manager Service Connection (docs)
Azure DevOps Project Sidebar -> Project settings -> Pipelines -> Service connections -> + New service connection -> Azure Resource Manager -> Service principal (automatic)
- Create a Build to Provision Azure Resources
Azure DevOps Project Sidebar -> Pipelines -> Create Pipeline
- Where is your code?: Azure Repos Git
- Select your imported repo
- Configure your pipeline: Existing Azure Pipelines YAML file
  - Branch: master
  - Path: /build_pipeline_scripts/iac-create-environment.yml
- Suggested Name: Provision Azure Environment
- Run the Build
Newly Created Build Pipeline -> Run
(This will take a few minutes!)
- Verify Azure Resource Creation
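One way to verify resource creation without clicking through the Portal is to list what the resource group now contains. This is a sketch under assumptions: it assumes the build provisions into the <baseName>-AML-RG group created at the start of the workshop, and the function names are hypothetical. The name patterns (`-AML-RG`, `amlsa`) are the ones used elsewhere in this workshop.

```shell
# Hypothetical helpers (not part of the workshop repo).

# Names derived from the base name prefix ($1), following the workshop's patterns.
resource_group_name() { printf '%s-AML-RG\n' "$1"; }
storage_account_name() { printf '%samlsa\n' "$1"; }

# Lists the name and type of every resource in the workshop resource group.
# Assumption: the provisioning build targets this same resource group.
list_workshop_resources() {
    az resource list --resource-group "$(resource_group_name "$1")" \
        --query "[].{name:name, type:type}" --output table
}
```

Running `list_workshop_resources <baseName>` should then show, among others, a storage account whose name matches `storage_account_name <baseName>`.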
Congratulations! You have configured the large majority of the Azure resources necessary for this workshop.
- Create a Blob Container and Load Log Data
Azure Portal -> Storage Accounts -> <baseName>amlsa -> Blob Service -> Containers -> + Container
- Name: modeldata
- Public access level: Private (no anonymous access)
- Select the newly created modeldata container
  - Download log_data.pkl
  - Upload (top left)
    - File: the log_data.pkl that you just downloaded
- Capture Blob Storage Variable Group Entries
Azure DevOps Project -> Sidebar -> Pipelines -> Library -> Variable Groups -> devopsforai-aml-vg
- Add the following variables:

| Variable Name | Suggested Value |
| --- | --- |
| STORAGE_ACCT_NAME | <baseName>amlsa |
| STORAGE_ACCT_KEY | <Azure Portal -> Storage Accounts -> <Your Storage Account> -> Settings -> Access Keys> |
| STORAGE_BLOB_NAME | modeldata |

- Create an Azure DevOps Pipeline to Create the ML Pipeline
Azure DevOps Project Sidebar -> Pipelines -> Pipelines -> New pipeline
- Where is your code?: Azure Repos Git
- Select your imported repo
- Configure your pipeline: Existing Azure Pipelines YAML file
  - Branch: master
  - Path: /build_pipeline_scripts/model-build.yml
- Suggested Name: Build ML Pipeline
- Run the Pipeline
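Before moving on to training, it can be worth confirming that the log_data.pkl upload from earlier in this section actually reached the modeldata container. A minimal sketch, with hypothetical function names, that lists the container's blob names via `az storage blob list` and looks for an exact match:

```shell
# Hypothetical helpers (not part of the workshop repo).

# Prints one blob name per line.
# $1 = storage account, $2 = account key, $3 = container
list_blobs() {
    az storage blob list --account-name "$1" --account-key "$2" \
        --container-name "$3" --query "[].name" --output tsv
}

# Succeeds only if a blob named exactly $4 exists in the container.
container_has_blob() {
    list_blobs "$1" "$2" "$3" | grep -Fxq -e "$4"
}
```

For instance, `container_has_blob <baseName>amlsa "<storage key>" modeldata log_data.pkl && echo found` prints `found` when the upload succeeded.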
Congratulations! You have created an Azure ML Pipeline. We will run this pipeline to train the models in the next section.
- Create a New Azure DevOps Release
Azure DevOps Project Sidebar -> Pipelines -> Releases -> New pipeline
- Start With: Empty Job
  - Stage Name: Run Train Scripts
- Update name to Train ML Pipeline
- Add an Artifact
- Link the Variable Group
Train ML Pipeline Release -> Variables Tab -> Variable groups
- Link Variable Group: devopsforai-aml-vg
- Update to Ubuntu Agent
- Navigate into Tasks for the Release
- Select the "Agent Job"
- Update Agent Specification: ubuntu-16.04
- Add Command Line Task
  - Click the "+" (Add task to an Agent Job) button in the Agent Job item
  - Add a Command line task
  - Select the new Command Line Script task and set the following
    - Display Name: Run Train Models Script
    - Script:

          docker run -v $(System.DefaultWorkingDirectory)/_model-build/mlops-pipelines/python_scripts/:/script \
            -w=/script -e MODEL_NAME=$MODEL_NAME -e EXPERIMENT_NAME=$EXPERIMENT_NAME \
            -e TENANT_ID=$TENANT_ID -e SP_APP_ID=$SP_APP_ID -e SP_APP_SECRET=$SP_APP_SECRET \
            -e SUBSCRIPTION_ID=$SUBSCRIPTION_ID -e RELEASE_RELEASEID=$RELEASE_RELEASEID \
            -e BUILD_BUILDID=$BUILD_BUILDID -e BASE_NAME=$BASE_NAME \
            -e STORAGE_ACCT_NAME=$STORAGE_ACCT_NAME -e STORAGE_ACCT_KEY=$STORAGE_ACCT_KEY -e STORAGE_BLOB_NAME=$STORAGE_BLOB_NAME \
            mcr.microsoft.com/mlops/python:latest python run_train_pipeline.py

- Run the Release
- Verify that models are created (this may take a while!)
- Verify that models exist in blob storage
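Because training may take a while, the blob storage check can be polled instead of refreshed by hand. The sketch below is illustrative only: `retry_until`, `blob_count`, and `models_present` are hypothetical names, and `models_present` assumes that trained models are written to the same modeldata container as log_data.pkl, so that more than one blob means models have arrived.

```shell
# Hypothetical helpers (not part of the workshop repo).

# Re-runs a command until it succeeds or the attempts run out.
# $1 = max attempts, $2 = seconds between attempts, rest = command to run
retry_until() {
    _ru_attempts=$1
    _ru_delay=$2
    shift 2
    _ru_i=1
    while [ "$_ru_i" -le "$_ru_attempts" ]; do
        if "$@"; then
            return 0
        fi
        _ru_i=$((_ru_i + 1))
        # Only wait if another attempt is coming.
        if [ "$_ru_i" -le "$_ru_attempts" ]; then
            sleep "$_ru_delay"
        fi
    done
    return 1
}

# Prints the number of blobs in a container.
# $1 = storage account, $2 = account key, $3 = container
blob_count() {
    az storage blob list --account-name "$1" --account-key "$2" \
        --container-name "$3" --query "length(@)" --output tsv
}

# Assumption: models land next to log_data.pkl, so a count above 1 means success.
models_present() {
    [ "$(blob_count "$STORAGE_ACCT_NAME" "$STORAGE_ACCT_KEY" modeldata)" -gt 1 ]
}
```

With `STORAGE_ACCT_NAME` and `STORAGE_ACCT_KEY` set in the shell, `retry_until 30 20 models_present` checks every 20 seconds for up to 30 attempts.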
Congratulations! You've trained your models. In the next section, we will scale the AKS cluster that we created at the start of the workshop.
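The `docker run` script in the Run Train Models Script task reads its configuration from environment variables such as `$BASE_NAME` and `$TENANT_ID`, so a missing variable group entry only surfaces as a failure inside the container. A small pre-flight check placed at the top of the task script could make that fail fast instead. This is a sketch with a hypothetical helper name; it uses `eval` for the indirect lookup, so it should only be given literal, trusted variable names.

```shell
# Hypothetical helper (not part of the workshop repo).

# Fails, naming the culprits on stderr, if any listed variable is unset or empty.
missing_vars() {
    _mv_missing=""
    for _mv_name in "$@"; do
        # Indirect lookup of the variable named by $_mv_name (trusted names only).
        eval "_mv_value=\${$_mv_name:-}"
        if [ -z "$_mv_value" ]; then
            _mv_missing="$_mv_missing $_mv_name"
        fi
    done
    if [ -n "$_mv_missing" ]; then
        echo "Missing variables:$_mv_missing" >&2
        return 1
    fi
    return 0
}
```

A possible first line for the task script is `missing_vars BASE_NAME TENANT_ID SP_APP_ID SP_APP_SECRET SUBSCRIPTION_ID STORAGE_ACCT_NAME STORAGE_ACCT_KEY STORAGE_BLOB_NAME || exit 1`.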
- Capture AKS Variable Group Entries
Azure DevOps Project -> Sidebar -> Pipelines -> Library -> Variable Groups -> devopsforai-aml-vg
- Add the following variables:

| Variable Name | Suggested Value |
| --- | --- |
| AKS_NAME | atDevDayCluster |
| AKS_RG | <baseName>-AML-RG |

- Create a new Azure DevOps Release to run the ML Scaler
Azure DevOps Project Sidebar -> Pipelines -> Releases -> Train ML Pipeline -> ... in top right -> Clone
- Name: Run ML Scaler
- Update the Command Line Script task to look like the following:
  - Display Name: Run ML Scaler Script
  - Script:

        docker run -v $(System.DefaultWorkingDirectory)/_model-build/mlops-pipelines/python_scripts/:/script \
          -w=/script -e SP_APP_ID=$SP_APP_ID -e SP_APP_SECRET=$SP_APP_SECRET -e TENANT_ID=$TENANT_ID \
          -e AKS_RG=$AKS_RG -e STORAGE_ACCT_NAME=$STORAGE_ACCT_NAME -e STORAGE_ACCT_KEY=$STORAGE_ACCT_KEY \
          -e STORAGE_BLOB_NAME=$STORAGE_BLOB_NAME -e CONTANER_NAME=$CONTANER_NAME -e AKS_NAME=$AKS_NAME \
          markschabacker/at_ml_dev_day:latest python AksResourceController.py

  - The docker image definition used in the script is available in the docker/container folder in this repo.
- Save and run the Run ML Scaler release
  - This should take a while (5+ minutes)!
- Verify Scaling
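To verify scaling from Cloud Shell, the cluster's node count can be read before and after the release runs. The sketch below uses hypothetical function names and assumes the single node pool created by the `az aks create` call earlier in the workshop, which is why it reads `agentPoolProfiles[0]`.

```shell
# Hypothetical helpers (not part of the workshop repo).

# Prints the node count of the cluster's first (and here, only) node pool.
# $1 = resource group, $2 = cluster name
aks_node_count() {
    az aks show --resource-group "$1" --name "$2" \
        --query "agentPoolProfiles[0].count" --output tsv
}

# Describes the change between two node counts.
# $1 = count before the release, $2 = count after
describe_scaling() {
    if [ "$2" -gt "$1" ]; then
        echo "scaled up from $1 to $2 nodes"
    elif [ "$2" -lt "$1" ]; then
        echo "scaled down from $1 to $2 nodes"
    else
        echo "unchanged at $1 nodes"
    fi
}
```

One way to use it: capture `before=$(aks_node_count "<baseName>-AML-RG" atDevDayCluster)` ahead of the release, capture `after` the same way once it finishes, then run `describe_scaling "$before" "$after"`.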
Congratulations! You've used Machine Learning to proactively scale an Azure Kubernetes Service cluster!