Overview

This sample demonstrates how to use Azure Machine Learning for workloads that require additional software and dependencies, which may include non-Python code such as C++ source code, bash files, and binaries.

What does this sample demonstrate:

Run the Kaldi ASR Toolkit yesno sample in an Azure Machine Learning pipeline.
Create tests by mocking Azure ML SDK.

What doesn't this sample demonstrate:

ML model deployment.

Run kaldi sample in Azure ML pipeline

Additional software for Azure ML pipeline

To run the Kaldi sample, the compute cluster needs the Kaldi ASR Toolkit and its dependencies. Azure ML provides out-of-the-box environments as base images for pipeline runs, but none of them includes the Kaldi ASR Toolkit.

Whenever you need additional software, you can build a custom Docker image and use it as a custom base image. This sample demonstrates how to provision and use a custom image in an Azure ML pipeline.

Python wrapper for Azure ML pipeline

The yesno sample uses a bash script to train the model, but Azure ML pipeline steps require Python code. This sample demonstrates how to write a Python wrapper that runs the underlying shell script as part of an Azure ML pipeline step.
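
The wrapper idea can be sketched in a few lines. This is a minimal illustration, not the wrapper shipped with the sample: the function name `run_shell_script` is hypothetical, and it assumes `bash` is available on the image. It runs the script from its own directory and returns the exit code so the caller can fail the pipeline step when the script fails.

```python
import subprocess
from pathlib import Path


def run_shell_script(script_path, args=()):
    """Run a shell script with bash and return its exit code.

    Hypothetical helper for illustration. Output is not captured, so it
    streams to this process's stdout/stderr and appears in the step logs.
    """
    script = Path(script_path).resolve()
    completed = subprocess.run(
        ["bash", str(script), *args],
        cwd=script.parent,  # run from the script's own directory
        check=False,        # report the exit code instead of raising
    )
    return completed.returncode
```

A step script could call `sys.exit(run_shell_script("run.sh"))` so that a non-zero exit code from the shell script also marks the step as failed.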

Getting Started

Prerequisites
Running locally

Prerequisites

Create Azure Resources
Build and push custom base image
Prepare input data
Add Azure ML compute, datastore and datasets

Create Azure Resources

Whether you run this project locally or in Azure DevOps CI/CD pipelines, the code needs to get Azure ML context for remote or offline runs. Create Azure resources as documented here.
Review the folder structure explained here.

Build and push custom base image

This sample requires the Kaldi ASR Toolkit, which is not available in the out-of-the-box Azure Machine Learning Environments. You can build a custom Docker image that contains all required dependencies and use it as an Azure Machine Learning Environment. See Create & use software environments in Azure Machine Learning for more detail.

This sample contains a Dockerfile that builds a custom base image with the Kaldi ASR Toolkit and its dependencies. Follow the steps below to build the image and push it to a container registry. We recommend building the image on a separate VM that has Docker Engine installed, as the build may take 30-60 minutes depending on specs.

Download SRILM from http://www.speech.sri.com/projects/srilm/download.html into the environment_setup/azureml_environment folder and rename it to srilm.tar.gz.
Move to the directory that contains the Dockerfile and build the Docker image. This may take hours depending on the computer's specs. The Dockerfile expects srilm.tar.gz to exist in that directory.
```
docker build -t <your_acr_name>/<base_image_name>:<tag_name> .
```
Once the build has completed, log in to the Azure Container Registry that you provisioned earlier.
```
docker login -u <your_acr_username> -p <your_acr_password> <your_acr_login_server>
```

Then push the image to the registry.

```
docker push <your_acr_name>/<base_image_name>:<tag_name>
```

Once the image has been pushed, update the "image" section of the devcontainer.json file.

Prepare input data

This sample takes input and wave files from an Azure Machine Learning Dataset that is mapped to Azure Blob Storage.

Create blob container and folders

Go to the Azure Storage Account and create a new container named 'azureml'.

Upload input data

Download the files from https://github.com/kaldi-asr/kaldi/tree/master/egs/yesno/s5/input
Upload the downloaded files to an 'input' directory in the 'azureml' container.

Upload wave data

Download the file from http://www.openslr.org/resources/1/waves_yesno.tar.gz and extract it.
Upload all audio files to a 'waves' directory in the 'azureml' container.
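
The extract step can be scripted before uploading. The sketch below is illustrative only: the helper names `extract_wave_files` and `blob_names` are hypothetical, and the flat `waves/<file>` layout is an assumption about how the dataset expects the files to be arranged. It does not perform the upload itself; use the Azure portal, Storage Explorer, or a tool of your choice for that.

```python
import tarfile
from pathlib import Path


def extract_wave_files(archive_path, target_dir):
    """Extract a .tar.gz archive and return the extracted .wav paths."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as archive:
        archive.extractall(target)
    # Collect audio files wherever the archive placed them.
    return sorted(target.rglob("*.wav"))


def blob_names(wave_paths, prefix="waves"):
    """Map local files to flat blob names under the given prefix.

    Assumption: audio files sit directly under 'waves/' in the container.
    """
    return [f"{prefix}/{path.name}" for path in wave_paths]
```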

Because the sample obtains wave files from the Azure Machine Learning Dataset, we commented out the part of run.sh that downloads them.

Add Azure ML compute, datastore and datasets

After you have all Azure resources and the input data in Azure Storage, you need to create the following Azure Machine Learning components.

Azure Machine Learning compute
Azure Machine Learning datastore and datasets

Change directory to samples/kaldi-asr-yesno.
```
cd samples/kaldi-asr-yesno
```

Run the following command to create the compute.

```
python -m environment_setup.provisioning.create-compute
```

Run the following command to create the datastore and datasets.

```
python -m environment_setup.provisioning.create-datastore
```

Running locally

Make a copy of .env.example, place it in the root of this sample, configure the variables, and rename the file to .env.

Update variable values.

name	description
SUBSCRIPTION_ID	Azure Subscription ID
RESOURCE_GROUP	Azure Resource group name
WORKSPACE_NAME	Azure Machine Learning workspace name
AML_ENV_NAME	Azure Machine Learning Environment name
AML_COMPUTE_CLUSTER_NAME	Azure Machine Learning compute cluster name
AML_BLOB_DATASTORE_NAME	Azure Machine Learning blob datastore name
AML_STORAGE_ACCOUNT_NAME	Azure Storage Account name for Azure Machine Learning blob datastore
AML_BLOB_CONTAINER_NAME	Blob container name which contains input data
AML_STORAGE_ACCOUNT_KEY	Azure Storage Account Key
PIPELINE_ENDPOINT_NAME	Azure Machine Learning pipeline endpoint name
PIPELINE_NAME	Azure Machine Learning pipeline name
AML_INPUT_DATASET_NAME	input dataset name which is used by yesno sample. Don't change this value.
AML_WAVES_DATASET_NAME	waves dataset name which is used by yesno sample. Don't change this value.
SOURCES_DIR_TRAIN	source code directory for Azure Machine Learning pipeline
FIRST_STEP_SCRIPT_PATH	python script path for the first step
ACR_IMAGE	Custom base image name in Azure Container Registry
ACR_ADDRESS	Azure Container Registry address
ACR_USERNAME	Azure Container Registry user name
ACR_PASSWORD	Azure Container Registry user password
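
A quick check that the .env file is complete can save a failed run. The following is a minimal sketch using only the standard library; how the sample itself loads .env is not shown in this README, the helper names are hypothetical, and treating the listed variables as required is an assumption (the list is abbreviated here and can be extended with the remaining names from the table).

```python
# Subset of the names from the table above; treating them as required
# is an assumption of this sketch.
REQUIRED_VARIABLES = (
    "SUBSCRIPTION_ID",
    "RESOURCE_GROUP",
    "WORKSPACE_NAME",
    "AML_COMPUTE_CLUSTER_NAME",
    "AML_BLOB_DATASTORE_NAME",
    "ACR_IMAGE",
)


def parse_env_file(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_variables(values, required=REQUIRED_VARIABLES):
    """Return the required names that are absent or empty."""
    return [name for name in required if not values.get(name)]
```

For example, `missing_variables(parse_env_file(open(".env").read()))` returns the names that still need a value.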

Use the VSCode dev container, or install Anaconda or Miniconda and create a Conda environment by running local_install_requirements.sh.
In VSCode, open the root folder of this sample and select the Conda environment created above as the Python interpreter.
Publish and run Azure ML pipelines.
- To run the unit tests, open a terminal, activate the Conda environment for this sample, navigate to the root folder of this project, and run:
```
python -m pytest
```
- To publish and run Azure ML pipelines, run:
```
# publish the Azure ML pipeline
python -m ml_service.pipelines.build_pipeline
```

CI/CD in Azure DevOps

This sample contains Azure DevOps pipeline YAML files in the devops_pipelines folder.

To use the Azure DevOps pipeline, follow the steps below.

Create a Service Connection for the Azure Resource Group.
Update the values in yesno-variables-template.yml. Some variables are missing compared to the .env file, as those values come from the Azure DevOps variable group and KeyVault.
Create aml-storage-account-key and acr-password as KeyVault secrets and save the corresponding values.
Create an Azure pipeline by specifying yesno-ci.yml.

Linting and Testing

Flake8

This sample uses Flake8 as its linting tool. Ideally all Python code would be linted, but we exclude the Kaldi sample source code because it comes from another repo. This is common in real projects: some code comes from outside the project and you don't want to modify it.

See .flake8 for rule settings.

Pytest

This sample uses pytest for unit testing Python code. We test only our own code and exclude the Kaldi sample source code.

test_pipeline_utils.py demonstrates how to mock the Azure Machine Learning SDK and write unit tests.
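
The mocking approach can be illustrated with a small, self-contained sketch. It is not taken from test_pipeline_utils.py: the helper `get_compute_target` is hypothetical, and it assumes the workspace object exposes a `compute_targets` mapping of name to compute object, as the Azure ML SDK v1 `Workspace` class does. A `MagicMock` stands in for the workspace, so the test needs neither the SDK nor an Azure subscription.

```python
from unittest.mock import MagicMock


def get_compute_target(workspace, compute_name):
    """Hypothetical helper: look up an existing compute target by name.

    Assumes `workspace.compute_targets` is a mapping of name to compute
    object (an assumption about the SDK object passed in).
    """
    targets = workspace.compute_targets
    if compute_name not in targets:
        raise KeyError(f"Compute target '{compute_name}' not found")
    return targets[compute_name]


def test_get_compute_target_returns_existing_cluster():
    # The mock replaces the real workspace; no network access happens.
    fake_cluster = MagicMock(name="cluster")
    workspace = MagicMock()
    workspace.compute_targets = {"cpu-cluster": fake_cluster}

    assert get_compute_target(workspace, "cpu-cluster") is fake_cluster
```

Because the helper receives the workspace as a parameter, the test can pass in a mock directly instead of patching module-level imports.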
