Automating platform provisioning with Infrastructure as Code (IaC) is a recommended pattern for enterprise applications because it yields consistent, repeatable deployments. The practice is especially valuable for organizations that run multiple environments such as Dev, Test, Performance Test, UAT, and Blue/Green production environments. IaC is also very effective for managing deployments when production environments are spread across multiple regions around the globe.
Tools such as Azure Resource Manager (ARM) templates, Terraform, and the Azure Command Line Interface (CLI) let you script cloud infrastructure declaratively and apply software engineering practices such as testing and versioning while implementing IaC.
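For instance, an ARM template can be validated before it is deployed, the IaC equivalent of testing code before release. Below is a minimal sketch; the resource group and file names are placeholders, not necessarily those used by this sample:

```bash
# Validate a versioned ARM template against a resource group before
# deploying it (placeholder resource group and file names).
az deployment group validate \
  --resource-group "my-resource-group" \
  --template-file azuredeploy.json \
  --parameters @azuredeploy.parameters.json
```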
This sample focuses on automating the provisioning of a basic Azure Databricks environment using the Infrastructure as Code pattern.
The following list captures the scope of this sample:
- Provision an Azure Databricks environment using ARM templates orchestrated by a shell script.
- The following services will be provisioned as part of the basic Azure Databricks environment setup:
  - Azure Databricks workspace
  - Azure Storage account with hierarchical namespace enabled to support the Azure Blob File System (ABFS) driver
  - Azure Key Vault to store secrets and access tokens
Details about how to use this sample can be found in the later sections of this document.
The following diagram illustrates the deployment process flow used in this sample:
This sample uses the following cloud design patterns:
- External Configuration Store pattern: Configuration for the deployment is persisted externally in a parameter file, separate from the deployment script.
- Federated Identity pattern: Azure Active Directory is used as the federated identity store to enable seamless integration with enterprise identity providers.
- Valet Key pattern: Azure Key Vault is used to manage the secrets and access tokens used by the services.
- Compensating Transaction pattern: The script rolls back partially configured resources if the deployment is incomplete (see the sketch after this list).
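As a hypothetical illustration of the Compensating Transaction pattern (the sample's actual rollback logic lives in its deployment script and may differ), a bash `trap` can undo partial work when a step fails:

```bash
#!/bin/bash
set -euo pipefail

RESOURCE_GROUP="${AZURE_RESOURCE_GROUP_NAME:?must be set}"

# Compensating transaction: if any step below fails, delete the
# partially configured resource group so no orphaned resources remain.
rollback() {
    echo "Deployment failed; rolling back partially created resources..."
    az group delete --name "$RESOURCE_GROUP" --yes --no-wait || true
}
trap rollback ERR

az group create --name "$RESOURCE_GROUP" --location "eastus"
az deployment group create \
    --resource-group "$RESOURCE_GROUP" \
    --template-file azuredeploy.json \
    --parameters @azuredeploy.parameters.json
```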
The following technologies are used to build this sample:

- Azure Databricks
- Azure Data Lake Storage Gen2 (Azure Storage with hierarchical namespace)
- Azure Key Vault
- ARM templates
- Azure CLI and bash
This section describes how to use this sample.

The following are the prerequisites for deploying this sample:
- GitHub account
- Azure account
  - Permissions needed: the ability to create and deploy to an Azure resource group, create a service principal, and grant the Contributor role to the service principal over the resource group.
  - Active subscription with the following resource providers enabled (see the sketch after this list for how to check and register them):
    - Microsoft.Databricks
    - Microsoft.DataLakeStore
    - Microsoft.Storage
    - Microsoft.KeyVault
- Azure CLI installed on the local machine
  - Installation instructions can be found here
  - For Windows users:
    - Option 1: Windows Subsystem for Linux
    - Option 2: Use the devcontainer published here as a host for the bash shell. For more information about devcontainers, see here.
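The required resource providers can be checked and registered from the Azure CLI, and a service principal with the Contributor role can be created the same way. This is a minimal sketch; the service principal name is a placeholder, and both operations require sufficient permissions on the subscription:

```bash
# Check and, if needed, register the resource providers this sample uses.
for provider in Microsoft.Databricks Microsoft.DataLakeStore \
                Microsoft.Storage Microsoft.KeyVault; do
  state=$(az provider show --namespace "$provider" \
            --query registrationState -o tsv)
  if [ "$state" != "Registered" ]; then
    echo "Registering $provider..."
    az provider register --namespace "$provider"
  fi
done

# Create a service principal scoped to the resource group with the
# Contributor role (placeholder name; requires directory permissions).
az ad sp create-for-rbac \
  --name "iac-databricks-sample-sp" \
  --role "Contributor" \
  --scopes "/subscriptions/$AZURE_SUBSCRIPTION_ID/resourceGroups/$AZURE_RESOURCE_GROUP_NAME"
```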
IMPORTANT NOTE: As with all Azure deployments, this sample will incur associated costs. Remember to tear down all related resources after use to avoid unnecessary costs. See here for a list of deployed resources.
The following are the steps to deploy this sample:
1. Fork and clone this repository. Navigate to (`cd`) `single_tech_samples/databricks/sample1_basic_azure_databricks_environment/`.
2. Set the following environment variables before running the deployment script (see the example after these steps):

   - `DEPLOYMENT_PREFIX` - Prefix for the names of the resources created as part of this deployment.
   - `AZURE_SUBSCRIPTION_ID` - ID of the Azure subscription where the resources should be deployed.
   - `AZURE_RESOURCE_GROUP_NAME` - Name of the containing resource group.
   - `AZURE_RESOURCE_GROUP_LOCATION` - Azure region where the resources will be deployed (e.g. australiaeast, eastus, etc.).
   - `DELETE_RESOURCE_GROUP` - Flag that tells the clean-up script whether to delete the resource group.
3. Run `./deploy.sh`.

   Note: The script will prompt you to log in to your Azure account to authorize the resource deployment.
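For example (illustrative values only; `lumustest` matches the prefix used in the screenshots below, and the subscription ID is a placeholder):

```bash
# Illustrative values only -- substitute your own.
export DEPLOYMENT_PREFIX="lumustest"
export AZURE_SUBSCRIPTION_ID="00000000-0000-0000-0000-000000000000"
export AZURE_RESOURCE_GROUP_NAME="lumustest-rg"
export AZURE_RESOURCE_GROUP_LOCATION="eastus"
export DELETE_RESOURCE_GROUP="false"   # assumed boolean flag read by the clean-up script

./deploy.sh
```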
The script will validate the ARM templates and the environment variables before deploying the resources. It will also display the status of each stage of the deployment while it executes. The following screenshot displays the log for a successful run:
Note: `DEPLOYMENT_PREFIX` for this deployment was set to `lumustest`.
The following resources will be deployed as a part of this sample once the script is executed:

1. Azure Databricks workspace.
2. Azure Storage account with hierarchical namespace enabled.
3. Azure Key Vault with all the secrets configured.
The following steps can be performed to validate the correct deployment of this sample:
- Users with appropriate access rights should be able to:
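In addition to checking from the Azure portal, the deployment can be verified from the Azure CLI. This is a minimal sketch; `KEY_VAULT_NAME` is a placeholder for the vault created by the deployment:

```bash
# List everything deployed into the resource group.
az resource list --resource-group "$AZURE_RESOURCE_GROUP_NAME" --output table

# Confirm the secrets exist in the deployed Key Vault
# (KEY_VAULT_NAME is a placeholder for the vault created by the deployment).
az keyvault secret list --vault-name "$KEY_VAULT_NAME" --query "[].name" -o tsv
```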
Follow the steps below to clean up your environment. The clean-up script removes the resources provisioned by this sample:
1. Navigate to (`cd`) `single_tech_samples/databricks/sample1_basic_azure_databricks_environment/`.
2. Run `./destroy.sh`.
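For example, to remove everything including the resource group itself (assuming the script interprets `DELETE_RESOURCE_GROUP` as a boolean flag):

```bash
# Assumed flag; the clean-up script reads this variable to decide
# whether to delete the whole resource group.
export DELETE_RESOURCE_GROUP="true"

./destroy.sh
```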
The following screenshot displays the log for a successful clean-up run: