Deploying a secure Azure Databricks environment using Infrastructure as Code

1. Solution Overview

Automating platform provisioning with Infrastructure as Code (IaC) is a recommended pattern for enterprise applications because it produces consistent, repeatable deployments. The practice is especially valuable for organizations that run multiple environments, such as Dev, Test, Performance Test, UAT, and Blue and Green production environments. IaC is also effective for managing deployments when production environments are spread across multiple regions.

Tools like Azure Resource Manager (ARM), Terraform, and the Azure Command Line Interface (CLI) enable you to script the cloud infrastructure, declaratively in the case of ARM templates and Terraform, and to apply software engineering practices such as testing and versioning while implementing IaC.

This sample focuses on automating the provisioning of a basic Azure Databricks environment using the Infrastructure as Code pattern.

1.1. Scope

The following list captures the scope of this sample:

Provision an Azure Databricks environment using ARM templates orchestrated by a shell script.
The following services will be provisioned as a part of the basic Azure Databricks environment setup:
1. Azure Databricks Workspace
2. Azure Storage account with hierarchical namespace enabled to support ABFS
3. Azure Key Vault to store secrets and access tokens

Details about how to use this sample can be found in the later sections of this document.
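As a rough illustration of how a shell script can orchestrate ARM template deployments, the sketch below wraps `az deployment group create` in a small function and calls it once per service. The template and parameter file names (`databricks.template.json` and so on) are hypothetical placeholders and may not match the files in this repository.

```shell
#!/usr/bin/env bash
# Sketch only: the file names used here are hypothetical placeholders.

# Deploy one ARM template into a resource group.
# $1 = resource group, $2 = template file, $3 = parameter file
deploy_template() {
    local resource_group="$1" template="$2" parameters="$3"
    az deployment group create \
        --resource-group "$resource_group" \
        --template-file "$template" \
        --parameters "@${parameters}"
}

# Deploy each service in turn, stopping at the first failure.
# $1 = resource group
deploy_all() {
    local resource_group="$1" name
    for name in databricks storage keyvault; do
        deploy_template "$resource_group" \
            "${name}.template.json" "${name}.parameters.json" || return 1
    done
}
```

Calling `deploy_all "$AZURE_RESOURCE_GROUP_NAME"` would run the three deployments in order. Keeping one template per service is a design choice of this sketch, not a statement about how the sample is organized.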

1.2. Architecture

The diagram below illustrates the deployment process flow followed in this sample:

1.2.1. Patterns

This sample uses the following cloud design patterns:

External Configuration Store pattern: Configuration for the deployment is persisted externally as a parameter file separate from the deployment script.
Federated Identity pattern: Azure Active Directory is used as the federated identity store to enable seamless integration with enterprise identity providers.
Valet Key pattern: Azure Key Vault is used to manage the secrets and access tokens used by the services.
Compensating Transaction pattern: The script will roll back partially configured resources if the deployment is incomplete.
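To make the Compensating Transaction pattern more concrete, here is a minimal sketch of one way a shell script could implement it: every step that succeeds registers an undo command, and a failure replays those commands in reverse order. The function names (`register_undo`, `rollback`, `run_step`) are hypothetical, and the sample's own script may handle rollback differently.

```shell
#!/usr/bin/env bash
# Sketch of a compensating transaction: each completed step registers an
# undo command, and a failure replays the undo commands in reverse order.

undo_stack=()

# Register a command that reverses the step that just succeeded.
register_undo() {
    undo_stack+=("$1")
}

# Run the registered undo commands, newest first, then clear the stack.
rollback() {
    local i
    for (( i=${#undo_stack[@]}-1; i>=0; i-- )); do
        eval "${undo_stack[i]}"
    done
    undo_stack=()
}

# Run a step; on success remember how to undo it, on failure roll back.
# $1 = command to run, $2 = command that undoes it
run_step() {
    if eval "$1"; then
        register_undo "$2"
    else
        rollback
        return 1
    fi
}
```

A step could then be written as `run_step "create_storage" "delete_storage"`, where both commands are placeholders for whatever creates and removes a resource.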

1.3. Technologies used

The following technologies are used to build this sample:

Azure Databricks
Azure Storage
Azure Key Vault
Azure CLI
Azure Resource Manager

2. How to use this sample

This section explains how to use this sample.

2.1. Prerequisites

The following are the prerequisites for deploying this sample:

GitHub account
Azure Account
- Permissions needed: The ability to create and deploy to an Azure resource group, create a service principal, and grant the Contributor role to the service principal over the resource group.
- Active subscription with the following resource providers enabled:
  - Microsoft.Databricks
  - Microsoft.DataLakeStore
  - Microsoft.Storage
  - Microsoft.KeyVault
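If it is unclear whether these resource providers are enabled on the subscription, a check along the following lines may help. It is a sketch that uses `az provider show` and `az provider register`; the function name `ensure_providers` is hypothetical, and registration can take a few minutes to complete after the command returns.

```shell
#!/usr/bin/env bash
# Sketch: register each required resource provider if it is not
# registered on the current subscription yet.
ensure_providers() {
    local namespace state
    for namespace in Microsoft.Databricks Microsoft.DataLakeStore \
                     Microsoft.Storage Microsoft.KeyVault; do
        # registrationState is "Registered" once the provider is enabled.
        state="$(az provider show --namespace "$namespace" \
                    --query registrationState --output tsv)"
        if [ "$state" != "Registered" ]; then
            az provider register --namespace "$namespace"
        fi
    done
}
```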

2.1.1. Software Prerequisites

Azure CLI installed on the local machine
- Installation instructions can be found here
For Windows users,
1. Option 1: Windows Subsystem for Linux
2. Option 2: Use the devcontainer published here as a host for the bash shell. For more information about Devcontainers, see here.
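A quick way to confirm that the required tools are available before running anything is sketched below. The helper names are hypothetical; the check only looks for commands on the PATH and does not verify versions.

```shell
#!/usr/bin/env bash
# Sketch: fail early if a required command-line tool is missing.

# $1 = command name to look up on the PATH
require_command() {
    if ! command -v "$1" >/dev/null 2>&1; then
        echo "Required command not found: $1" >&2
        return 1
    fi
}

# Check every tool passed as an argument and report all missing ones.
check_prerequisites() {
    local tool status=0
    for tool in "$@"; do
        require_command "$tool" || status=1
    done
    return "$status"
}
```

For this sample, `check_prerequisites az bash` would be a reasonable call.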

2.2. Setup and deployment

IMPORTANT NOTE: As with all Azure deployments, this will incur associated costs. Remember to tear down all related resources after use to avoid unnecessary costs. See here for a list of deployed resources.

Follow these steps to deploy this sample:

Fork and clone this repository. Navigate to (CD) single_tech_samples/databricks/sample1_basic_azure_databricks_environment/.
The sample requires the following environment variables to be set before the deployment script is run:
- DEPLOYMENT_PREFIX - Prefix for the resource names which will be created as a part of this deployment
- AZURE_SUBSCRIPTION_ID - Subscription ID of the Azure subscription where the resources should be deployed.
- AZURE_RESOURCE_GROUP_NAME - Name of the containing resource group
- AZURE_RESOURCE_GROUP_LOCATION - Azure region where the resources will be deployed. (e.g. australiaeast, eastus, etc.)
- DELETE_RESOURCE_GROUP - Flag to indicate the cleanup step for the resource group
Run './deploy.sh'

Note: The script will prompt you to log in to the Azure account for authorization to deploy resources.
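The login and subscription selection that a deployment script performs can be pictured roughly as follows. This is a sketch under the assumption that the script uses the environment variables listed above; the function name `prepare_azure_context` is hypothetical and the actual script may order these steps differently.

```shell
#!/usr/bin/env bash
# Sketch: log in only if there is no active session, select the target
# subscription, and make sure the resource group exists.
prepare_azure_context() {
    # "az account show" fails when nobody is logged in.
    az account show --output none 2>/dev/null || az login || return 1
    az account set --subscription "$AZURE_SUBSCRIPTION_ID" || return 1
    az group create \
        --name "$AZURE_RESOURCE_GROUP_NAME" \
        --location "$AZURE_RESOURCE_GROUP_LOCATION" \
        --output none
}
```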

The script will validate the ARM templates and the environment variables before deploying the resources. It will also display the status of each stage of the deployment while it executes. The following screenshot displays the log for a successful run:

Note: DEPLOYMENT_PREFIX for this deployment was set to lumustest.
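The pre-deployment validation described above could look something like the sketch below. It checks that the five environment variables are set and asks Azure Resource Manager to validate a template without deploying it. The function names and the template and parameter file arguments are hypothetical; the sample's script may perform these checks differently.

```shell
#!/usr/bin/env bash
# Sketch: validate inputs before any resource is deployed.

# Return non-zero and name every required variable that is unset or empty.
validate_environment() {
    local name missing=0
    for name in DEPLOYMENT_PREFIX AZURE_SUBSCRIPTION_ID \
                AZURE_RESOURCE_GROUP_NAME AZURE_RESOURCE_GROUP_LOCATION \
                DELETE_RESOURCE_GROUP; do
        # ${!name} expands the variable whose name is stored in $name.
        if [ -z "${!name:-}" ]; then
            echo "Missing environment variable: ${name}" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Ask Azure Resource Manager to validate a template without deploying it.
# $1 = template file, $2 = parameter file (both supplied by the caller)
validate_template() {
    az deployment group validate \
        --resource-group "$AZURE_RESOURCE_GROUP_NAME" \
        --template-file "$1" \
        --parameters "@$2" \
        --output none
}
```

Running `validate_environment` first means a missing variable is reported before any call is made to Azure.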

2.3. Deployed Resources

The following resources will be deployed as a part of this sample once the script is executed:

1. Azure Databricks workspace.

2. Azure Storage with hierarchical namespace enabled.

3. Azure Key Vault with all the secrets configured.

2.4. Deployment validation

The following steps can be performed to validate the correct deployment of this sample:

Users with appropriate access rights should be able to:
1. Launch the workspace from the Azure portal.
2. Access the control plane for the storage account and key vault through the Azure portal.
3. View the secrets configured in the Azure Key Vault.
4. View deployment logs in the Azure resource group.
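The same checks can also be approximated from the command line. The sketch below lists the resources in the group, the names of the secrets in the vault, and the deployment history. The function name `inspect_deployment` is hypothetical, and the Key Vault name must be supplied by the caller because it depends on the chosen DEPLOYMENT_PREFIX.

```shell
#!/usr/bin/env bash
# Sketch: list what was deployed so it can be compared with the
# expected resources.
# $1 = resource group, $2 = Key Vault name
inspect_deployment() {
    local resource_group="$1" vault_name="$2"
    # Resources in the group: expect a Databricks workspace, a storage
    # account, and a key vault.
    az resource list --resource-group "$resource_group" \
        --query "[].{name:name, type:type}" --output table
    # Names of the secrets stored in the vault (values are not shown).
    az keyvault secret list --vault-name "$vault_name" \
        --query "[].name" --output tsv
    # Deployment history and provisioning state for the resource group.
    az deployment group list --resource-group "$resource_group" \
        --query "[].{name:name, state:properties.provisioningState}" \
        --output table
}
```

Listing secrets requires the caller to have the corresponding permission on the vault, which matches the "appropriate access rights" condition above.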

2.5. Clean-up

Run the clean-up script to remove the resources provisioned in this sample:

Navigate to (CD) single_tech_samples/databricks/sample1_basic_azure_databricks_environment/.
Run './destroy.sh'
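For readers who want to see what a resource group teardown involves, a minimal sketch follows. It assumes that deleting the whole resource group is acceptable; the function name `delete_resource_group` is hypothetical, and the sample's clean-up script may remove resources more selectively.

```shell
#!/usr/bin/env bash
# Sketch: delete the resource group and everything in it, if it exists.
# $1 = resource group name
delete_resource_group() {
    local resource_group="$1"
    # "az group exists" prints "true" or "false".
    if [ "$(az group exists --name "$resource_group")" = "true" ]; then
        # --yes skips the interactive confirmation prompt.
        az group delete --name "$resource_group" --yes
    else
        echo "Resource group ${resource_group} not found; nothing to delete."
    fi
}
```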

The following screenshot displays the log for a successful clean-up run:

3. Next Step

Deploying Enterprise-grade Azure Databricks environment using Infrastructure as Code aligned with Anti-Data-Exfiltration Reference architecture
