Sample updates for AML Spark managed VNet GA. (#2656)
* Sample updates for AML Spark managed VNet GA.

* Updated notebook title. Triggering build to test.

* Updated workspace name to include timestamp. Triggering build to test.

* Shortened VNet workspace name to meet validation.

* README.md clean-up and fixes for AML Spark samples.

* README.md clean-up and fixes for AML Spark samples.
ynpandey authored Sep 25, 2023
1 parent 8eea7ca commit e9470ab
Showing 3 changed files with 26 additions and 15 deletions.
28 changes: 18 additions & 10 deletions cli/jobs/spark/README.md
@@ -8,10 +8,10 @@ You can execute the above command from:
- terminal of [Visual Studio Code connected to an Azure Machine Learning compute instance](https://learn.microsoft.com/azure/machine-learning/how-to-set-up-vs-code-remote?tabs=studio).
- your local computer that has [Azure Machine Learning CLI](https://learn.microsoft.com/azure/machine-learning/how-to-configure-cli?tabs=public) installed.

## Attach user assigned managed identity to a workspace
## Attach user-assigned managed identity to a workspace
The managed identity used by serverless Spark compute is the user-assigned managed identity attached to the workspace. You can attach a user-assigned managed identity to a workspace using either CLI v2 or `ARMClient`.

### Attach user assigned managed identity using CLI v2
### Attach user-assigned managed identity using CLI v2

1. Use the `user-assigned-identity.yaml` file provided in this directory with the `--file` parameter of the `az ml workspace update` command to attach the user-assigned managed identity:
```azurecli
@@ -24,33 +24,41 @@ The managed identity used by serverless Spark compute is user-assigned managed i
1. Use the `user-assigned-identity.json` file provided in this directory to attach the user-assigned managed identity to the workspace by executing the following command in a PowerShell or command prompt:
```cmd
armclient PATCH https://management.azure.com/subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RESOURCE_GROUP>/providers/Microsoft.MachineLearningServices/workspaces/<AML_WORKSPACE_NAME>?api-version=2022-05-01 '@user-assigned-identity.json'
```
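The `armclient PATCH` command above targets the workspace's Azure Resource Manager (ARM) URL. The following Python sketch assembles that URL from the same placeholders, which can help catch a mistyped resource group or workspace name before the request is sent. The helper name `workspace_arm_url` is hypothetical and is not part of ARMClient or any Azure SDK; the default `api-version` is the one used in the command above.

```python
from urllib.parse import quote

ARM_ENDPOINT = "https://management.azure.com"


def workspace_arm_url(subscription_id: str, resource_group: str,
                      workspace_name: str, api_version: str = "2022-05-01") -> str:
    """Return the ARM URL that the `armclient PATCH` command targets.

    Hypothetical helper: it only formats a string and makes no network call.
    """
    # Percent-encode each path segment in case a name contains reserved characters.
    path = (
        f"/subscriptions/{quote(subscription_id)}"
        f"/resourceGroups/{quote(resource_group)}"
        "/providers/Microsoft.MachineLearningServices"
        f"/workspaces/{quote(workspace_name)}"
    )
    return f"{ARM_ENDPOINT}{path}?api-version={api_version}"


if __name__ == "__main__":
    # Made-up values for illustration only.
    print(workspace_arm_url("00000000-0000-0000-0000-000000000000",
                            "my-resource-group", "my-workspace"))
```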
## Provision Managed VNet for Serverless Spark (Preview)
## Provision Managed VNet for Serverless Spark
To provision a managed VNet for serverless Spark:
1. Create a workspace using parameter `--managed-network allow_internet_outbound`. To allows only approved outbound communications, use either `--managed-network allow_only_approved_outbound`:
1. Create a workspace using the `--managed-network allow_internet_outbound` parameter:
```azurecli
az ml workspace create --subscription <SUBSCRIPTION_ID> --resource-group <RESOURCE_GROUP> --location <AZURE_REGION_NAME> --name <AML_WORKSPACE_NAME> --managed-network allow_internet_outbound
```
1. Once workspace is created update it to define outbound rule to add a Private Endpoint connection to a storage account use the file `storage_pe.yaml` provided in this directory with `--file` parameter:
If you want to allow only approved outbound traffic to enable data exfiltration protection (DEP), use `--managed-network allow_only_approved_outbound`:
```azurecli
az ml workspace create --subscription <SUBSCRIPTION_ID> --resource-group <RESOURCE_GROUP> --location <AZURE_REGION_NAME> --name <AML_WORKSPACE_NAME> --managed-network allow_only_approved_outbound
```
2. Once the workspace is created, update it to define outbound rules. To add a Private Endpoint connection to a storage account, use the `storage_pe.yaml` file provided in this directory with the `--file` parameter:
> [!NOTE]
> If you used the parameter `--managed-network allow_only_approved_outbound` in the previous CLI command, edit `storage_pe.yaml` to define `isolation_mode: allow_only_approved_outbound`. A workspace created with `isolation_mode: allow_internet_outbound` cannot be updated later to use `isolation_mode: allow_only_approved_outbound`.
```azurecli
az ml workspace update --subscription <SUBSCRIPTION_ID> --resource-group <RESOURCE_GROUP> --name <AML_WORKSPACE_NAME> --file storage_pe.yaml
```
1. Provision managed VNet for serverless Spark compute. This command will also provision the Private Endpoints defines at the previous step:
3. Provision the managed VNet for serverless Spark compute. This command also provisions the Private Endpoints defined in the previous step:
```azurecli
az ml workspace provision-network --subscription <SUBSCRIPTION_ID> --resource-group <RESOURCE_GROUP> --name <AML_WORKSPACE_NAME> --include-spark
```
> [!NOTE]
> If the Azure Machine Learning workspace and storage account are in different resource groups, then Private Endpoints needs to be manually activated in [Azure portal](https://portal.azure.com) before accessing data from the storage account in Spark jobs.
> If the Azure Machine Learning workspace and storage account are in different resource groups, then Private Endpoints need to be manually activated in the [Azure portal](https://portal.azure.com) before Spark jobs can access data from the storage account.
1. To see a list of outbound rules, execute the following command:
4. To see a list of outbound rules, execute the following command:
```azurecli
az ml workspace outbound-rule list --subscription <SUBSCRIPTION_ID> --resource-group <RESOURCE_GROUP> --workspace-name <AML_WORKSPACE_NAME>
```
1. To show details of an outbound rule, execute the following command:
5. To show details of an outbound rule, execute the following command:
```azurecli
az ml workspace outbound-rule show --subscription <SUBSCRIPTION_ID> --resource-group <RESOURCE_GROUP> --workspace-name <AML_WORKSPACE_NAME> --rule <OUTBOUND_RULE_NAME>
```
1. To remove an outbound rule, execute the following command:
6. To remove an outbound rule, execute the following command:
```azurecli
az ml workspace outbound-rule remove --subscription <SUBSCRIPTION_ID> --resource-group <RESOURCE_GROUP> --workspace-name <AML_WORKSPACE_NAME> --rule <OUTBOUND_RULE_NAME>
```
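The NOTE in step 2 implies two checks: the `isolation_mode` in `storage_pe.yaml` should match the mode passed to `az ml workspace create`, and a workspace created with `allow_internet_outbound` cannot later switch to `allow_only_approved_outbound`. The following Python sketch encodes both. The dictionary layout is an assumption modeled on the `managed_network` YAML schema in the Azure Machine Learning managed network documentation, so compare it with the actual `storage_pe.yaml` in this directory; the helper names and the default rule name `storage-pe-rule` are hypothetical.

```python
from typing import Any, Dict

# The two isolation modes used in this README (other modes are not covered here).
README_MODES = {"allow_internet_outbound", "allow_only_approved_outbound"}


def check_mode_change(created_mode: str, requested_mode: str) -> None:
    """Raise ValueError for unknown modes or the change the README's NOTE forbids."""
    for mode in (created_mode, requested_mode):
        if mode not in README_MODES:
            raise ValueError(f"unexpected isolation mode: {mode!r}")
    # Only this transition is rejected; the NOTE says nothing about others.
    if (created_mode == "allow_internet_outbound"
            and requested_mode == "allow_only_approved_outbound"):
        raise ValueError(
            "a workspace created with allow_internet_outbound cannot be "
            "updated to allow_only_approved_outbound"
        )


def storage_pe_config(created_mode: str, storage_account_id: str,
                      rule_name: str = "storage-pe-rule",
                      subresource: str = "blob") -> Dict[str, Any]:
    """Build a dict shaped like an assumed storage_pe.yaml managed_network update."""
    # Reuse the create-time mode, as the NOTE asks when editing storage_pe.yaml.
    check_mode_change(created_mode, created_mode)
    return {
        "managed_network": {
            "isolation_mode": created_mode,
            "outbound_rules": [
                {
                    "name": rule_name,
                    "type": "private_endpoint",
                    "destination": {
                        "service_resource_id": storage_account_id,
                        "subresource_target": subresource,
                        # Assumed flag that makes the endpoint available to serverless Spark.
                        "spark_enabled": True,
                    },
                }
            ],
        }
    }
```

Serializing the dictionary with a YAML library (for example, PyYAML's `yaml.safe_dump`) should produce a file in the shape step 2 expects, provided the assumed schema matches the real `storage_pe.yaml`.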
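The numbered managed VNet steps in the README can also be scripted. The following Python sketch only builds the argument lists, using the flags exactly as they appear in the README commands, so they can be reviewed or handed to `subprocess.run` (which the sketch does not call). Note that the `outbound-rule` subcommands identify the workspace with `--workspace-name`, whereas `create`, `update`, and `provision-network` use `--name`. All helper names are hypothetical.

```python
import shlex
from typing import List, Optional

MODES = {"allow_internet_outbound", "allow_only_approved_outbound"}
RULE_ACTIONS = {"list", "show", "remove"}


def _scope(sub: str, rg: str) -> List[str]:
    """Flags shared by every command in the README."""
    return ["--subscription", sub, "--resource-group", rg]


def create_cmd(sub: str, rg: str, location: str, name: str, mode: str) -> List[str]:
    """Step 1: create the workspace with a managed network isolation mode."""
    if mode not in MODES:
        raise ValueError(f"unexpected --managed-network value: {mode!r}")
    return ["az", "ml", "workspace", "create", *_scope(sub, rg),
            "--location", location, "--name", name, "--managed-network", mode]


def update_cmd(sub: str, rg: str, name: str,
               yaml_file: str = "storage_pe.yaml") -> List[str]:
    """Step 2: add outbound rules from a YAML file."""
    return ["az", "ml", "workspace", "update", *_scope(sub, rg),
            "--name", name, "--file", yaml_file]


def provision_cmd(sub: str, rg: str, name: str) -> List[str]:
    """Step 3: provision the managed VNet, including serverless Spark."""
    return ["az", "ml", "workspace", "provision-network", *_scope(sub, rg),
            "--name", name, "--include-spark"]


def outbound_rule_cmd(action: str, sub: str, rg: str, name: str,
                      rule: Optional[str] = None) -> List[str]:
    """Steps 4-6: list, show, or remove outbound rules."""
    if action not in RULE_ACTIONS:
        raise ValueError(f"unsupported action: {action!r}")
    cmd = ["az", "ml", "workspace", "outbound-rule", action, *_scope(sub, rg),
           "--workspace-name", name]
    if action == "list":
        return cmd
    if not rule:
        raise ValueError(f"'{action}' needs a rule name")
    return cmd + ["--rule", rule]


if __name__ == "__main__":
    # Placeholder values; pass each list to subprocess.run(cmd, check=True) to execute.
    scope = ("<SUBSCRIPTION_ID>", "<RESOURCE_GROUP>")
    ws = "<AML_WORKSPACE_NAME>"
    for cmd in (
        create_cmd(*scope, "<AZURE_REGION_NAME>", ws, "allow_internet_outbound"),
        update_cmd(*scope, ws),
        provision_cmd(*scope, ws),
        outbound_rule_cmd("list", *scope, ws),
    ):
        print(shlex.join(cmd))
```

Building lists rather than shell strings avoids quoting problems with resource names and keeps each step testable on its own.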
3 changes: 2 additions & 1 deletion sdk/python/jobs/spark/setup_spark.sh
@@ -48,7 +48,8 @@ AML_USER_MANAGED_ID_OID=$(az identity show --resource-group $RESOURCE_GROUP -n $
#<setup_vnet_resources>
if [[ "$2" == *"managed_vnet"* ]]
then
AML_WORKSPACE_NAME=${AML_WORKSPACE_NAME}-vnet
TIMESTAMP=`date +%H%M%S`
AML_WORKSPACE_NAME=${AML_WORKSPACE_NAME}-vnet-$TIMESTAMP
AZURE_STORAGE_ACCOUNT="blobstoragevnet"
BLOB_CONTAINER_NAME="blobstoragevnetcontainer"
GEN2_STORAGE_ACCOUNT_NAME="gen2storagevnet"
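The `setup_spark.sh` change appends an `HHMMSS` timestamp to the managed VNet workspace name, presumably so that repeated test runs do not reuse an existing name; the commit message also mentions shortening the name to meet validation. The following Python sketch reproduces the naming scheme and checks the result. The rule encoded in `WORKSPACE_NAME_RE` (3 to 33 characters, starting with a letter or digit, followed by letters, digits, hyphens, or underscores) is an assumption about Azure Machine Learning workspace naming, and `vnet_workspace_name` is a hypothetical helper.

```python
import re
from datetime import datetime
from typing import Optional

# Assumed Azure ML workspace naming rule: 3-33 chars, first char alphanumeric,
# remaining chars alphanumeric, '-' or '_'.
WORKSPACE_NAME_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9_-]{2,32}")


def vnet_workspace_name(base: str, now: Optional[datetime] = None) -> str:
    """Mirror setup_spark.sh: ${AML_WORKSPACE_NAME}-vnet-$TIMESTAMP (date +%H%M%S)."""
    moment = now if now is not None else datetime.now()
    name = f"{base}-vnet-{moment.strftime('%H%M%S')}"
    if not WORKSPACE_NAME_RE.fullmatch(name):
        raise ValueError(f"{name!r} ({len(name)} chars) fails the assumed naming rule")
    return name


if __name__ == "__main__":
    print(vnet_workspace_name("amlws", datetime(2023, 9, 25, 14, 5, 9)))  # amlws-vnet-140509
```

Because `%H%M%S` encodes only the time of day, two runs in the same second on different days would still produce the same name; adding the date would remove that overlap at the cost of a longer name, which matters if the length limit above applies.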
@@ -11,8 +11,8 @@
}
},
"source": [
"# Azure Machine Learning Serverless Spark with Managed VNet (Preview)\n",
"This Notebook provides sample codes for running a Spark job using [Azure Machine Learning serverless Spark compute with a managed VNet (preview)](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-managed-network#configure-for-serverless-spark-jobs). In this sample notebook you will:\n",
"# Azure Machine Learning Serverless Spark with Managed Virtual Network\n",
"This notebook provides sample code for running a Spark job using [Azure Machine Learning serverless Spark compute with a managed virtual network](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-managed-network#configure-for-serverless-spark-jobs). In this sample notebook, you will:\n",
"- Create an Azure Machine Learning Workspace with Public Network Access _disabled_.\n",
"- Configure outbound rules for the Azure Machine Learning workspace that allow storage account data access.\n",
"- Provision managed network for the workspace.\n",
@@ -55,7 +55,6 @@
"source": [
"# import required libraries\n",
"from azure.ai.ml import MLClient\n",
"from azure.ai.ml.entities import Workspace\n",
"from azure.identity import DefaultAzureCredential\n",
"\n",
"# Enter the details of your subscription\n",
@@ -78,7 +77,10 @@
},
"source": [
"### Create a Workspace\n",
"Define a managed VNet with isolation mode `AllowInternetOutbound` and a user-defined outbound rule for Azure Blob storage account. In this example, Public Network Access to the workspace is _disabled_. The code in cell creates the workspace, but the managed VNet and Private Endpoints corresponding to the outbound rules are provisioned in the later step.\n",
"Define a managed VNet with isolation mode `IsolationMode.ALLOW_INTERNET_OUTBOUND` and a user-defined outbound rule for an Azure Blob storage account. In this example, Public Network Access to the workspace is _disabled_. The code in this cell creates the workspace, but the managed VNet and the Private Endpoints corresponding to the outbound rules are provisioned in a later step.\n",
"\n",
"> [!NOTE]\n",
"> If you want to allow only approved outbound traffic to enable data exfiltration protection (DEP), use `IsolationMode.ALLOW_ONLY_APPROVED_OUTBOUND`.\n",
"\n",
"If the Azure Blob storage account needs to have Public Network Access _disabled_, then access should be disabled before adding the outbound rule and provisioning the managed VNet for the workspace. "
]