diff --git a/CODEOWNERS b/CODEOWNERS
index e4255d5d9..3c0608290 100644
--- a/CODEOWNERS
+++ b/CODEOWNERS
@@ -1,2 +1,9 @@
-# The aprl-admins team is responsible for reviewing and merging PRs
+# The aprl-admins team is responsible for reviewing and merging all PRs.
* @Azure/aprl-admins
+
+# The aprl-networking team is partially responsible for all networking-related PRs.
+/docs/content/services/networking/ @Azure/aprl-admins @Azure/aprl-networking
+
+# The aprl-hpc team is partially responsible for all HPC-related PRs.
+/docs/content/services/specialized-workloads/azure-hpc/ @Azure/aprl-admins @Azure/aprl-hpc
+/docs/content/services/batch/ @Azure/aprl-admins @Azure/aprl-hpc
diff --git a/README.md b/README.md
index 3809ba7c2..e064164cc 100644
--- a/README.md
+++ b/README.md
@@ -1,5 +1,14 @@
# Azure Proactive Resiliency Library (APRL)
+> [!CAUTION]
+> The APRL repository is scheduled to be migrated to a new repository the week of April 8th.
+> The current APRL repository will be placed in **READ-ONLY** mode from April 8th to April 12th.
+> No new pull requests will be accepted after April 5th.
+>
+> **New Repository:** [https://github.com/Azure/Azure-Proactive-Resiliency-Library-v2](https://github.com/Azure/Azure-Proactive-Resiliency-Library-v2)
+>
+> **[aka.ms/aprl](https://aka.ms/aprl)** will redirect to the new website starting April 15th.
+
[![Average time to resolve an issue](http://isitmaintained.com/badge/resolution/Azure/Azure-Proactive-Resiliency-Library.svg)](http://isitmaintained.com/project/Azure/Azure-Proactive-Resiliency-Library "Average time to resolve an issue")
[![Percentage of issues still open](http://isitmaintained.com/badge/open/Azure/Azure-Proactive-Resiliency-Library.svg)](http://isitmaintained.com/project/Azure/Azure-Proactive-Resiliency-Library "Percentage of issues still open")
@@ -11,13 +20,13 @@ Welcome to the home of the Azure Proactive Resiliency Library (APRL).
This library is built with the intention of being a staging area for guidance and recommendations that can be used by customers, partners and the field in Well-Architected Framework reliability engagements/assessments; with the intent of the guidance and recommendations being promoted, once tested and validated with customers and partners, into the official [Well-Architected Framework documentation](https://aka.ms/waf).
-The library also contains supporting [Azure Resource Graph (ARG)](https://learn.microsoft.com/azure/governance/resource-graph/overview) queries, and sometimes [Azure PowerShell](https://learn.microsoft.com/powershell/azure/what-is-azure-powershell) or [Azure CLI](https://learn.microsoft.com/cli/azure/what-is-azure-cli) scripts, that can help customers, partners and the field identify resources that may or may not be compliant with the guidance and recommendations. The intent for these queries, in the long-term, is to make them part of the [Azure Advisor](https://learn.microsoft.com/azure/advisor/advisor-overview) service.
+The library also contains supporting [Azure Resource Graph (ARG)](https://learn.microsoft.com/azure/governance/resource-graph/overview) queries that can help customers, partners and the field identify resources that may or may not be compliant with the guidance and recommendations. The intent for these queries, in the long-term, is to make them part of the [Azure Advisor](https://learn.microsoft.com/azure/advisor/advisor-overview) service.
## Contributing
> The contribution guide can be found on the GitHub pages site here: [aka.ms/aprl/contribute](https://aka.ms/aprl/contribute)
-This project only currently accepts Pull Requests from Microsoft FTEs as of today. However, anyone is welcomed to create issues/features requests on the repo for the team to triage and action. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit [https://cla.opensource.microsoft.com](https://cla.opensource.microsoft.com).
+This project currently accepts Pull Requests only from Microsoft FTEs. However, anyone is welcome to create issues/feature requests on the repo for the team to triage and action. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit [https://cla.opensource.microsoft.com](https://cla.opensource.microsoft.com).
When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.
diff --git a/docs/archetypes/service-bundle/_index.md b/docs/archetypes/service-bundle/_index.md
index 8525dc5e5..a6f5653e9 100644
--- a/docs/archetypes/service-bundle/_index.md
+++ b/docs/archetypes/service-bundle/_index.md
@@ -12,10 +12,10 @@ The presented resiliency recommendations in this guidance include {{ replace .Na
## Summary of Recommendations
{{< table style="table-striped" >}}
-| Recommendation | Category | Impact | State | ARG Query Available |
-| :------------------------------------------------ | :---------------------------------------------------------------------: | :------: | :------: | :-----------------: |
-| [CM-1 - CHANGE ME title](#cm-1---change-me-title) | Compatibility/Compliance/Disaster Recovery/High Availability/Management | High/Medium/Low | Preview/Verified | Yes |
-| [CM-2 - CHANGE ME title](#cm-2---change-me-title) | Monitoring/Networking/Performance/Scalability/Security/Storage | High/Medium/Low | Preview/Verified | No |
+| Recommendation | Category | Impact | State | ARG Query Available |
+|:--------------------------------------------------|:-----------------------------------------------------------------------:|:---------------:|:----------------:|:-------------------:|
+| [CM-1 - CHANGE ME title](#cm-1---change-me-title) | Compatibility/Compliance/Disaster Recovery/High Availability/Management | High/Medium/Low | Preview/Verified | Yes |
+| [CM-2 - CHANGE ME title](#cm-2---change-me-title) | Monitoring/Networking/Performance/Scalability/Security/Storage | High/Medium/Low | Preview/Verified | No |
{{< /table >}}
{{< alert style="info" >}}
@@ -41,7 +41,7 @@ FILL ME IN...
- [CHANGE ME LINK](https://aka.ms)
- [CHANGE ME LINK](https://aka.ms)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -66,7 +66,7 @@ FILL ME IN...
- [CHANGE ME LINK](https://aka.ms)
- [CHANGE ME LINK](https://aka.ms)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
diff --git a/docs/archetypes/service-bundle/code/cm-1/cm-1.azcli b/docs/archetypes/service-bundle/code/cm-1/cm-1.azcli
deleted file mode 100644
index 3e449c7e1..000000000
--- a/docs/archetypes/service-bundle/code/cm-1/cm-1.azcli
+++ /dev/null
@@ -1 +0,0 @@
-:: under-development
diff --git a/docs/archetypes/service-bundle/code/cm-1/cm-1.ps1 b/docs/archetypes/service-bundle/code/cm-1/cm-1.ps1
deleted file mode 100644
index 133b22465..000000000
--- a/docs/archetypes/service-bundle/code/cm-1/cm-1.ps1
+++ /dev/null
@@ -1 +0,0 @@
-# under-development
diff --git a/docs/archetypes/service-bundle/code/cm-2/cm-2.azcli b/docs/archetypes/service-bundle/code/cm-2/cm-2.azcli
deleted file mode 100644
index 3e449c7e1..000000000
--- a/docs/archetypes/service-bundle/code/cm-2/cm-2.azcli
+++ /dev/null
@@ -1 +0,0 @@
-:: under-development
diff --git a/docs/archetypes/service-bundle/code/cm-2/cm-2.ps1 b/docs/archetypes/service-bundle/code/cm-2/cm-2.ps1
deleted file mode 100644
index 133b22465..000000000
--- a/docs/archetypes/service-bundle/code/cm-2/cm-2.ps1
+++ /dev/null
@@ -1 +0,0 @@
-# under-development
diff --git a/docs/content/Privacy/_index.md b/docs/content/Privacy/_index.md
new file mode 100644
index 000000000..17f1e9c2a
--- /dev/null
+++ b/docs/content/Privacy/_index.md
@@ -0,0 +1,7 @@
++++
+title = "Privacy"
+description = "This Privacy Policy outlines how we collect and use your data when you interact with this website."
+weight = 4
++++
+
+We partner with Microsoft Clarity to capture how you use and interact with our website through behavioral metrics, heatmaps, and session replay to improve the content and usage of the website. Website usage data is captured using first- and third-party cookies and other tracking technologies and is used for site optimization. For more information about how Microsoft collects and uses your data, visit the Microsoft Privacy Statement.
diff --git a/docs/content/_index.md b/docs/content/_index.md
index 319a81872..be7091b5c 100644
--- a/docs/content/_index.md
+++ b/docs/content/_index.md
@@ -4,6 +4,20 @@ description = "Welcome to the home of the Azure Proactive Resiliency Library (AP
weight = 1
+++
+{{< alert style="danger" >}}
+
+## WEBSITE MAINTENANCE NOTICE
+
+The APRL repository is scheduled to be migrated to a new repository the week of April 8th.
+The current APRL repository will be placed in READ-ONLY mode from April 8th to April 12th.
+No new pull requests will be accepted after April 5th.
+
+### New Repository: [https://github.com/Azure/Azure-Proactive-Resiliency-Library-v2](https://github.com/Azure/Azure-Proactive-Resiliency-Library-v2)
+
+### [aka.ms/aprl](https://aka.ms/aprl) will redirect to the new website starting April 15th
+
+{{< /alert >}}
+
Welcome to the home of the Azure Proactive Resiliency Library (APRL).
@@ -11,11 +25,11 @@ Welcome to the home of the Azure Proactive Resiliency Library (APRL).
This library is built with the intention of being a staging area for guidance and recommendations that can be used by customers, partners and the field in Well-Architected Framework reliability engagements/assessments; with the intent of the guidance and recommendations being promoted, once tested and validated with customers and partners, into the official [Well-Architected Framework documentation](https://aka.ms/waf).
-The library also contains supporting [Azure Resource Graph (ARG)](https://learn.microsoft.com/azure/governance/resource-graph/overview) queries, and sometimes [Azure PowerShell](https://learn.microsoft.com/powershell/azure/what-is-azure-powershell) or [Azure CLI](https://learn.microsoft.com/cli/azure/what-is-azure-cli) scripts, that can help customers, partners and the field identify resources that may or may not be compliant with the guidance and recommendations. The intent for these queries, in the long-term, is to make them part of the [Azure Advisor](https://learn.microsoft.com/azure/advisor/advisor-overview) service.
+The library also contains supporting [Azure Resource Graph (ARG)](https://learn.microsoft.com/azure/governance/resource-graph/overview) queries that can help customers, partners and the field identify resources that may or may not be compliant with the guidance and recommendations. The intent for these queries, in the long-term, is to make them part of the [Azure Advisor](https://learn.microsoft.com/azure/advisor/advisor-overview) service.
## Get Started
-To get started head over to the [Azure Services section]({{< ref "services/_index.md">}}) and then navigate via the appropriate category to find guidance, recommendations alongside supporting Azure Resource Graph queries, Azure PowerShell or Azure CLI scripts to help you discover compliant/non-compliant resources in your environment.
+To get started, head over to the [Azure Services section]({{< ref "services/_index.md">}}) and then navigate via the appropriate category to find guidance and recommendations, alongside supporting Azure Resource Graph queries, to help you discover compliant/non-compliant resources in your environment.
{{< alert style="info" >}}
@@ -29,10 +43,10 @@ In APRL you will see a number of terms used, like Preview & Verified. The below
{{< table style="table-striped" >}}
-| Term | Definition |
-| ---- | ---------- |
-| Preview Guidance | Guidance that Microsoft FTEs have created based on customer engagements and is in the process of reviewing with the relevant Azure Product Group Engineering Service owners to ensure the content is valid and accurate |
-| Verified Guidance | Guidance has been signed off by Azure Product Group Engineering Service owners following their review |
+| Term | Definition |
+| ----------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| Preview Guidance | Guidance that Microsoft FTEs have created based on customer engagements and that is in the process of being reviewed with the relevant Azure Product Group Engineering Service owners to ensure the content is valid and accurate |
+| Verified Guidance | Guidance that has been signed off by Azure Product Group Engineering Service owners following their review |
{{< /table >}}
diff --git a/docs/content/contributing/_index.md b/docs/content/contributing/_index.md
index 6425ffda2..7eb7933c7 100644
--- a/docs/content/contributing/_index.md
+++ b/docs/content/contributing/_index.md
@@ -9,6 +9,23 @@ Looking to contribute to the Azure Proactive Resiliency Library (APRL), well you
Follow the below instructions, especially the pre-requisites, to get started contributing to the library.
+## Writing a recommendation
+
+APRL recommendations are intended to enable and accelerate the delivery of Well-Architected Reliability Assessments. The purpose of APRL is not to replace existing Azure public documentation and guidance on best practices.
+
+Each recommendation should be actionable for the customer. The customer should be able to place the recommendation in their backlog, and the engineer who picks it up should have complete clarity on the change that needs to be made and the specific resources it applies to.
+
+Each recommendation should include a descriptive title, a short guidance section that contains additional detail on the recommendation, links to public documentation that provide additional information related to the recommendation, and a query to identify resources that are not compliant with the recommendation. The title and guidance sections alone should provide sufficient information for a CSA to evaluate a resource.
+
+Recommendations should not require the CSA to spend a lot of time on background reading, they should not be open to interpretation, and they should not be vague. Remember that the CSA delivering the WARA is reviewing a large number of Azure resources in a limited amount of time and is not an expert in every Azure service.
+
+**Examples**
+
+- Good recommendation: Use a /24 subnet for the service
+- Bad recommendation: Size your subnet appropriately
+
+Not all best practices make good APRL recommendations. If the best practice relates to a particular service configuration and can be checked with an ARG query, it probably makes for a good APRL recommendation. If the best practice is more aligned to general architectural concepts that are true for many service or workload types, we very likely already have a recommendation in the APRL WAF section that addresses the topic. If not, consider adding a WAF recommendation to APRL. If neither is the case, APRL may not be the best location for this content.
+
## Context/Background
Before jumping into the pre-requisites and specific section contribution guidance, please familiarize yourself with this context/background on how this library is built to help you contribute going forward.
@@ -134,18 +151,15 @@ hugo new --kind service-bundle services/compute/virtual-machines
│ │ │
│ │ └───code
│ │ ├───cm-1
-│ │ │ cm-1.azcli
│ │ │ cm-1.kql
-│ │ │ cm-1.ps1
+│ │ │
│ │ │
│ │ └───cm-2
-│ │ cm-2.azcli
│ │ cm-2.kql
-│ │ cm-2.ps1
{{< /code >}}
4. Open `_index.md` in VS Code and make relevant changes
- You can copy the recommendations labelled `CM-1` or `CM-2` multiple times to create more recommendations
-5. Update Azure Resource Graph queries, PowerShell, AZCLI scripts in the `code` folder within `virtual-machines`
+5. Update Azure Resource Graph queries in the `code` folder within `virtual-machines`
- You will see there is a folder, e.g. `cm-1`, `cm-2`, per recommendation to help with file structure organization
6. Ensure you use the correct Azure resource abbreviations provided within our Cloud Adoption Framework (CAF) documentation [here](https://docs.microsoft.com/azure/cloud-adoption-framework/ready/azure-best-practices/resource-abbreviations). For example, use `vm` for Virtual Machines.
7. Save, commit and push your changes to your branch and repo
@@ -160,6 +174,23 @@ Don't forget you can see your changes live by running a local copy of the APRL w
When creating recommendations for a service, please follow the below standards:
+### Recommendation categories
+
+Each recommendation should have _**one and only one**_ associated category from the list below.
+
+ | Recommendation Category | Category Description |
+ |:---:|:---:|
+ | Application Resilience | Ensures software applications remain functional under failures or disruptions. Utilizes fault-tolerance, stateless architecture, and microservices to maintain application health and reduce downtime. |
+ | Automation | Uses automated systems or scripts for routine tasks, backups, and recovery. Minimizes human intervention, thereby reducing errors and speeding up recovery processes. |
+ | Availability | Focuses on ensuring services are accessible and operational. Combines basic mechanisms like backups with advanced techniques like clustering and data replication to achieve near-zero downtime. (Includes High Availability) |
+ | Access & Security | Encompasses identity management, authentication, and security measures for safeguarding systems. Centralizes access control and employs robust security mechanisms like encryption and firewalls. (Includes Identity) |
+ | Governance | Involves policies, procedures, and oversight for IT resource utilization. Ensures adherence to legal, regulatory, and compatibility requirements, while guiding overall system management. (Includes Compliance and Compatibility) |
+ | Disaster Recovery | Involves strategies and technologies to restore systems and data after catastrophic failures. Utilizes off-site backups, recovery sites, and detailed procedures for quick recovery after a disaster. |
+ | System Efficiency | Maintains acceptable service levels under varying conditions. Employs techniques like resource allocation, auto-scaling, and caching to handle changes in load and maintain smooth operation. (Includes Performance and Scalability) |
+ | Monitoring | Involves constant surveillance of system health, performance, and security. Utilizes real-time alerts and analytics to identify and resolve issues quickly, aiding in faster response times. |
+ | Networking | Aims to ensure uninterrupted network service through techniques like failover routing, load balancing, and redundancy. Focuses on maintaining the integrity and availability of network connections. |
+ | Storage | Focuses on the integrity and availability of data storage systems. Employs techniques like RAID, data replication, and backups to safeguard against data loss or corruption. |
+
### Azure Resource Graph (ARG) Queries
1. All ARG queries should have two comments at the top of the query, one comment stating `Azure Resource Graph Query` and another comment providing a description of the query results returned. For example:
diff --git a/docs/content/services/ai-ml/databricks/_index.md b/docs/content/services/ai-ml/databricks/_index.md
index 0de8c571d..1817781ae 100644
--- a/docs/content/services/ai-ml/databricks/_index.md
+++ b/docs/content/services/ai-ml/databricks/_index.md
@@ -9,39 +9,40 @@ draft = false
The presented resiliency recommendations in this guidance include Azure Databricks and dependent resources and settings.
-## Summary of Recommendations
-
+## Summary of Recommendations
{{< table style="table-striped" >}}
-| Recommendation | Impact | State | ARG Query Available |
-| :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | :----: | :-----: | :-----------------: |
-| [DBW-1 - Databricks runtime version is not latest and/or is not LTS version](#dbw-1---databricks-runtime-version-is-not-latest-or-is-not-lts-version) | Medium | Verified | No |
-| [DBW-2 - Use Databricks Pools](#dbw-2---use-databricks-pools) | High | Verified | No |
-| [DBW-3 - Use SSD backed VMs for Worker VM Type and Driver type](#dbw-3---use-ssd-backed-vms-for-worker-vm-type-and-driver-type) | Medium | Verified | No |
-| [DBW-4 - Enable autoscaling for batch workloads](#dbw-4---enable-autoscaling-for-batch-workloads) | High | Verified | No |
-| [DBW-5 - Enable autoscaling for SQL warehouse](#dbw-5---enable-autoscaling-for-sql-warehouse) | High | Verified | No |
-| [DBW-6 - Use Delta Live Tables enhanced autoscaling](#dbw-6---use-delta-live-tables-enhanced-autoscaling) | Medium | Verified | No |
-| [DBW-7 - Automatic Job Termination is enabled, ensure there are no user-defined local processes](#dbw-7---automatic-job-termination-is-enabled-ensure-there-are-no-user-defined-local-processes) | Medium | Verified | No |
-| [DBW-8 - Enable Logging-Cluster log delivery](#dbw-8---enable-logging-cluster-log-delivery) | Medium | Verified | No |
-| [DBW-9 - Use Delta Lake for higher reliability](#dbw-9---use-delta-lake-for-higher-reliability) | High | Verified | No |
-| [DBW-10 - Use Photon Acceleration](#dbw-10---use-photon-acceleration) | Low | Verified | No |
-| [DBW-11 - Automatically rescue invalid or nonconforming data with Databricks Auto Loader or Delta Live Tables](#dbw-11---automatically-rescue-invalid-or-nonconforming-data-with-databricks-auto-loader-or-delta-live-tables) | Low | Verified | No |
-| [DBW-12 - Configure jobs for automatic retries and termination](#dbw-12---configure-jobs-for-automatic-retries-and-termination) | High | Verified | No |
-| [DBW-13 - Use a scalable and production-grade model serving infrastructure](#dbw-13---use-a-scalable-and-production-grade-model-serving-infrastructure) | High | Verified | No |
-| [DBW-14 - Use a layered storage architecture](#dbw-14---use-a-layered-storage-architecture) | Medium | Verified | No |
-| [DBW-15 - Improve data integrity by reducing data redundancy](#dbw-15---improve-data-integrity-by-reducing-data-redundancy) | Low | Verified | No |
-| [DBW-16 - Actively manage schemas](#dbw-16---actively-manage-schemas) | Medium | Verified | No |
-| [DBW-17 - Use constraints and data expectations](#dbw-17---use-constraints-and-data-expectations) | Low | Verified | No |
-| [DBW-18 - Create regular backups](#dbw-18---create-regular-backups) | Low | Verified | No |
-| [DBW-19 - Recover from Structured Streaming query failures](#dbw-19---recover-from-structured-streaming-query-failures) | High | Verified | No |
-| [DBW-20 - Recover ETL jobs based on Delta time travel](#dbw-20---recover-etl-jobs-based-on-delta-time-travel) | Medium | Verified | No |
-| [DBW-21 - Use Databricks Workflows and built-in recovery](#dbw-21---use-databricks-workflows-and-built-in-recovery) | Low | Verified | No |
-| [DBW-22 - Configure a disaster recovery pattern](#dbw-22---configure-a-disaster-recovery-pattern) | High | Preview | No |
-| [DBW-23 - Automate deployments and workloads](#dbw-23---automate-deployments-and-workloads) | High | Preview | No |
-| [DBW-24 - Set up monitoring, alerting, and logging](#dbw-24---set-up-monitoring-alerting-and-logging) | High | Preview | No |
-| [DBW-25 - Deploy workspaces in separate Subscriptions](#dbw-25---deploy-workspaces-in-separate-subscriptions) | High | Preview | No |
-| [DBW-26 - Isolate each workspace in its own Vnet](#dbw-26---isolate-each-workspace-in-its-own-vnet) | High | Preview | No |
-| [DBW-27 - Do not Store any Production Data in Default DBFS Folders](#dbw-27---do-not-store-any-production-data-in-default-dbfs-folders) | High | Preview | No |
-| [DBW-28 - Do not use Azure Sport VMs for critical Production workloads](#dbw-28---do-not-use-azure-sport-vms-for-critical-production-workloads) | High | Preview | No |
+| Recommendation | Category | Impact | State | ARG Query Available |
+|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:----------------------:|:------:|:--------:|:-------------------:|
+| [DBW-1 - Databricks runtime version is not latest and/or is not LTS version](#dbw-1---databricks-runtime-version-is-not-latest-or-is-not-lts-version) | Governance | Medium | Verified | No |
+| [DBW-2 - Use Databricks Pools](#dbw-2---use-databricks-pools) | System Efficiency | High | Verified | No |
+| [DBW-3 - Use SSD backed VMs for Worker VM Type and Driver type](#dbw-3---use-ssd-backed-vms-for-worker-vm-type-and-driver-type) | System Efficiency | Medium | Verified | No |
+| [DBW-4 - Enable autoscaling for batch workloads](#dbw-4---enable-autoscaling-for-batch-workloads) | System Efficiency | High | Verified | No |
+| [DBW-5 - Enable autoscaling for SQL warehouse](#dbw-5---enable-autoscaling-for-sql-warehouse) | System Efficiency | High | Verified | No |
+| [DBW-6 - Use Delta Live Tables enhanced autoscaling](#dbw-6---use-delta-live-tables-enhanced-autoscaling) | System Efficiency | Medium | Verified | No |
+| [DBW-7 - Automatic Job Termination is enabled, ensure there are no user-defined local processes](#dbw-7---automatic-job-termination-is-enabled-ensure-there-are-no-user-defined-local-processes) | Availability | Medium | Verified | No |
+| [DBW-8 - Enable Logging-Cluster log delivery](#dbw-8---enable-logging-cluster-log-delivery) | Monitoring | Medium | Verified | No |
+| [DBW-9 - Use Delta Lake for higher reliability](#dbw-9---use-delta-lake-for-higher-reliability) | Availability | High | Verified | No |
+| [DBW-10 - Use Photon Acceleration](#dbw-10---use-photon-acceleration) | Availability | Low | Verified | No |
+| [DBW-11 - Automatically rescue invalid or nonconforming data with Databricks Auto Loader or Delta Live Tables](#dbw-11---automatically-rescue-invalid-or-nonconforming-data-with-databricks-auto-loader-or-delta-live-tables) | Application Resilience | Low | Verified | No |
+| [DBW-12 - Configure jobs for automatic retries and termination](#dbw-12---configure-jobs-for-automatic-retries-and-termination) | Availability | High | Verified | No |
+| [DBW-13 - Use a scalable and production-grade model serving infrastructure](#dbw-13---use-a-scalable-and-production-grade-model-serving-infrastructure) | System Efficiency | High | Verified | No |
+| [DBW-14 - Use a layered storage architecture](#dbw-14---use-a-layered-storage-architecture) | Application Resilience | Medium | Verified | No |
+| [DBW-15 - Improve data integrity by reducing data redundancy](#dbw-15---improve-data-integrity-by-reducing-data-redundancy) | Application Resilience | Low | Verified | No |
+| [DBW-16 - Actively manage schemas](#dbw-16---actively-manage-schemas) | Governance | Medium | Verified | No |
+| [DBW-17 - Use constraints and data expectations](#dbw-17---use-constraints-and-data-expectations) | Application Resilience | Low | Verified | No |
+| [DBW-18 - Create regular backups](#dbw-18---create-regular-backups) | Disaster Recovery | Low | Verified | No |
+| [DBW-19 - Recover from Structured Streaming query failures](#dbw-19---recover-from-structured-streaming-query-failures) | Availability | High | Verified | No |
+| [DBW-20 - Recover ETL jobs based on Delta time travel](#dbw-20---recover-etl-jobs-based-on-delta-time-travel) | Disaster Recovery | Medium | Verified | No |
+| [DBW-21 - Use Databricks Workflows and built-in recovery](#dbw-21---use-databricks-workflows-and-built-in-recovery) | Disaster Recovery | Low | Verified | No |
+| [DBW-22 - Configure a disaster recovery pattern](#dbw-22---configure-a-disaster-recovery-pattern) | Disaster Recovery | High | Preview | No |
+| [DBW-23 - Automate deployments and workloads](#dbw-23---automate-deployments-and-workloads) | Automation | High | Preview | No |
+| [DBW-24 - Set up monitoring, alerting, and logging](#dbw-24---set-up-monitoring-alerting-and-logging) | Monitoring | High | Preview | No |
+| [DBW-25 - Deploy workspaces in separate Subscriptions](#dbw-25---deploy-workspaces-in-separate-subscriptions) | System Efficiency | High | Preview | No |
+| [DBW-26 - Isolate each workspace in its own Vnet](#dbw-26---isolate-each-workspace-in-its-own-vnet) | System Efficiency | High | Preview | No |
+| [DBW-27 - Do not Store any Production Data in Default DBFS Folders](#dbw-27---do-not-store-any-production-data-in-default-dbfs-folders) | Availability | High | Preview | No |
+| [DBW-28 - Do not use Azure Spot VMs for critical Production workloads](#dbw-28---do-not-use-azure-sport-vms-for-critical-production-workloads) | Availability | High | Preview | No |
+| [DBW-29 - Migrate Legacy Workspaces](#dbw-29---migrate-legacy-workspaces) | Availability | High | Preview | No |
+| [DBW-30 - Define alternate VM SKUs](#dbw-30---define-alternate-vm-skus) | System Efficiency | Medium | Preview | No |
{{< /table >}}
{{< alert style="info" >}}
@@ -71,7 +72,7 @@ Use 12.2 LTS later. Databricks recommends that you migrate your workloads in the
- [Databricks runtime support lifecycles](https://learn.microsoft.com/en-us/azure/databricks/release-notes/runtime/databricks-runtime-ver)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -95,7 +96,7 @@ Databricks pools are a standard feature of the service, pre-provisions VM’s in
- [Best practices for reliability](https://learn.microsoft.com/en-us/azure/databricks/lakehouse-architecture/reliability/best-practices)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -109,7 +110,7 @@ Databricks pools are a standard feature of the service, pre-provisions VM’s in
**Category: System Efficiency**
-**Impact: Low**
+**Impact: Medium**
**Guidance**
@@ -125,7 +126,7 @@ Standard SSDs are acceptable for some Production workloads as well.
- [Azure managed disk types](https://learn.microsoft.com/azure/virtual-machines/disks-types#premium-ssd)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -151,7 +152,7 @@ For streaming workloads, Databricks recommends using Delta Live Tables with auto
- [Best practices for reliability](https://learn.microsoft.com/en-us/azure/databricks/lakehouse-architecture/reliability/best-practices#enable-autoscaling-for-batch-workloadss)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -177,7 +178,7 @@ To handle more concurrent users for a given warehouse, increase the cluster coun
- [Best practices for reliability](https://learn.microsoft.com/en-us/azure/databricks/lakehouse-architecture/reliability/best-practices#enable-autoscaling-for-sql-warehouse)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -195,13 +196,14 @@ To handle more concurrent users for a given warehouse, increase the cluster coun
**Guidance**
-A data disk is a managed disk that's attached to a virtual machine to store application data, or other data you need to keep. Data disks are registered as SCSI drives and are labeled with a letter that you choose. Hosting you data on a data disk also helps with flexibility when backuping or restoring data, as well as migrating the disk without having to migrate the entire Virtual Machine and Operating System. You will be able to also select a different disk sku, with different type, size, and performance that meet your requirements.
+Databricks enhanced autoscaling optimizes cluster utilization by automatically allocating cluster resources based on workload volume, with minimal impact on the data processing latency of your pipelines.
**Resources**
-- [Best practices for reliability](https://learn.microsoft.com/en-us/azure/databricks/lakehouse-architecture/reliability/best-practices)
+- [Best practices for reliability](https://learn.microsoft.com/azure/databricks/lakehouse-architecture/reliability/best-practices)
+- [Databricks enhanced autoscaling](https://learn.microsoft.com/azure/databricks/delta-live-tables/settings#use-autoscaling-to-increase-efficiency-and-reduce-resource-usage)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -227,7 +229,7 @@ However, The auto termination feature monitors only Spark jobs, not user-defined
- [Best practices for reliability?](https://learn.microsoft.com/en-us/azure/databricks/lakehouse-architecture/reliability/best-practices)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -253,7 +255,7 @@ The destination of the logs depends on the cluster ID. If the specified destinat
- [Create a cluster](https://learn.microsoft.com/en-us/azure/databricks/clusters/configure#cluster-log-delivery)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -277,7 +279,7 @@ Delta Lake is an open source storage format that brings reliability to data lake
- [Best practices for reliability](https://learn.microsoft.com/en-us/azure/databricks/lakehouse-architecture/reliability/best-practices)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -303,7 +305,7 @@ In the Databricks Lakehouse, Photon, a native vectorized engine entirely written
- [Best practices for reliability](https://learn.microsoft.com/en-us/azure/databricks/lakehouse-architecture/reliability/best-practices#use-apache-spark-or-photon-for-distributed-compute)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -330,7 +332,7 @@ Invalid or nonconforming data can lead to crashes of workloads that rely on an e
- [Best practices for reliability](https://learn.microsoft.com/en-us/azure/databricks/lakehouse-architecture/reliability/best-practices)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -355,7 +357,7 @@ Model serving provides a scalable and production-grade model real-time serving i
- [Best practices for reliability](https://learn.microsoft.com/en-us/azure/databricks/lakehouse-architecture/reliability/best-practices)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -367,7 +369,7 @@ Model serving provides a scalable and production-grade model real-time serving i
### DBW-13 - Use a scalable and production-grade model serving infrastructure
-**Category: System EFficiency**
+**Category: System Efficiency**
**Impact: High**
@@ -380,7 +382,7 @@ Model serving provides a scalable and production-grade model real-time serving i
- [Best practices for reliability](https://learn.microsoft.com/en-us/azure/databricks/lakehouse-architecture/reliability/best-practices)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -412,7 +414,7 @@ The final layer should only contain high-quality data and can be fully trusted f
- [Best practices for reliability](https://learn.microsoft.com/en-us/azure/databricks/lakehouse-architecture/reliability/best-practices)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -436,7 +438,7 @@ Copying or duplicating data creates data redundancy and will lead to lost integr
- [Best practices for reliability](https://learn.microsoft.com/en-us/azure/databricks/lakehouse-architecture/reliability/best-practices)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -463,7 +465,7 @@ Uncontrolled schema changes can lead to invalid data and failing jobs that use t
- [Best practices for reliability](https://learn.microsoft.com/en-us/azure/databricks/lakehouse-architecture/reliability/best-practices)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -489,7 +491,7 @@ To further improve this handling, Delta Live Tables supports Expectations: Expec
- [Best practices for reliability](https://learn.microsoft.com/en-us/azure/databricks/lakehouse-architecture/reliability/best-practices#use-constraints-and-data-expectations)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -513,7 +515,7 @@ To recover from a failure, regular backups need to be available. The Databricks
- [Best practices for reliability](https://learn.microsoft.com/en-us/azure/databricks/lakehouse-architecture/reliability/best-practices#create-regular-backups)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -537,7 +539,7 @@ Structured Streaming provides fault-tolerance and data consistency for streaming
- [Best practices for reliability](https://learn.microsoft.com/en-us/azure/databricks/lakehouse-architecture/reliability/best-practices#recover-from-structured-streaming-query-failures)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -561,7 +563,7 @@ Despite thorough testing, a job in production can fail or produce some unexpecte
- [Best practices for reliability](https://learn.microsoft.com/en-us/azure/databricks/lakehouse-architecture/reliability/best-practices#recover-etl-jobs-based-on-delta-time-travel)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -585,7 +587,7 @@ Databricks Workflows are built for recovery. When a task in a multi-task job fai
- [Best practices for reliability](https://learn.microsoft.com/en-us/azure/databricks/lakehouse-architecture/reliability/best-practices)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -609,7 +611,7 @@ A clear disaster recovery pattern is critical for a cloud-native data analytics
- [Azure Databricks Best Practices](https://github.com/Azure/AzureDatabricksBestPractices/tree/master)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -633,7 +635,7 @@ The Databricks Terraform provider manages Azure Databricks workspaces and the as
- [Best practices for operational excellence](https://learn.microsoft.com/en-us/azure/databricks/lakehouse-architecture/operational-excellence/best-practices#2-automate-deployments-and-workloads)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -657,7 +659,7 @@ The Databricks Terraform provider manages Azure Databricks workspaces and the as
- [Best practices for operational excellence](https://learn.microsoft.com/en-us/azure/databricks/lakehouse-architecture/operational-excellence/best-practices#system-monitoring)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -681,7 +683,7 @@ Customers commonly partition workspaces based on teams or departments and arrive
- [Azure Databricks Best Practices](https://github.com/Azure/AzureDatabricksBestPractices/blob/master/toc.md#deploy-workspaces-in-multiple-subscriptions-to-honor-azure-capacity-limits)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -705,7 +707,7 @@ While you can deploy more than one Workspace in a VNet by keeping the associated
- [Azure Databricks Best Practices](https://github.com/Azure/AzureDatabricksBestPractices/blob/master/toc.md#consider-isolating-each-workspace-in-its-own-vnet)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -732,7 +734,7 @@ This recommendation is driven by security and data availability concerns. Every
- [Azure Databricks Best Practices](https://github.com/Azure/AzureDatabricksBestPractices/blob/master/toc.md#do-not-store-any-production-data-in-default-dbfs-foldersr)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -756,7 +758,7 @@ Azure Spot VMs are not recommended for critical production workloads that requir
- [Use Azure Spot Virtual Machines](https://learn.microsoft.com/en-us/azure/virtual-machines/spot-vms)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -765,3 +767,71 @@ Azure Spot VMs are not recommended for critical production workloads that requir
{{< /collapse >}}
+
+### DBW-29 - Migrate Legacy Workspaces
+
+**Category: Availability**
+
+**Impact: High**
+
+**Guidance**
+
+Azure Databricks initially launched with a shared control plane, where some regions shared control plane resources with another region. This shared control plane model then evolved to dedicated in-region control planes (e.g. North Europe, Central US, East US) to ensure that a regional outage does not impact customer workspaces in other regions.
+
+Regions that now have their own dedicated control plane have workspaces running in two configurations:
+
+- Legacy Workspaces - these are workspaces created before the dedicated control plane was available.
+- Workspaces - these are workspaces created after the dedicated control plane was available.
+
+The path for migrating legacy workspaces to use the in-region control plane is to **redeploy**.
+
+Review the list of network addresses used in each region in the Microsoft documentation and determine which regions are sharing a control plane. For example, we can look up Canada East in the table and see that the address for its SCC relay is "tunnel.canadacentral.azuredatabricks.net". Since the relay address is in Canada Central, we know that "Canada East" is using the control plane in another region.
+
+Some regions list two different addresses in the Azure Databricks Control plane networking table. For example, North Europe lists both "tunnel.westeurope.azuredatabricks.net" and "tunnel.northeuropec2.azuredatabricks.net" for the SCC relay address. This is because North Europe once shared the West Europe control plane, but it now has its own independent control plane. There are still some old, legacy workspaces in North Europe tied to the old control plane, but all workspaces created since the switch-over will be using the new control plane.
+
+Once a new Azure Databricks workspace is created, it should be configured to match the original legacy workspace. Databricks, Inc.
+recommends that customers use the Databricks Terraform Exporter for both the initial copy and for maintaining the workspace. However, this exporter is still in the experimental phase. Customers who prefer not to rely on experimental projects, or who do not want to use Terraform, can use the "Migrate" tool that Databricks, Inc. maintains on GitHub. This is a collection of scripts that exports all of the objects (notebooks, cluster definitions, metadata, *etc.*) from one workspace and then imports them into another workspace. Customers can use the "Migrate" tool to initially populate the new
+workspace and then use their CI/CD deployment process to keep the workspace in sync.
+
+Pro Tip: If you need to determine where the control plane is located for a particular Databricks workspace, you can use the "nslookup" console command on Windows or Linux with the workspace address. The result will tell you where the control plane is located.
+
+**Resources**
+
+- [Azure Databricks regions - IP addresses and domains](https://learn.microsoft.com/azure/databricks/resources/supported-regions#--ip-addresses-and-domains)
+- [Migrate - maintained by Databricks Inc.](https://github.com/databrickslabs/migrate)
+- [Databricks Terraform Exporter - maintained by Databricks Inc. (Experimental)](https://registry.terraform.io/providers/databricks/databricks/latest/docs/guides/experimental-exporter)
+
+
+
+### DBW-30 - Define alternate VM SKUs
+
+**Category: System Efficiency**
+
+**Impact: Medium**
+
+**Guidance**
+
+Azure Databricks availability planning should include plans for swapping VM SKUs based on capacity constraints.
+
+Azure Databricks creates its VMs as regional VMs and depends on Azure to choose the best availability zone for the VM. In the past, there have been rare instances where compute could not be allocated because of zonal or regional VM constraints, resulting in a "CLOUD PROVIDER" error.
+
+In these situations, customers have two options:
+
+- Use Databricks Pools. To manage costs, customers should be careful when selecting the size of their pools, because they pay for the Azure VMs even when they are idle in the pool. A Databricks pool can contain only one VM SKU; you cannot mix multiple SKUs in the same pool. To reduce the number of pools that customers need to manage, they should settle on a few SKUs that will service their jobs instead of using a different VM
+SKU for each job.
+- Plan for alternative SKUs in their preferred region(s).
+
+**Resources**
+
+- [Compute configuration best practices](https://learn.microsoft.com/azure/databricks/compute/cluster-config-best-practices)
+- [GPU-enabled compute](https://learn.microsoft.com/azure/databricks/compute/gpu)
+
+**Resource Graph Query**
+
+{{< collapse title="Show/Hide Query/Script" >}}
+
+{{< code lang="sql" file="code/dbw-30/dbw-30.kql" >}} {{< /code >}}
+
+{{< /collapse >}}
+
+
diff --git a/docs/content/services/container/container-registry/code/cr-9/cr-9.kql b/docs/content/services/ai-ml/databricks/code/dbw-29/dbw-29.kql
similarity index 100%
rename from docs/content/services/container/container-registry/code/cr-9/cr-9.kql
rename to docs/content/services/ai-ml/databricks/code/dbw-29/dbw-29.kql
diff --git a/docs/content/services/monitoring/log-analytics/code/log-3/log-3.kql b/docs/content/services/ai-ml/databricks/code/dbw-30/dbw-30.kql
similarity index 100%
rename from docs/content/services/monitoring/log-analytics/code/log-3/log-3.kql
rename to docs/content/services/ai-ml/databricks/code/dbw-30/dbw-30.kql
diff --git a/docs/content/services/batch/_index.md b/docs/content/services/batch/_index.md
new file mode 100644
index 000000000..3c147770b
--- /dev/null
+++ b/docs/content/services/batch/_index.md
@@ -0,0 +1,18 @@
++++
+title = "Batch"
+description = "Batch Services"
+date = 2023-03-21T10:12:16Z
+draft = false
++++
+
+This page lists all of the Azure Services under the Batch (High Performance Computing) category for which the APRL has guidance, recommendations and queries.
+
+## Services List
+
+{{< alert style="info" >}}
+
+The below list of services is automatically populated based on the child folders and files in this directory within the source code in the repo.
+
+{{< /alert >}}
+
+{{< childpages >}}
diff --git a/docs/content/services/batch/batch-accounts/_index.md b/docs/content/services/batch/batch-accounts/_index.md
new file mode 100644
index 000000000..b1377b5aa
--- /dev/null
+++ b/docs/content/services/batch/batch-accounts/_index.md
@@ -0,0 +1,77 @@
++++
+title = "Batch Accounts"
+description = "Best practices and resiliency recommendations for Batch Accounts and associated resources and settings."
+date = "1/12/24"
+author = "lapate"
+msAuthor = "lapate"
+draft = false
++++
+
+The presented resiliency recommendations in this guidance include Batch Accounts (Azure High Performance Computing) and associated resources and settings.
+
+## Summary of Recommendations
+
+{{< table style="table-striped" >}}
+| Recommendation | Impact | Design Area | State | ARG Query Available |
+|:--------------------------------------------------------------------------------------------------------------------------|:------:|:------------:|:-------:|:-------------------:|
+| [BA-1 - Monitor Batch account quota](#ba-1---monitor-batch-account-quota) | Medium | Monitoring | Preview | No |
+| [BA-3 - Create an Azure Batch pool across Availability Zones](#ba-3---create-an-azure-batch-pool-across-availability-zones) | High | Availability | Preview | No |
+
+{{< /table >}}
+
+{{< alert style="info" >}}
+Definitions of states can be found [here]({{< ref "../../../_index.md#definitions-of-terms-used-in-aprl">}})
+{{< /alert >}}
+
+## Recommendations Details
+
+### BA-1 - Monitor Batch account quota
+
+**Category: Monitoring**
+
+**Impact: Medium**
+
+**Recommendation/Guidance**
+
+To enable cross-region disaster recovery and business continuity, ensure that the appropriate quotas are set for all user subscription Batch accounts so that the required number of cores is available upfront. Without enough allocated core capacity, job execution will be interrupted with operational errors indicating "Quota Reached".
+
+Pre-create all required services in each region, such as the Batch account and the storage account. There's often no charge for having accounts created, and charges accrue only when the account is used or when data is stored.
+
+**Resources**
+
+- [Learn More](https://learn.microsoft.com/azure/reliability/reliability-batch#cross-region-disaster-recovery-and-business-continuity)
+
+**Resource Graph Query**
+
+{{< collapse title="Show/Hide Query/Script" >}}
+
+{{< code lang="sql" file="code/ba-1/ba-1.kql" >}} {{< /code >}}
+
+{{< /collapse >}}
+
+
+
+### BA-3 - Create an Azure Batch pool across Availability Zones
+
+**Category: Availability**
+
+**Impact: High**
+
+**Recommendation/Guidance**
+
+When you create an Azure Batch pool using Virtual Machine Configuration, you can choose to provision your Batch pool across Availability Zones. Creating your pool with this zonal policy helps protect your Batch compute nodes from Azure datacenter-level failures.
+For example, you could create your pool with zonal policy in an Azure region that supports three Availability Zones. If an Azure datacenter in one Availability Zone has an infrastructure failure, your Batch pool will still have healthy nodes in the other two Availability Zones, so the pool will remain available for task scheduling.
+
+**Resources**
+
+- [Learn More](https://learn.microsoft.com/azure/batch/create-pool-availability-zones)
+
+**Resource Graph Query**
+
+{{< collapse title="Show/Hide Query/Script" >}}
+
+{{< code lang="sql" file="code/ba-3/ba-3.kql" >}} {{< /code >}}
+
+{{< /collapse >}}
+
+
diff --git a/docs/content/services/container/aks/code/aks-25/aks-25.kql b/docs/content/services/batch/batch-accounts/code/ba-1/ba-1.kql
similarity index 100%
rename from docs/content/services/container/aks/code/aks-25/aks-25.kql
rename to docs/content/services/batch/batch-accounts/code/ba-1/ba-1.kql
diff --git a/docs/content/services/networking/firewall/code/afw-7/afw-7.kql b/docs/content/services/batch/batch-accounts/code/ba-3/ba-3.kql
similarity index 100%
rename from docs/content/services/networking/firewall/code/afw-7/afw-7.kql
rename to docs/content/services/batch/batch-accounts/code/ba-3/ba-3.kql
diff --git a/docs/content/services/compute/compute-gallery/_index.md b/docs/content/services/compute/compute-gallery/_index.md
index 5df36a49c..78c06c9d5 100644
--- a/docs/content/services/compute/compute-gallery/_index.md
+++ b/docs/content/services/compute/compute-gallery/_index.md
@@ -12,11 +12,11 @@ The presented resiliency recommendations in this guidance include Compute Galler
## Summary of Recommendations
{{< table style="table-striped" >}}
-| Recommendation | Impact | State | ARG Query Available |
-| :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | :----: | :-----: | :-----------------: |
-| [CG-1 - A minimum of three replicas should be kept for production image versions](#cg-1---a-minimum-of-three-replicas-should-be-kept-for-production-image-versions) | Medium | Preview | Yes |
-| [CG-2 - Zone redundant storage should be used for image versions](#cg-2---zone-redundant-storage-should-be-used-for-image-versions) | Medium | Preview | Yes |
-| [CG-3 - Consider using hyper-V generation version 2 images where possible](#cg-3---consider-using-hyper-v-generation-version-2-images-where-possible) | Low | Preview | Yes |
+| Recommendation | Category | Impact | State | ARG Query Available |
+|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------|:------------:|:------:|:--------:|:-------------------:|
+| [CG-1 - A minimum of three replicas should be kept for production image versions](#cg-1---a-minimum-of-three-replicas-should-be-kept-for-production-image-versions) | Availability | Medium | Verified | Yes |
+| [CG-2 - Zone redundant storage should be used for image versions](#cg-2---zone-redundant-storage-should-be-used-for-image-versions) | Availability | Medium | Verified | Yes |
+| [CG-3 - Consider creating TrustedLaunchSupported images where possible](#cg-3---consider-creating-trustedlaunchsupported-images-where-possible) | Access & Security | Low | Verified | Yes |
{{< /table >}}
{{< alert style="info" >}}
@@ -41,7 +41,7 @@ Keep a minimum of 3 replicas for production images. In multi-VM deployment scen
- [Compute Gallery best practices](https://learn.microsoft.com/en-us/azure/virtual-machines/azure-compute-gallery#best-practices)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -67,7 +67,7 @@ You can also choose the account type for each of the target regions. The default
- [Compute Gallery best practices](https://learn.microsoft.com/en-us/azure/virtual-machines/azure-compute-gallery#best-practices)
- [Zone-redundant storage](https://learn.microsoft.com/en-us/azure/storage/common/storage-redundancy#zone-redundant-storage)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -77,15 +77,15 @@ You can also choose the account type for each of the target regions. The default
-### CG-3 - Consider using hyper-V generation version 2 images where possible
+### CG-3 - Consider creating TrustedLaunchSupported images where possible
-**Category: Availability**
+**Category: Access & Security**
**Impact: Low**
**Guidance**
-We recommend that you create a generation 2 virtual machine to take advantage of features like Secure Boot, vTPM, trusted launch VMs, large boot volume. Your choice to create a generation 1 or generation 2 virtual machine depends on which guest operating system you want to install and the boot method you want to use to deploy the virtual machine. You can't change a virtual machine's generation after you've created it. So it is recommended to review the [considerations](https://learn.microsoft.com/en-us/windows-server/virtualization/hyper-v/plan/should-i-create-a-generation-1-or-2-virtual-machine-in-hyper-v#which-guest-operating-systems-are-supported) first.
+We recommend that you create Trusted Launch Supported images to take advantage of features like Secure Boot, vTPM, trusted launch VMs, and large boot volumes. Trusted Launch Supported images are Gen 2 images by default. You can’t change a virtual machine’s generation after you’ve created it, so it is recommended to review the [considerations](https://learn.microsoft.com/en-us/windows-server/virtualization/hyper-v/plan/should-i-create-a-generation-1-or-2-virtual-machine-in-hyper-v#which-guest-operating-systems-are-supported) first.
**Resources**
@@ -93,7 +93,7 @@ We recommend that you create a generation 2 virtual machine to take advantage of
- [Generation 1 vs Generation 2 in Hyper-V](https://learn.microsoft.com/en-us/windows-server/virtualization/hyper-v/plan/should-i-create-a-generation-1-or-2-virtual-machine-in-hyper-v)
- [Images in Compute gallery](https://learn.microsoft.com/en-us/azure/virtual-machines/shared-image-galleries?tabs=azure-cli)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
diff --git a/docs/content/services/compute/image-templates/_index.md b/docs/content/services/compute/image-templates/_index.md
index a3490a981..6e0b2700b 100644
--- a/docs/content/services/compute/image-templates/_index.md
+++ b/docs/content/services/compute/image-templates/_index.md
@@ -12,10 +12,10 @@ The presented resiliency recommendations in this guidance include Image Template
## Summary of Recommendations
{{< table style="table-striped" >}}
-| Recommendation | Impact | State | ARG Query Available |
-| :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | :----: | :-----: | :-----------------: |
-| [IT-1 - Use Generation 2 virtual machine source image](#it-1---use-generation-2-virtual-machine-source-image) | Low | Preview | No |
-| [IT-2 - Replicate your Image Templates to a secondary region](#it-2---replicate-your-image-templates-to-a-secondary-region) | Low | Preview | No |
+| Recommendation | Category | Impact | State | ARG Query Available |
+|:----------------------------------------------------------------------------------------------------------------------------|:-----------------:|:------:|:--------:|:-------------------:|
+| [IT-1 - Use Generation 2 virtual machine source image](#it-1---use-generation-2-virtual-machine-source-image) | Availability | Low | Verified | No |
+| [IT-2 - Replicate your Image Templates to a secondary region](#it-2---replicate-your-image-templates-to-a-secondary-region) | Disaster Recovery | Low | Verified | Yes |
{{< /table >}}
{{< alert style="info" >}}
@@ -28,7 +28,7 @@ Definitions of states can be found [here]({{< ref "../../../_index.md#definition
### IT-1 - Use Generation 2 virtual machine source image
-**Impact: Availability**
+**Category: Availability**
**Impact: Low**
@@ -44,6 +44,8 @@ When building your Image Templates, utilize source images that support generatio
### IT-2 - Replicate your Image Templates to a secondary region
+**Category: Disaster Recovery**
+
**Impact: Low**
**Guidance**
@@ -55,7 +57,7 @@ The Azure Image Builder service that is used to deploy Image Templates doesn't c
- [Image Template resiliency](https://learn.microsoft.com/en-us/azure/reliability/reliability-image-builder?toc=%2Fazure%2Fvirtual-machines%2Ftoc.json&bc=%2Fazure%2Fvirtual-machines%2Fbreadcrumb%2Ftoc.json#capacity-and-proactive-disaster-recovery-resiliency)
- [Azure Image Builder Supported Regions](https://learn.microsoft.com/en-us/azure/virtual-machines/image-builder-overview?tabs=azure-powershell#regions)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
diff --git a/docs/content/services/compute/image-templates/code/it-2/it-2.kql b/docs/content/services/compute/image-templates/code/it-2/it-2.kql
index 614a7f9ca..85eb022b4 100644
--- a/docs/content/services/compute/image-templates/code/it-2/it-2.kql
+++ b/docs/content/services/compute/image-templates/code/it-2/it-2.kql
@@ -1 +1,7 @@
-// under-development
+// Azure Resource Graph Query
+// List all Image Templates that are not replicated to another region
+resources
+| where type =~ "microsoft.virtualmachineimages/imagetemplates"
+| mv-expand distribution=properties.distribute
+| where array_length(parse_json(distribution).replicationRegions) == 1
+| project recommendationId = "it-2", name, id, param1=strcat("replicationRegions:",parse_json(distribution).replicationRegions)
diff --git a/docs/content/services/compute/image-templates/code/it-2/it-2.kql.fix b/docs/content/services/compute/image-templates/code/it-2/it-2.kql.fix
deleted file mode 100644
index 9637ed28d..000000000
--- a/docs/content/services/compute/image-templates/code/it-2/it-2.kql.fix
+++ /dev/null
@@ -1,5 +0,0 @@
-// Azure Resource Graph Query
-// List all Image Templates with their associated regions to help determine if images are replicated
-resources
-| where type =~ "microsoft.virtualmachineimages/imagetemplates"
-| project recommendationId = "it-1", name, location, id
diff --git a/docs/content/services/compute/site-recovery/_index.md b/docs/content/services/compute/site-recovery/_index.md
index 0588fdb2b..2ade2a62c 100644
--- a/docs/content/services/compute/site-recovery/_index.md
+++ b/docs/content/services/compute/site-recovery/_index.md
@@ -12,10 +12,10 @@ The presented resiliency recommendations in this guidance include Azure Site Rec
## Summary of Recommendations
{{< table style="table-striped" >}}
-| Recommendation | Impact | State | ARG Query Available |
-| :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | :----: | :-----: | :-----------------: |
-| [ASR-1 - Ensure static IP addresses configured in VM failover settings are available in the failover subnet](#asr-1---ensure-static-ip-addresses-configured-in-vm-failover-settings-are-available-in-the-failover-subnet)| High | Preview | No |
-| [ASR-2 - Perform a test failover to validate the functionality and performance of the VMs in the target location](#asr-2---perform-a-test-failover-to-validate-the-functionality-and-performance-of-the-vms-in-the-target-location) | High | Preview | Yes |
+| Recommendation | Category | Impact | State | ARG Query Available |
+|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------:|:------:|:-------:|:-------------------:|
+| [ASR-1 - Ensure static IP addresses configured in VM failover settings are available in the failover subnet](#asr-1---ensure-static-ip-addresses-configured-in-vm-failover-settings-are-available-in-the-failover-subnet) | Disaster Recovery | High | Preview | No |
+| [ASR-2 - Perform a test failover to validate the functionality and performance of the VMs in the target location](#asr-2---perform-a-test-failover-to-validate-the-functionality-and-performance-of-the-vms-in-the-target-location) | Disaster Recovery | High | Preview | Yes |
{{< /table >}}
@@ -29,7 +29,7 @@ Definitions of states can be found [here]({{< ref "../../../_index.md#definition
### ASR-1 - Ensure static IP addresses configured in VM failover settings are available in the failover subnet
-**Category: Availability**
+**Category: Disaster Recovery**
**Impact: High**
@@ -41,7 +41,7 @@ Ensure static IP addresses configured in VM failover settings are available in t
- [Setup network mapping for site recovery](https://learn.microsoft.com/en-us/azure/site-recovery/azure-to-azure-network-mapping#set-up-ip-addressing-for-target-vms)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -53,7 +53,7 @@ Ensure static IP addresses configured in VM failover settings are available in t
### ASR-2 - Perform a test failover to validate the functionality and performance of the VMs in the target location
-**Category: Availability**
+**Category: Disaster Recovery**
**Impact: High**
@@ -66,7 +66,7 @@ Test your Disaster Recovery plan periodically without any data loss or downtime,
- [Run a test failover](https://learn.microsoft.com/en-us/azure/site-recovery/azure-to-azure-tutorial-dr-drill#run-a-test-failover)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
diff --git a/docs/content/services/compute/site-recovery/code/asr-1/asr-1.kql b/docs/content/services/compute/site-recovery/code/asr-1/asr-1.kql
index 614a7f9ca..fa5cad258 100644
--- a/docs/content/services/compute/site-recovery/code/asr-1/asr-1.kql
+++ b/docs/content/services/compute/site-recovery/code/asr-1/asr-1.kql
@@ -1 +1 @@
-// under-development
+// cannot-be-validated-with-arg
diff --git a/docs/content/services/compute/virtual-machine-scale-sets/_index.md b/docs/content/services/compute/virtual-machine-scale-sets/_index.md
index 9db3bf46f..ff4b6dd55 100644
--- a/docs/content/services/compute/virtual-machine-scale-sets/_index.md
+++ b/docs/content/services/compute/virtual-machine-scale-sets/_index.md
@@ -12,17 +12,19 @@ The presented resiliency recommendations in this guidance include Virtual Machin
## Summary of Recommendations
{{< table style="table-striped" >}}
-| Recommendation | Impact | State | ARG Query Available |
-| :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | :----: | :-----: | :-----------------: |
-| [VMSS-1 - Deploy VMSS with Flex orchestration mode instead of Uniform](#vmss-1---deploy-vmss-with-flex-orchestration-mode-instead-of-uniform) | Medium | Preview | Yes |
-| [VMSS-2 - Enable VMSS application health monitoring](#vmss-2---enable-vmss-application-health-monitoring) | Medium | Preview | No |
-| [VMSS-3 - Enable Automatic Repair policy](#vmss-3---enable-automatic-repair-policy) | High | Preview | No |
-| [VMSS-4 - Configure VMSS autoscale to custom and configure the scaling metrics](#vmss-4---configure-vmss-autoscale-to-custom-and-configure-the-scaling-metrics) | High | Preview | Yes |
-| [VMSS-5 - Enable Predictive Autoscale and configure at least for Forecast Only](#vmss-5---enable-predictive-autoscale-and-configure-at-least-for-forecast-only) | Low | Preview | Yes |
-| [VMSS-6 - Disable Force strictly even balance across zones to avoid scale in and out fail attempts](#vmss-6---disable-force-strictly-even-balance-across-zones-to-avoid-scale-in-and-out-fail-attempts) | High | Preview | Yes |
-| [VMSS-7 - Configure Allocation Policy Spreading algorithm to Max Spreading](#vmss-7---configure-allocation-policy-spreading-algorithm-to-max-spreading) | Medium | Preview | Yes |
-| [VMSS-8 - Deploy VMSS across availability zones with VMSS Flex](#vmss-8---deploy-vmss-across-availability-zones-with-vmss-flex) | High | Preview | Yes |
-| [VMSS-9 - Set Patch orchestration options to Azure-orchestrated](#vmss-9---set-patch-orchestration-options-to-azure-orchestrated) | Low | Preview | No |
+| Recommendation | Category | Impact | State | ARG Query Available |
+|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------:|:------:|:-------:|:-------------------:|
+| [VMSS-1 - Deploy VMSS with Flex orchestration mode instead of Uniform](#vmss-1---deploy-vmss-with-flex-orchestration-mode-instead-of-uniform) | System Efficiency | Medium | Verified | Yes |
+| [VMSS-2 - Enable VMSS application health monitoring](#vmss-2---enable-vmss-application-health-monitoring) | Monitoring | Medium | Verified | Yes |
+| [VMSS-3 - Enable Automatic Repair policy](#vmss-3---enable-automatic-repair-policy) | Automation | High | Verified | Yes |
+| [VMSS-4 - Configure VMSS autoscale to custom and configure the scaling metrics](#vmss-4---configure-vmss-autoscale-to-custom-and-configure-the-scaling-metrics) | System Efficiency | High | Verified | Yes |
+| [VMSS-5 - Enable Predictive Autoscale and configure at least for Forecast Only](#vmss-5---enable-predictive-autoscale-and-configure-at-least-for-forecast-only) | System Efficiency | Low | Verified | Yes |
+| [VMSS-6 - Disable Force strictly even balance across zones to avoid scale in and out fail attempts](#vmss-6---disable-force-strictly-even-balance-across-zones-to-avoid-scale-in-and-out-fail-attempts) | Availability | High | Verified | Yes |
+| [VMSS-7 - Configure Allocation Policy Spreading algorithm to Max Spreading](#vmss-7---configure-allocation-policy-spreading-algorithm-to-max-spreading) | System Efficiency | Medium | Preview | Yes |
+| [VMSS-8 - Deploy VMSS across availability zones with VMSS Flex](#vmss-8---deploy-vmss-across-availability-zones-with-vmss-flex) | Availability | High | Verified | Yes |
+| [VMSS-9 - Set Patch orchestration options to Azure-orchestrated](#vmss-9---set-patch-orchestration-options-to-azure-orchestrated) | Automation | Low | Verified | Yes |
+| [VMSS-10 - Upgrade VMSS Image versions scheduled to be deprecated or already retired](#vmss-10---upgrade-vmss-image-versions-scheduled-to-be-deprecated-or-already-retired) | Governance | High | Preview | No |
+| [VMSS-11 - Production VMSS instances should be using SSD disks](#vmss-11---production-vmss-instances-should-be-using-ssd-disks) | System Efficiency | High | Verified | Yes |
{{< /table >}}
@@ -49,7 +51,7 @@ Even single instance VMs should be deployed into a scale set using the Flexible
- [When to use VMSS instead of VMs](https://learn.microsoft.com/azure/virtual-machine-scale-sets/virtual-machine-scale-sets-design-overview#when-to-use-scale-sets-instead-of-virtual-machines)
- [Azure Well-Architected Framework review - Virtual Machines and Scale Sets](https://learn.microsoft.com/azure/well-architected/services/compute/virtual-machines/virtual-machines-review)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -73,7 +75,7 @@ Monitoring your application health is an important signal for managing and upgra
- [Using Application Health extension with Virtual Machine Scale Sets](https://learn.microsoft.com/azure/virtual-machine-scale-sets/virtual-machine-scale-sets-health-extension?tabs=rest-api)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -99,7 +101,7 @@ Grace period is specified in minutes in ISO 8601 format and can be set using the
- [Automatic instance repairs for Azure Virtual Machine Scale Sets](https://learn.microsoft.com/azure/virtual-machine-scale-sets/virtual-machine-scale-sets-automatic-instance-repairs#requirements-for-using-automatic-instance-repairs)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -126,7 +128,7 @@ Autoscale is a built-in feature that helps applications perform their best when
- [Get started with autoscale in Azure](https://learn.microsoft.com/azure/azure-monitor/autoscale/autoscale-get-started?WT.mc_id=Portal-Microsoft_Azure_Monitoring)
- [Overview of autoscale in Azure](https://learn.microsoft.com/azure/azure-monitor/autoscale/autoscale-overview)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -150,7 +152,7 @@ Predictive autoscale uses machine learning to help manage and scale Azure Virtua
- [Use predictive autoscale to scale out before load demands in virtual machine scale sets](https://learn.microsoft.com/azure/azure-monitor/autoscale/autoscale-predictive)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -180,7 +182,7 @@ While Azure VMSS provides the option to enforce even distribution of VM instance
- [Use scale-in policies with Azure Virtual Machine Scale Sets](https://learn.microsoft.com/azure/virtual-machine-scale-sets/virtual-machine-scale-sets-scale-in-policy)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -204,7 +206,7 @@ With max spreading, the scale set spreads your VMs across as many fault domains
- [Availability Considerations](https://learn.microsoft.com/en-us/azure/virtual-machine-scale-sets/virtual-machine-scale-sets-use-availability-zones#availability-considerations)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -229,7 +231,7 @@ When you create your VMSS, use availability zones to protect your applications a
- [Create a Virtual Machine Scale Set that uses Availability Zones](https://learn.microsoft.com/azure/virtual-machine-scale-sets/virtual-machine-scale-sets-use-availability-zones)
- [Update scale set to add availability zones](https://learn.microsoft.com/azure/virtual-machine-scale-sets/virtual-machine-scale-sets-use-availability-zones?tabs=cli-1%2Cportal-2#update-scale-set-to-add-availability-zones)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -252,8 +254,9 @@ Enabling automatic VM guest patching for your Azure VMs helps ease update manage
**Resources**
- [Automatic VM Guest Patching for Azure VMs](https://learn.microsoft.com/azure/virtual-machines/automatic-vm-guest-patching)
+- [Auto OS Image Upgrades](https://learn.microsoft.com/en-us/azure/virtual-machine-scale-sets/virtual-machine-scale-sets-automatic-upgrade)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -263,3 +266,50 @@ Enabling automatic VM guest patching for your Azure VMs helps ease update manage
+### VMSS-10 - Upgrade VMSS Image versions scheduled to be deprecated or already retired
+
+**Category: Governance**
+
+**Impact: High**
+
+**Guidance**
+
+Ensure that current image versions are in use to avoid disruption after an image is deprecated. Once an image has been deprecated, you can no longer deploy additional VMs or VMSS instances from it.
+
+**Resources**
+
+- [Deprecated Azure Marketplace images](https://learn.microsoft.com/en-us/azure/virtual-machines/deprecated-images)
+
+**Resource Graph Query**
+
+{{< collapse title="Show/Hide Query/Script" >}}
+
+{{< code lang="sql" file="code/vmss-10/vmss-10.kql" >}} {{< /code >}}
+
+{{< /collapse >}}
+
+
+
+### VMSS-11 - Production VMSS instances should be using SSD disks
+
+**Category: System Efficiency**
+
+**Impact: High**
+
+**Guidance**
+
+Use SSD disks for production workloads. HDD disks should only be used for non-critical resources and for resources that require infrequent access; using them for production workloads could impact your resources.
+
+**Resources**
+
+- [Disk Comparison](https://learn.microsoft.com/en-us/azure/virtual-machines/disks-types#disk-type-comparison)
+
+**Resource Graph Query**
+
+{{< collapse title="Show/Hide Query/Script" >}}
+
+{{< code lang="sql" file="code/vmss-11/vmss-11.kql" >}} {{< /code >}}
+
+{{< /collapse >}}
+
+
diff --git a/docs/content/services/compute/virtual-machine-scale-sets/code/vmss-10/vmss-10.kql b/docs/content/services/compute/virtual-machine-scale-sets/code/vmss-10/vmss-10.kql
index ae95b17bd..9b25f11d6 100644
--- a/docs/content/services/compute/virtual-machine-scale-sets/code/vmss-10/vmss-10.kql
+++ b/docs/content/services/compute/virtual-machine-scale-sets/code/vmss-10/vmss-10.kql
@@ -1,7 +1 @@
-// Azure Resource Graph Query
-// This query will check if the VMSS are currently using the latest image. If not the Image reference will be empty
-resources
-| where type == "microsoft.compute/virtualmachinescalesets"
-| extend VMSSName = name
-| extend ImageReference = tostring(properties.virtualMachineProfile.storageProfile.imageReference.version)
-| project recommendationId="vmss-10",name,id, param1="ImageReference"
+//cannot be validated with arg
diff --git a/docs/content/services/compute/virtual-machine-scale-sets/code/vmss-11/vmss-11.kql b/docs/content/services/compute/virtual-machine-scale-sets/code/vmss-11/vmss-11.kql
new file mode 100644
index 000000000..409dc9cbd
--- /dev/null
+++ b/docs/content/services/compute/virtual-machine-scale-sets/code/vmss-11/vmss-11.kql
@@ -0,0 +1,7 @@
+// Azure Resource Graph Query
+// Find all Uniform-mode VMSSs whose OS disk is not using SSD storage (Standard_LRS)
+resources
+| where type == "microsoft.compute/virtualmachinescalesets"
+| where properties.orchestrationMode != "Flexible"
+| where properties.virtualMachineProfile.storageProfile.osDisk.managedDisk.storageAccountType == 'Standard_LRS'
+| project recommendationId = "vmss-11", name, id, tags
diff --git a/docs/content/services/compute/virtual-machine-scale-sets/code/vmss-2/vmss-2.kql b/docs/content/services/compute/virtual-machine-scale-sets/code/vmss-2/vmss-2.kql
index 134c10b35..369512486 100644
--- a/docs/content/services/compute/virtual-machine-scale-sets/code/vmss-2/vmss-2.kql
+++ b/docs/content/services/compute/virtual-machine-scale-sets/code/vmss-2/vmss-2.kql
@@ -10,4 +10,4 @@ resources
| project id
) on id
| where id1 == ""
-| project recommendationId = "vmss-2", name, id, param1 = "extension: null"
+| project recommendationId = "vmss-2", name, id, tags, param1 = "extension: null"
diff --git a/docs/content/services/compute/virtual-machine-scale-sets/code/vmss-3/vmss-3.kql b/docs/content/services/compute/virtual-machine-scale-sets/code/vmss-3/vmss-3.kql
index ca12ae9d5..3a88afcfe 100644
--- a/docs/content/services/compute/virtual-machine-scale-sets/code/vmss-3/vmss-3.kql
+++ b/docs/content/services/compute/virtual-machine-scale-sets/code/vmss-3/vmss-3.kql
@@ -1,6 +1,6 @@
// Azure Resource Graph Query
// Find all VMs that do NOT have automatic repair policy enabled
resources
-| where type == "microsoft.compute/virtualmachinescalesets"
-| where properties.automaticRepairsPolicy.enabled == false
-| project recommendationId = "vmss-3", name, id, param1 = "automaticRepairsPolicy: Disabled"
+| where type == "microsoft.compute/virtualmachinescalesets"
+| where properties.automaticRepairsPolicy.enabled == false
+| project recommendationId = "vmss-3", name, id, tags, param1 = "automaticRepairsPolicy: Disabled"
diff --git a/docs/content/services/compute/virtual-machines/_index.md b/docs/content/services/compute/virtual-machines/_index.md
index 9f119f152..5b23cbeb4 100644
--- a/docs/content/services/compute/virtual-machines/_index.md
+++ b/docs/content/services/compute/virtual-machines/_index.md
@@ -12,30 +12,35 @@ The presented resiliency recommendations in this guidance include Virtual Machin
## Summary of Recommendations
{{< table style="table-striped" >}}
-| Recommendation | Impact | State | ARG Query Available |
-| :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | :----: | :-----: | :-----------------: |
-| [VM-1 - Run production workloads on two or more VMs using VMSS Flex](#vm-1---run-production-workloads-on-two-or-more-vms-using-vmss-flex) | High | Verified | No |
-| [VM-2 - Deploy VMs across Availability Zones](#vm-2---deploy-vms-across-availability-zones) | High | Verified | Yes |
-| [VM-3 - Migrate VMs using availability sets to VMSS Flex](#vm-3---migrate-vms-using-availability-sets-to-vmss-flex) | High | Verified | No |
-| [VM-4 - Replicate VMs using Azure Site Recovery](#vm-4---replicate-vms-using-azure-site-recovery) | Medium | Verified | Yes |
-| [VM-5 - Use Managed Disks for Virtual Machine disks](#vm-5---use-managed-disks-for-vm-disks) | High | Verified | Yes |
-| [VM-6 - Host application or database data on a data disk](#vm-6---host-application-or-database-data-on-a-data-disk) | Low | Verified | Yes |
-| [VM-7 - Enable Backups on your VMs](#vm-7---backup-vms-with-azure-backup-service) | Medium | Verified | Yes |
-| [VM-8 - Production VMs should be using SSD disks](#vm-8---production-vms-should-be-using-ssd-disks) | High | Verified | Yes |
-| [VM-9 - There are VMs in Stopped state](#vm-9---review-vms-in-stopped-state) | Low | Verified | Yes |
-| [VM-10 - Accelerated Networking is not enabled](#vm-10---enable-accelerated-networking-accelnet) | Medium | Verified | Yes |
-| [VM-11 - Accelerated Networking is enabled, make sure you update the GuestOS NIC driver every 6 months](#vm-11---when-accelnet-is-enabled-you-must-manually-update-the-guestos-nic-driver) | Low | Verified | Yes |
-| [VM-12 - VMs should not have a Public IP directly associated](#vm-12---vms-should-not-have-a-public-ip-directly-associated) | Medium | Verified | Yes |
-| [VM-13 - VM network interfaces and associated subnets both have a Network Security Group (NSG) associated](#vm-13---vm-network-interfaces-and-associated-subnets-both-have-a-network-security-group-nsg-associated) | Low | Verified | No |
-| [VM-14 - IP Forwarding should only be enabled for Network Virtual Appliances](#vm-14---ip-forwarding-should-only-be-enabled-for-network-virtual-appliances) | Medium | Verified | Yes |
-| [VM-15 - Customer DNS Servers should be configured in the Virtual Network level](#vm-15---dns-servers-should-be-configured-in-the-virtual-network-level) | Low | Verified | Yes |
-| [VM-16 - Shared disks should only be enabled in Clustered servers](#vm-16---shared-disks-should-only-be-enabled-in-clustered-servers) | Medium | Verified | Yes |
-| [VM-17 - The Network access to the VM disk is set to "Enable Public access from all networks"](#vm-17---network-access-to-the-vm-disk-should-be-set-to-disable-public-access-and-enable-private-access) | Low | Verified | Yes |
-| [VM-18 - Virtual Machine is not compliant with Azure Policies](#vm-18---ensure-that-your-vms-are-compliant-with-azure-policies) | Low | Verified | Yes |
-| [VM-19 - Enable disk encryption, Enable data at rest encryption by default](#vm-19---enable-disk-encryption-and-data-at-rest-encryption-by-default) | Medium | Verified | Yes |
-| [VM-20 - Enable Insights to get more visibility into the health and performance of your virtual machine](#vm-20---enable-vm-insights) | Low | Verified | Yes |
-| [VM-21 - Configure diagnostic settings for all Azure Virtual Machines](#vm-21---configure-diagnostic-settings-for-all-azure-virtual-machines) | Low | Preview | Yes |
-| [VM-22 - Use maintenance configurations for the Virtual Machine](#vm-22---use-maintenance-configurations-for-the-vms) | High | Preview | Yes |
+| Recommendation | Category | Impact | State | ARG Query Available |
+|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:----------------------:|:------:|:--------:|:-------------------:|
+| [VM-1 - Run production workloads on two or more VMs using VMSS Flex](#vm-1---run-production-workloads-on-two-or-more-vms-using-vmss-flex) | Availability | High | Verified | Yes |
+| [VM-2 - Deploy VMs across Availability Zones](#vm-2---deploy-vms-across-availability-zones) | Availability | High | Verified | Yes |
+| [VM-3 - Migrate VMs using availability sets to VMSS Flex](#vm-3---migrate-vms-using-availability-sets-to-vmss-flex) | Availability | High | Verified | Yes |
+| [VM-4 - Replicate VMs using Azure Site Recovery](#vm-4---replicate-vms-using-azure-site-recovery) | Disaster Recovery | Medium | Verified | Yes |
+| [VM-5 - Use Managed Disks for Virtual Machine disks](#vm-5---use-managed-disks-for-vm-disks) | Availability | High | Verified | Yes |
+| [VM-6 - Host database data on a data disk](#vm-6---host-database-data-on-a-data-disk) | System Efficiency | Low | Verified | Yes |
+| [VM-7 - Enable Backups on your VMs](#vm-7---backup-vms-with-azure-backup-service) | Disaster Recovery | Medium | Verified | Yes |
+| [VM-8 - Production VMs should be using SSD disks](#vm-8---production-vms-should-be-using-ssd-disks) | System Efficiency | High | Verified | Yes |
+| [VM-9 - There are VMs in Stopped state](#vm-9---review-vms-in-stopped-state) | Governance | Low | Verified | Yes |
+| [VM-10 - Accelerated Networking is not enabled](#vm-10---enable-accelerated-networking-accelnet) | System Efficiency | Medium | Verified | Yes |
+| [VM-11 - Accelerated Networking is enabled, make sure you update the GuestOS NIC driver every 6 months](#vm-11---when-accelnet-is-enabled-you-must-manually-update-the-guestos-nic-driver) | Governance | Low | Verified | No |
+| [VM-12 - VMs should not have a Public IP directly associated](#vm-12---vms-should-not-have-a-public-ip-directly-associated) | Access & Security | Medium | Verified | Yes |
+| [VM-13 - VM network interfaces and associated subnets both have a Network Security Group (NSG) associated](#vm-13---vm-network-interfaces-and-associated-subnets-both-have-a-network-security-group-nsg-associated) | Access & Security | Low | Verified | Yes |
+| [VM-14 - IP Forwarding should only be enabled for Network Virtual Appliances](#vm-14---ip-forwarding-should-only-be-enabled-for-network-virtual-appliances) | Access & Security | Medium | Verified | Yes |
+| [VM-15 - Customer DNS Servers should be configured in the Virtual Network level](#vm-15---customer-dns-servers-should-be-configured-in-the-virtual-network-level) | Networking | Low | Verified | Yes |
+| [VM-16 - Shared disks should only be enabled in Clustered servers](#vm-16---shared-disks-should-only-be-enabled-in-clustered-servers) | Storage | Medium | Verified | Yes |
+| [VM-17 - The Network access to the VM disk is set to Enable Public access from all networks](#vm-17---network-access-to-the-vm-disk-should-be-set-to-disable-public-access-and-enable-private-access) | Access & Security | Low | Verified | Yes |
+| [VM-18 - Virtual Machine is not compliant with Azure Policies](#vm-18---ensure-that-your-vms-are-compliant-with-azure-policies) | Governance | Low | Verified | Yes |
+| [VM-19 - Enable advanced encryption options for your managed disks](#vm-19---enable-advanced-encryption-options-for-your-managed-disks) | Access & Security | Medium | Verified | No |
+| [VM-20 - Enable Insights to get more visibility into the health and performance of your virtual machine](#vm-20---enable-vm-insights) | Monitoring | Low | Verified | Yes |
+| [VM-21 - Configure diagnostic settings for all Azure Virtual Machines](#vm-21---configure-diagnostic-settings-for-all-azure-virtual-machines) | Monitoring | Low | Preview | Yes |
+| [VM-22 - Use maintenance configurations for the Virtual Machine](#vm-22---use-maintenance-configurations-for-the-vms) | Governance | High | Verified | Yes |
+| [VM-23 - Avoid using A or B-Series VM Sku for production VMs that need the full performance of the CPU continuously](#vm-23---avoid-using-a-or-b-series-vm-sku-for-production-vms-that-need-the-full-performance-of-the-cpu-continuously) | System Efficiency | High | Preview | Yes |
+| [VM-24 - Mission Critical Workloads should be using Premium or Ultra Disks](#vm-24---mission-critical-workloads-should-be-using-premium-or-ultra-disks) | System Efficiency | High | Preview | Yes |
+| [VM-27 - Use Azure Boost VMs for Maintenance sensitive workload](#vm-27---use-azure-boost-vms-for-maintenance-sensitive-workload) | Availability | Medium | Preview | No |
+| [VM-28 - Enable Scheduled Events for Maintenance sensitive workload VMs](#vm-28---enable-scheduled-events-for-maintenance-sensitive-workload-vms) | Availability | Medium | Preview | No |
+
{{< /table >}}
{{< alert style="info" >}}
@@ -54,16 +59,14 @@ Definitions of states can be found [here]({{< ref "../../../_index.md#definition
**Guidance**
-To safeguard application workloads from downtime due to the temporary unavailability of a disk or VM, it's recommended that you run production workloads on two or more VMs using VMSS Flex. To achieve this you can use:
-
-- Azure Virtual Machine Scale Sets to create and manage a group of load balanced VMs. The number of VM instances can automatically increase or decrease in response to demand or a defined schedule.
-- Availability zones.
+Production VM workloads should be deployed on multiple VMs and grouped together in a VMSS Flex instance. VMSS Flex intelligently distributes VMs across the platform to minimize the impact of platform faults and platform updates on a workload. A workload running on single instance VMs, even when those instances are spread across availability zones, cannot receive the same protection because the platform has no way of knowing the VMs are related to each other.
**Resources**
-- [Resiliency checklist for Virtual Machines](https://learn.microsoft.com/azure/architecture/checklist/resiliency-per-service#virtual-machines)
+- [What has changed with Flexible orchestration mode](https://learn.microsoft.com/azure/virtual-machine-scale-sets/virtual-machine-scale-sets-orchestration-modes#what-has-changed-with-flexible-orchestration-mode)
+- [Attach or detach a Virtual Machine to or from a Virtual Machine Scale Set](https://learn.microsoft.com/azure/virtual-machine-scale-sets/virtual-machine-scale-sets-attach-detach-vm?branch=main&tabs=portal-1%2Cportal-2%2Cportal-3)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -87,7 +90,7 @@ Azure Availability Zones are physically separate locations within each Azure reg
- [Create virtual machines in an availability zone using the Azure portal](https://learn.microsoft.com/azure/virtual-machines/create-portal-availability-zone?tabs=standard)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -116,7 +119,7 @@ In an N-tier application, it's recommended that you place each application tier
- [Resiliency checklist for Virtual Machines](https://learn.microsoft.com/azure/architecture/checklist/resiliency-per-service#virtual-machines)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -141,7 +144,7 @@ When you replicate Azure VMs using Site Recovery, all the VM disks are continuou
- [Resiliency checklist for Virtual Machines](https://learn.microsoft.com/azure/architecture/checklist/resiliency-per-service#virtual-machines)
- [Run a test failover (disaster recovery drill) to Azure](https://learn.microsoft.com/azure/site-recovery/site-recovery-test-failover-to-azure)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -159,14 +162,15 @@ When you replicate Azure VMs using Site Recovery, all the VM disks are continuou
**Guidance**
-Managed disks provide better reliability for VMs in an availability set, because the disks are sufficiently isolated from each other to avoid single points of failure. Also, managed disks aren't subject to the IOPS limits of VHDs created in a storage account.
+Azure unmanaged disks will be fully retired on September 30, 2025. If you use unmanaged disks, start planning the migration now.
**Resources**
-- [Resiliency checklist for Virtual Machines](https://learn.microsoft.com/azure/architecture/checklist/resiliency-per-service#virtual-machines)
-- [Availability options for Azure Virtual Machines](https://learn.microsoft.com/azure/virtual-machines/windows/manage-availability#use-managed-disks-for-vms-in-an-availability-set)
+- [Migrate your Azure unmanaged disks by Sep 30, 2025](https://learn.microsoft.com/azure/virtual-machines/unmanaged-disks-deprecation)
+- [Migrate Windows VM from unmanaged disks to managed disks](https://learn.microsoft.com/azure/virtual-machines/windows/convert-unmanaged-to-managed-disks)
+- [Migrate Linux VM from unmanaged disks to managed disks](https://learn.microsoft.com/azure/virtual-machines/linux/convert-unmanaged-to-managed-disks)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -176,7 +180,7 @@ Managed disks provide better reliability for VMs in an availability set, because
-### VM-6 - Host application or database data on a data disk
+### VM-6 - Host database data on a data disk
**Category: System Efficiency**
@@ -184,13 +188,15 @@ Managed disks provide better reliability for VMs in an availability set, because
**Guidance**
-A data disk is a managed disk that's attached to a virtual machine to store application data, or other data you need to keep. Data disks are registered as SCSI drives and are labeled with a letter that you choose. Hosting you data on a data disk also helps with flexibility when backuping or restoring data, as well as migrating the disk without having to migrate the entire Virtual Machine and Operating System. You will be able to also select a different disk sku, with different type, size, and performance that meet your requirements.
+Host database data on a data disk instead of the OS disk.
+A data disk is a managed disk that is attached to a virtual machine to store data you need to keep. Data disks are registered as SCSI drives and are labeled with a letter that you choose. Hosting your data on a data disk gives you flexibility when backing up or restoring data, and lets you migrate the disk without migrating the entire virtual machine and operating system. You can also select a different disk SKU, with a type, size, and performance that meet your requirements.
**Resources**
- [Introduction to Azure managed disks - Data disks](https://learn.microsoft.com/azure/virtual-machines/managed-disks-overview#data-disk)
+- [Azure managed disk types](https://learn.microsoft.com/azure/virtual-machines/disks-types)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -214,7 +220,7 @@ Enable backups for your virtual machines to secure and quickly recover your data
- [What is the Azure Backup service?](https://learn.microsoft.com/azure/backup/backup-overview)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -244,11 +250,12 @@ If you want to upgrade from Standard HDD to Premium SSD disks, consider the foll
- Upgrading requires a VM reboot and this process takes 3-5 minutes to complete.
- If VMs are mission-critical production VMs, evaluate the improved availability against the cost of premium disks.
+This does not apply to ephemeral disks.
**Resources**
- [Azure managed disk types](https://learn.microsoft.com/azure/virtual-machines/disks-types#premium-ssd)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -272,7 +279,7 @@ Azure Virtual Machines (VM) instances go through different states. There are pro
- [States and billing status of Azure Virtual Machines](https://learn.microsoft.com/azure/virtual-machines/states-billing?context=%2Ftroubleshoot%2Fazure%2Fvirtual-machines%2Fcontext%2Fcontext#power-states-and-billing)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -298,7 +305,7 @@ This configuration is not always required, evaluate this option according to the
- [Accelerated Networking (AccelNet) overview](https://learn.microsoft.com/azure/virtual-network/accelerated-networking-overview)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -322,7 +329,7 @@ When Accelerated Networking is enabled the default Azure Virtual Network interfa
- [Accelerated Networking (AccelNet) overview](https://learn.microsoft.com/azure/virtual-network/accelerated-networking-overview)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -346,7 +353,7 @@ If a Virtual Machine requires outbound internet connectivity we recommend the us
- [Use Source Network Address Translation (SNAT) for outbound connections](https://learn.microsoft.com/azure/load-balancer/load-balancer-outbound-connections)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -370,7 +377,7 @@ Unless you have a specific reason to, we recommend that you associate a network
- [How network security groups filter network traffic](https://learn.microsoft.com/azure/virtual-network/network-security-group-how-it-works#intra-subnet-traffic)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -400,7 +407,7 @@ The setting must be enabled for every network interface that is attached to the
- [Enable or disable IP forwarding](https://learn.microsoft.com/azure/virtual-network/virtual-network-network-interface?tabs=network-interface-portal#enable-or-disable-ip-forwarding)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -410,7 +417,7 @@ The setting must be enabled for every network interface that is attached to the
-### VM-15 - DNS Servers should be configured in the Virtual Network level
+### VM-15 - Customer DNS Servers should be configured in the Virtual Network level
**Category: Storage**
@@ -424,7 +431,7 @@ Configure the DNS Server in the Virtual Network to avoid inconsistency across th
- [Name resolution for resources in Azure virtual networks](https://learn.microsoft.com/azure/virtual-network/virtual-networks-name-resolution-for-vms-and-role-instances)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -446,9 +453,10 @@ Azure shared disks is a feature for Azure managed disks that enables you to atta
**Resources**
-- [Azure Shared Disks](https://learn.microsoft.com/azure/virtual-machines/disks-shared-enable?tabs=azure-portal)
+- [Azure Shared Disk Introduction](https://learn.microsoft.com/azure/virtual-machines/disks-shared)
+- [Enable Shared Disks](https://learn.microsoft.com/azure/virtual-machines/disks-shared-enable?tabs=azure-portal)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -458,7 +466,7 @@ Azure shared disks is a feature for Azure managed disks that enables you to atta
-### VM-17 - Network access to the VM disk should be set to "Disable public access and enable private access"
+### VM-17 - Network access to the VM disk should be set to Disable public access and enable private access
**Category: Access & Security**
@@ -466,13 +474,13 @@ Azure shared disks is a feature for Azure managed disks that enables you to atta
**Guidance**
-Recommended changing to "Disable public access and enable private access" and creating a Private Endpoint
+We recommend changing the setting to "Disable public access and enable private access" and creating a Private Endpoint.
**Resources**
- [Restrict import/export access for managed disks using Azure Private Link](https://learn.microsoft.com/azure/virtual-machines/disks-enable-private-links-for-import-export-portal)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -497,7 +505,7 @@ It's important to keep your virtual machine (VM) secure for the applications tha
- [Policy-driven governance](https://learn.microsoft.com/azure/cloud-adoption-framework/ready/landing-zone/design-principles#policy-driven-governance)
- [Azure Policy Regulatory Compliance controls for Azure Virtual Machines](https://learn.microsoft.com/azure/virtual-machines/security-policy)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -507,7 +515,7 @@ It's important to keep your virtual machine (VM) secure for the applications tha
-### VM-19 - Enable disk encryption and data at rest encryption by default
+### VM-19 - Enable advanced encryption options for your managed disks
**Category: Access & Security**
@@ -515,18 +523,18 @@ It's important to keep your virtual machine (VM) secure for the applications tha
**Guidance**
-There are several types of encryption available for your managed disks, including Azure Disk Encryption (ADE), Server-Side Encryption (SSE) and encryption at host.
+Azure Disk Storage Server-Side Encryption (also referred to as encryption-at-rest or Azure Storage encryption) automatically encrypts data stored on Azure managed disks (OS and data disks) when persisting on the Storage Clusters. There are several types of advanced encryption options available for your managed disks, including Azure Disk Encryption (ADE), Encryption at host and Confidential disk encryption.
-- Azure Disk Encryption helps protect and safeguard your data to meet your organizational security and compliance commitments.
-- Azure Disk Storage Server-Side Encryption (also referred to as encryption-at-rest or Azure Storage encryption) automatically encrypts data stored on Azure managed disks (OS and data disks) when persisting on the Storage Clusters.
+- ADE encrypts the disks of Azure virtual machines (VMs) inside your VMs by using the DM-Crypt feature of Linux or the BitLocker feature of Windows.
- Encryption at host ensures that data stored on the VM host hosting your VM is encrypted at rest and flows encrypted to the Storage clusters.
-- Confidential disk encryption binds disk encryption keys to the virtual machine's TPM and makes the protected disk content accessible only to the VM.
+- Confidential disk encryption binds disk encryption keys to the virtual machine's TPM and makes the protected disk content accessible only to the VM.
+
**Resources**
- [Overview of managed disk encryption options](https://learn.microsoft.com/azure/virtual-machines/disk-encryption-overview)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -538,7 +546,7 @@ There are several types of encryption available for your managed disks, includin
### VM-20 - Enable VM Insights
-**Category: Monitoring**
+**Category: Monitoring**
**Impact: Low**
@@ -551,7 +559,7 @@ VM insights monitors the performance and health of your virtual machines and vir
- [Overview of VM insights](https://learn.microsoft.com/azure/azure-monitor/vm/vminsights-overview)
- [Did the extension install properly?](https://learn.microsoft.com/azure/azure-monitor/vm/vminsights-troubleshoot#did-the-extension-install-properly)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -586,7 +594,7 @@ A single diagnostic setting can define no more than one of each of the destinati
- [Diagnostic settings in Azure Monitor](https://learn.microsoft.com/azure/azure-monitor/essentials/diagnostic-settings?tabs=portal)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -610,7 +618,7 @@ The maintenance configuration settings allows user to schedule and manage update
- [Use maintenance configurations to control and manage the VM updates](https://learn.microsoft.com/azure/virtual-machines/maintenance-configurations)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -619,3 +627,108 @@ The maintenance configuration settings allows user to schedule and manage update
{{< /collapse >}}
+
+### VM-23 - Avoid using A or B-Series VM Sku for production VMs that need the full performance of the CPU continuously
+
+**Category: System Efficiency**
+
+**Impact: High**
+
+**Guidance**
+
+A-series VMs have CPU performance and memory configurations best suited for entry-level workloads such as development and test. Example use cases include development and test servers, low-traffic web servers, small to medium databases, proofs of concept, and code repositories.
+
+B-series VMs are ideal for workloads that do not need the full performance of the CPU continuously, such as web servers, proofs of concept, small databases, and development build environments. These workloads typically have burstable performance requirements. To determine the physical hardware on which this size is deployed, query the virtual hardware from within the virtual machine. The B-series lets you purchase a VM size with a baseline performance level; the VM builds up credits while it uses less than its baseline. When the VM has accumulated credits, it can burst above the baseline, using up to 100% of the vCPU when your application requires higher CPU performance. After consuming all of its CPU credits, a B-series VM is throttled back to its baseline CPU performance until it accumulates enough credits to burst again.
+
+**Resources**
+
+- [B-series burstable virtual machine sizes](https://learn.microsoft.com/en-us/azure/virtual-machines/sizes-b-series-burstable)
+
+**Resource Graph Query**
+
+{{< collapse title="Show/Hide Query/Script" >}}
+
+{{< code lang="sql" file="code/vm-23/vm-23.kql" >}} {{< /code >}}
+
+{{< /collapse >}}
+
+
+
+### VM-24 - Mission Critical Workloads should be using Premium or Ultra Disks
+
+**Category: System Efficiency**
+
+**Impact: High**
+
+**Guidance**
+
+Azure Premium SSDs deliver high-performance and low-latency disk support for virtual machines (VMs) with input/output (IO)-intensive workloads.
+
+Premium SSD v2 offers higher performance than Premium SSDs while also generally being less costly. You can individually adjust the performance (capacity, throughput, and IOPS) of Premium SSD v2 disks at any time, allowing workloads to be cost-efficient while meeting shifting performance needs. Premium SSD v2 is not supported for OS disks, so use Premium SSDs as operating system (OS) disks.
+
+Azure ultra disks are the highest-performing storage option for Azure virtual machines (VMs). You can change the performance parameters of an ultra disk without having to restart your VMs. Ultra disks are suited for data-intensive workloads such as SAP HANA, top-tier databases, and transaction-heavy workloads. Ultra disks must be used as data disks and can only be created as empty disks. You should use Premium solid-state drives (SSDs) as operating system (OS) disks.
+
+**Resources**
+
+- [Disk type comparison and decision tree](https://learn.microsoft.com/en-us/azure/virtual-machines/disks-types#disk-type-comparison)
+
+**Resource Graph Query**
+
+{{< collapse title="Show/Hide Query/Script" >}}
+
+{{< code lang="sql" file="code/vm-24/vm-24.kql" >}} {{< /code >}}
+
+{{< /collapse >}}
+
+
+
+### VM-27 - Use Azure Boost VMs for Maintenance sensitive workload
+
+**Category: Availability**
+
+**Impact: Medium**
+
+**Guidance**
+
+If the workload is maintenance-sensitive, consider using Azure Boost-compatible VMs. Azure Boost is designed to lessen the impact on customers when Azure maintenance activities occur.
+
+**Resources**
+
+- [Microsoft Azure Boost](https://learn.microsoft.com/azure/azure-boost/overview)
+- [Announcing the general availability of Azure Boost](https://aka.ms/AzureBoostGABlog)
+
+**Resource Graph Query**
+
+{{< collapse title="Show/Hide Query/Script" >}}
+
+{{< code lang="sql" file="code/vm-27/vm-27.kql" >}} {{< /code >}}
+
+{{< /collapse >}}
+
+
+
+### VM-28 - Enable Scheduled Events for Maintenance sensitive workload VMs
+
+**Category: Availability**
+
+**Impact: Medium**
+
+**Guidance**
+
+If the workload is maintenance-sensitive, enable Scheduled Events. Scheduled Events is an Azure Metadata Service that gives your application time to prepare for virtual machine maintenance. It provides information about upcoming maintenance events (for example, a reboot) so that your application can prepare for them and limit disruption. It is available for all Azure Virtual Machines types, including PaaS and IaaS, on both Windows and Linux.
+
+**Resources**
+
+- [Monitor scheduled events for your Azure VMs](https://learn.microsoft.com/azure/virtual-machines/windows/scheduled-event-service)
+- [Azure Metadata Service: Scheduled Events for Linux VMs](https://learn.microsoft.com/azure/virtual-machines/linux/scheduled-events)
+- [Azure Metadata Service: Scheduled Events for Windows VMs](https://learn.microsoft.com/azure/virtual-machines/windows/scheduled-events)
+
+**Resource Graph Query**
+
+{{< collapse title="Show/Hide Query/Script" >}}
+
+{{< code lang="sql" file="code/vm-28/vm-28.kql" >}} {{< /code >}}
+
+{{< /collapse >}}
+
+
diff --git a/docs/content/services/compute/virtual-machines/code/vm-1/vm-1.kql b/docs/content/services/compute/virtual-machines/code/vm-1/vm-1.kql
index 614a7f9ca..57e4416a2 100644
--- a/docs/content/services/compute/virtual-machines/code/vm-1/vm-1.kql
+++ b/docs/content/services/compute/virtual-machines/code/vm-1/vm-1.kql
@@ -1 +1,6 @@
-// under-development
+// Azure Resource Graph Query
+// Find all VMs that are not associated with a VMSS Flex instance
+resources
+| where type =~ 'Microsoft.Compute/virtualMachines'
+| where isnull(properties.virtualMachineScaleSet.id)
+| project recommendationId="vm-1", name, id, tags
diff --git a/docs/content/services/compute/virtual-machines/code/vm-11/vm-11.kql b/docs/content/services/compute/virtual-machines/code/vm-11/vm-11.kql
index 7cb636c12..fa5cad258 100644
--- a/docs/content/services/compute/virtual-machines/code/vm-11/vm-11.kql
+++ b/docs/content/services/compute/virtual-machines/code/vm-11/vm-11.kql
@@ -1,14 +1 @@
-// Azure Resource Graph Query
-// Find all VMs with Accelerated Networking Enabled - GuestOS admins should validate the drivers for these VMs. This is not an indication of an issue as the query does not have access to the GuestOS.
-Resources
-| where type =~ 'Microsoft.Compute/virtualMachines'
-| mv-expand nic=properties.networkProfile.networkInterfaces
-| project name, id, nicName = tostring(split(tostring(nic.id), '/')[8]), tags
-| join kind=inner (
- Resources
- | where type =~ 'Microsoft.Network/networkInterfaces'
- | where properties.enableAcceleratedNetworking == true
- | project nicName = tostring(split(tostring(id), '/')[8])
-) on nicName
-| project recommendationId = "vm-11", name, id, tags
-| order by id asc
+// cannot-be-validated-with-arg
diff --git a/docs/content/services/compute/virtual-machines/code/vm-13/vm-13.kql b/docs/content/services/compute/virtual-machines/code/vm-13/vm-13.kql
index 1dcf5d714..3c3b779ee 100644
--- a/docs/content/services/compute/virtual-machines/code/vm-13/vm-13.kql
+++ b/docs/content/services/compute/virtual-machines/code/vm-13/vm-13.kql
@@ -19,7 +19,7 @@ Resources
| where type =~ 'Microsoft.Compute/virtualMachines'
| where isnotnull(properties.networkProfile.networkInterfaces)
| mv-expand nic=properties.networkProfile.networkInterfaces
- | project vmName = name, vmId = id, nicId = nic.id, nicName=split(nic.id, '/')[8]
+ | project vmName = name, vmId = id, tags, nicId = nic.id, nicName=split(nic.id, '/')[8]
| extend nicId = tostring(nicId)
) on nicId
-| project recommendationId = "vm-13", name=vmName, id = vmId, param1 = strcat("nic-name=", nicName)
+| project recommendationId = "vm-13", name=vmName, id = vmId, tags, param1 = strcat("nic-name=", nicName)
diff --git a/docs/content/services/compute/virtual-machines/code/vm-19/vm-19.fix b/docs/content/services/compute/virtual-machines/code/vm-19/vm-19.fix
new file mode 100644
index 000000000..0c158ded3
--- /dev/null
+++ b/docs/content/services/compute/virtual-machines/code/vm-19/vm-19.fix
@@ -0,0 +1,8 @@
+// Azure Resource Graph Query
+// Find all disks that are not encrypted
+resources
+| where type == "microsoft.compute/disks"
+| extend encryptionType = properties.encryption.type
+| extend diskState = properties.diskState
+| where encryptionType !in ("EncryptionAtRestWithCustomerKey", "EncryptionAtRestWithPlatformAndCustomerKeys", "EncryptionAtRestWithPlatformKey")
+| project recommendationId="vm-19", name, id, tags, param1=strcat("encryptionType: " , properties.encryption.type), param2= strcat ("diskstate: ", properties.diskState)
diff --git a/docs/content/services/compute/virtual-machines/code/vm-19/vm-19.kql b/docs/content/services/compute/virtual-machines/code/vm-19/vm-19.kql
index 0c158ded3..614a7f9ca 100644
--- a/docs/content/services/compute/virtual-machines/code/vm-19/vm-19.kql
+++ b/docs/content/services/compute/virtual-machines/code/vm-19/vm-19.kql
@@ -1,8 +1 @@
-// Azure Resource Graph Query
-// Find all disks that are not encrypted
-resources
-| where type == "microsoft.compute/disks"
-| extend encryptionType = properties.encryption.type
-| extend diskState = properties.diskState
-| where encryptionType !in ("EncryptionAtRestWithCustomerKey", "EncryptionAtRestWithPlatformAndCustomerKeys", "EncryptionAtRestWithPlatformKey")
-| project recommendationId="vm-19", name, id, tags, param1=strcat("encryptionType: " , properties.encryption.type), param2= strcat ("diskstate: ", properties.diskState)
+// under-development
diff --git a/docs/content/services/compute/virtual-machines/code/vm-20/vm-20.kql b/docs/content/services/compute/virtual-machines/code/vm-20/vm-20.kql
index 7f701116b..ee8b56e2f 100644
--- a/docs/content/services/compute/virtual-machines/code/vm-20/vm-20.kql
+++ b/docs/content/services/compute/virtual-machines/code/vm-20/vm-20.kql
@@ -1,18 +1,26 @@
// Azure Resource Graph Query
-// Find all VMs that do not have the VM Insights extension installed
-resources
+// Find VMs without the Azure Monitor Agent extension installed, with no associated Data Collection Rule, or with a Data Collection Rule that does not have performance collection enabled.
+Resources
| where type == 'microsoft.compute/virtualmachines'
-| extend
- JoinID = toupper(id),
- vmName = name,
- OSType = tostring(properties.storageProfile.osDisk.osType)
-| join kind=leftouter(
- resources
- | where type == 'microsoft.compute/virtualmachines/extensions'
+| project idVm = tolower(id), name, tags
+| join kind=leftouter (
+ InsightsResources
+ | where type =~ "Microsoft.Insights/dataCollectionRuleAssociations" and id has "Microsoft.Compute/virtualMachines"
+ | project idDcr = tolower(properties.dataCollectionRuleId), idVmDcr = tolower(substring(id, 0, indexof(id, "/providers/Microsoft.Insights/dataCollectionRuleAssociations/"))))
+on $left.idVm == $right.idVmDcr
+| join kind=leftouter (
+ Resources
+ | where type =~ "Microsoft.Insights/dataCollectionRules"
| extend
- VMId = toupper(substring(id, 0, indexof(id, '/extensions'))),
- ExtensionName = name
-) on $left.JoinID == $right.VMId
-| summarize param2 = strcat ("Extensions: ", make_list(ExtensionName)) by recommendationId="vm-20", name=vmName, id, tags=strcat(tags), param1=OSType
-| where param2 !contains "MicrosoftMonitoringAgent" and param2 !contains "OMSAgentforLinux" and param2 !contains "AzureMonitorWindowsAgent" and param2 !contains "AzureMonitorLinuxAgent"
-| order by tolower(name) asc
+ isPerformanceEnabled = iif(properties.dataSources.performanceCounters contains "Microsoft-InsightsMetrics" and properties.dataFlows contains "Microsoft-InsightsMetrics", true, false),
+    isMapEnabled = iif(properties.dataSources.extensions contains "Microsoft-ServiceMap" and properties.dataSources.extensions contains "DependencyAgent" and properties.dataFlows contains "Microsoft-ServiceMap", true, false)
+ | where isPerformanceEnabled or isMapEnabled
+ | project dcrName = name, isPerformanceEnabled, isMapEnabled, idDcr = tolower(id))
+on $left.idDcr == $right.idDcr
+| join kind=leftouter (
+ Resources
+ | where type == 'microsoft.compute/virtualmachines/extensions' and (name contains 'AzureMonitorWindowsAgent' or name contains 'AzureMonitorLinuxAgent')
+ | extend idVmExtension = tolower(substring(id, 0, indexof(id, '/extensions'))), extensionName = name)
+on $left.idVm == $right.idVmExtension
+| where isPerformanceEnabled != 1 or (extensionName != 'AzureMonitorWindowsAgent' and extensionName != 'AzureMonitorLinuxAgent')
+| project recommendationId = "vm-20", name, id = idVm, tags, param1 = strcat('MonitoringExtension:', extensionName), param2 = strcat('DataCollectionRuleId:', idDcr), param3 = strcat('isPerformanceEnabled:', isPerformanceEnabled)
diff --git a/docs/content/services/compute/virtual-machines/code/vm-23/vm-23.kql b/docs/content/services/compute/virtual-machines/code/vm-23/vm-23.kql
new file mode 100644
index 000000000..8d11a19a2
--- /dev/null
+++ b/docs/content/services/compute/virtual-machines/code/vm-23/vm-23.kql
@@ -0,0 +1,6 @@
+// Azure Resource Graph Query
+// Find all VMs using A or B series families
+resources
+| where type == 'microsoft.compute/virtualmachines'
+| where properties.hardwareProfile.vmSize contains "Standard_B" or properties.hardwareProfile.vmSize contains "Standard_A"
+| project recommendationId = "vm-23", name, id, tags, param1=strcat("vmSku: " , properties.hardwareProfile.vmSize)
diff --git a/docs/content/services/compute/virtual-machines/code/vm-24/vm-24.kql b/docs/content/services/compute/virtual-machines/code/vm-24/vm-24.kql
new file mode 100644
index 000000000..53e3a2d94
--- /dev/null
+++ b/docs/content/services/compute/virtual-machines/code/vm-24/vm-24.kql
@@ -0,0 +1,14 @@
+// Azure Resource Graph Query
+// Find all VMs that have an attached disk that is not in the Premium or Ultra sku tier.
+
+resources
+| where type =~ 'Microsoft.Compute/virtualMachines'
+| extend lname = tolower(name)
+| join kind=leftouter(resources
+ | where type =~ 'Microsoft.Compute/disks'
+ | where not(sku.tier =~ 'Premium') and not(sku.tier =~ 'Ultra')
+ | extend lname = tolower(tostring(split(managedBy, '/')[8]))
+ | project lname, name
+ | summarize disks = make_list(name) by lname) on lname
+| where isnotnull(disks)
+| project recommendationId = "vm-24", name, id, tags, param1=strcat("AffectedDisks: ", disks)
diff --git a/docs/content/services/networking/front-door/code/afd-16/afd-16.kql b/docs/content/services/compute/virtual-machines/code/vm-26/vm-26.kql
similarity index 100%
rename from docs/content/services/networking/front-door/code/afd-16/afd-16.kql
rename to docs/content/services/compute/virtual-machines/code/vm-26/vm-26.kql
diff --git a/docs/content/services/networking/front-door/code/afd-17/afd-17.kql b/docs/content/services/compute/virtual-machines/code/vm-27/vm-27.kql
similarity index 100%
rename from docs/content/services/networking/front-door/code/afd-17/afd-17.kql
rename to docs/content/services/compute/virtual-machines/code/vm-27/vm-27.kql
diff --git a/docs/content/services/networking/traffic-manager/code/traf-2/traf.kql b/docs/content/services/compute/virtual-machines/code/vm-28/vm-28.kql
similarity index 100%
rename from docs/content/services/networking/traffic-manager/code/traf-2/traf.kql
rename to docs/content/services/compute/virtual-machines/code/vm-28/vm-28.kql
diff --git a/docs/content/services/compute/virtual-machines/code/vm-3/vm-3.kql b/docs/content/services/compute/virtual-machines/code/vm-3/vm-3.kql
index 614a7f9ca..3c3ad23d2 100644
--- a/docs/content/services/compute/virtual-machines/code/vm-3/vm-3.kql
+++ b/docs/content/services/compute/virtual-machines/code/vm-3/vm-3.kql
@@ -1 +1,6 @@
-// under-development
+// Azure Resource Graph Query
+// Find all VMs using Availability Sets
+resources
+| where type =~ 'Microsoft.Compute/virtualMachines'
+| where isnotnull(properties.availabilitySet)
+| project recommendationId = "vm-3", name, id, tags, param1=strcat("availabilitySet: ",properties.availabilitySet.id)
diff --git a/docs/content/services/compute/virtual-machines/code/vm-3/vm-3.kql.fix b/docs/content/services/compute/virtual-machines/code/vm-3/vm-3.kql.fix
deleted file mode 100644
index 713b4d38f..000000000
--- a/docs/content/services/compute/virtual-machines/code/vm-3/vm-3.kql.fix
+++ /dev/null
@@ -1,11 +0,0 @@
-// Azure Resource Graph Query
-// Find all Availability Sets with VMs with different name prefix. This query is to help identify different VM roles sharing the same AvailabilitySet.
-// Customize the query to meet your naming standards, replace the "5" with the number of characters you want to compare (name, 0, 5)
-Resources
-| where type =~ 'Microsoft.Compute/virtualMachines'
-| where isnotnull(properties.availabilitySet)
-| extend vmPrefix = substring(name, 0, 5)
-| summarize VMs = make_set(vmPrefix) by availabilitySet = tostring(properties.availabilitySet.id)
-| where array_length(VMs) > 1
-| extend availabilitySetName = tostring(split(availabilitySet, '/')[8])
-| project recommendationId = "vm-3", name=availabilitySetName, id="", param1=strcat("availabilitySet: ",availabilitySet), param2 = strcat("VMs :", VMs)
diff --git a/docs/content/services/compute/virtual-machines/code/vm-6/vm-6.kql b/docs/content/services/compute/virtual-machines/code/vm-6/vm-6.kql
index bdd1b6216..a349969b8 100644
--- a/docs/content/services/compute/virtual-machines/code/vm-6/vm-6.kql
+++ b/docs/content/services/compute/virtual-machines/code/vm-6/vm-6.kql
@@ -1,5 +1,5 @@
// Azure Resource Graph Query
-// Find all VMs that only have a single Disk
+// Find all VMs that only have an OS disk
Resources
| where type =~ 'Microsoft.Compute/virtualMachines'
| where array_length(properties.storageProfile.dataDisks) < 1
diff --git a/docs/content/services/container/aks/_index.md b/docs/content/services/container/aks/_index.md
index 74d1e3212..ddb6173fa 100644
--- a/docs/content/services/container/aks/_index.md
+++ b/docs/content/services/container/aks/_index.md
@@ -12,34 +12,32 @@ The presented resiliency recommendations in this guidance include Aks and associ
## Summary of Recommendations
{{< table style="table-striped" >}}
-| Recommendation | Impact | State | ARG Query Available |
-| :-------------------------------------------------------------------------------------------------------------------------------------------- | :----: | :-----: | :-----------------: |
-| [AKS-1 - Deploy AKS cluster across availability zones](#aks-1---deploy-aks-cluster-across-availability-zones) | High | Preview | Yes |
-| [AKS-2 - Isolate system pods](#aks-2---isolate-system-pods) | High | Preview | Yes |
-| [AKS-3 - Enable AKS-managed Azure AD integration](#aks-3---enable-aks-managed-azure-ad-integration) | High | Preview | Yes |
-| [AKS-4 - Configure Azure CNI networking for dynamic allocation of IPs](#aks-4---configure-azure-cni-networking-for-dynamic-allocation-of-ips) | Medium | Preview | Yes |
-| [AKS-5 - Enable the cluster autoscaler on an existing cluster](#aks-5---enable-the-cluster-autoscaler-on-an-existing-cluster) | High | Preview | Yes |
-| [AKS-6 - Plan for multi-region deployment](#aks-6---plan-for-multi-region-deployment) | High | Preview | No |
-| [AKS-7 - Back up Azure Kubernetes Service](#aks-7---back-up-azure-kubernetes-service) | Low | Preview | No |
-| [AKS-8 - Plan an AKS version upgrade](#aks-8---plan-an-aks-version-upgrade) | High | Preview | No |
-| [AKS-9 - Remediate AKS non-compliant Azure Policies](#aks-9---remediate-aks-non-compliant-azure-policies) | Low | Preview | No |
-| [AKS-10 - Deploy AKS across availability zones](#aks-10---deploy-aks-across-availability-zones) | High | Preview | Yes |
-| [AKS-11 - Ensure that Persistent Volumes in storage account are redundant for Pods with stateful applications](#aks-11---ensure-that-persistent-volumes-in-storage-account-are-redundant-for-pods-with-stateful-applications) | Low | Preview | No |
-| [AKS-12 - Disable Local Account Access to AKS](#aks-12---disable-local-account-access-to-aks) | High | Preview | Yes |
-| [AKS-13 - Remediate Azure Advisor recommendations](#aks-13---remediate-azure-advisor-recommendations) | High | Preview | No |
-| [AKS-14 - Upgrade Persistent Volumes with deprecated version to Azure CSI drivers](#aks-14---upgrade-persistent-volumes-with-deprecated-version-to-azure-csi-drivers) | High | Preview | No |
-| [AKS-15 - Implement Resource Quota to ensure that Kubernetes resources do not exceed hard resource limits.](#aks-15---implement-resource-quota-to-ensure-that-kubernetes-resources-do-not-exceed-hard-resource-limits) | Low | Preview | No |
-| [AKS-16 - Attach Virtual Nodes (ACI) to the AKS cluster](#aks-16---attach-virtual-nodes-aci-to-the-aks-cluster) | Low | Preview | No |
-| [AKS-17 - Isolate application (User) pods](#aks-17---isolate-application-user-pods) | Medium | Preview | Yes |
-| [AKS-18 - Enable AKS Monitor alerts](#aks-18---enable-aks-monitor-alerts) | High | Preview | No |
-| [AKS-19 - Update AKS tier to Standard](#aks-19---update-aks-tier-to-standard) | High | Preview | Yes |
-| [AKS-20 - Enable AKS Monitoring](#aks-20---enable-aks-monitoring) | High | Preview | Yes |
-| [AKS-21 - Use Ephemeral Disks on AKS clusters](#aks-21---use-ephemeral-disks-on-aks-clusters) | Medium | Preview | No |
-| [AKS-22 - Enable Azure Policies configured for AKS](#aks-22---enable-azure-policies-configured-for-aks) | Low | Preview | No |
-| [AKS-23 - Enable GitOps when using DevOps frameworks](#aks-23---enable-gitops-when-using-devops-frameworks) | Low | Preview | Yes |
-| [AKS-24 - Configure affinity or anti-affinity rules based on application requirements](#aks-24---configure-affinity-or-anti-affinity-rules-based-on-application-requirements) | High | Preview | No |
-| [AKS-25 - Configures Pods Liveness, Readiness, and Startup Probes](#aks-25---configures-pods-liveness-readiness-and-startup-probes) | High | Preview | No |
-| [AKS-26 - Configure Pod replication in production applications to guarantee availability](#aks-26---configure-pod-replication-in-production-applications-to-guarantee-availability) | High | Preview | No |
+| Recommendation | Category | Impact | State | ARG Query Available |
+|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------:|:-------:|:-------:|:-------------------:|
+| [AKS-1 - Deploy AKS cluster across availability zones](#aks-1---deploy-aks-cluster-across-availability-zones) | Availability | High | Preview | Yes |
+| [AKS-2 - Isolate system and application pods](#aks-2---isolate-system-and-application-pods) | Governance | High | Preview | Yes |
+| [AKS-3 - Disable local accounts](#aks-3---disable-local-accounts) | Access & Security | High | Preview | Yes |
+| [AKS-4 - Configure Azure CNI networking for dynamic allocation of IPs](#aks-4---configure-azure-cni-networking-for-dynamic-allocation-of-ips) | Networking | Medium | Preview | Yes |
+| [AKS-5 - Enable the cluster auto-scaler on an existing cluster](#aks-5---enable-the-cluster-auto-scaler-on-an-existing-cluster) | System Efficiency | High | Preview | Yes |
+| [AKS-6 - Back up Azure Kubernetes Service](#aks-6---back-up-azure-kubernetes-service) | Disaster Recovery | Low | Preview | No |
+| [AKS-7 - Plan an AKS version upgrade](#aks-7---plan-an-aks-version-upgrade) | Compliance | High | Preview | No |
+| [AKS-8 - Use zone-redundant storage for persistent volumes when running multi-zone AKS](#aks-8---use-zone-redundant-storage-for-persistent-volumes-when-running-multi-zone-aks) | Availability | Low | Verified | No |
+| [AKS-9 - Upgrade Persistent Volumes using in-tree drivers to Azure CSI drivers](#aks-9---upgrade-persistent-volumes-using-in-tree-drivers-to-azure-csi-drivers) | Storage | High | Verified | No |
+| [AKS-10 - Implement Resource Quota to ensure that Kubernetes resources do not exceed hard resource limits.](#aks-10---implement-resource-quota-to-ensure-that-kubernetes-resources-do-not-exceed-hard-resource-limits) | System Efficiency | Low | Preview | No |
+| [AKS-11 - Attach Virtual Nodes (ACI) to the AKS cluster](#aks-11---attach-virtual-nodes-aci-to-the-aks-cluster) | System Efficiency | Low | Preview | No |
+| [AKS-12 - Update AKS tier to Standard](#aks-12---update-aks-tier-to-standard) | Availability | High | Preview | Yes |
+| [AKS-13 - Enable AKS Monitoring](#aks-13---enable-aks-monitoring) | Monitoring | High | Preview | Yes |
+| [AKS-14 - Use Ephemeral OS disks on AKS clusters](#aks-14---use-ephemeral-os-disks-on-aks-clusters) | System Efficiency | Medium | Verified | No |
+| [AKS-15 - Enable and remediate Azure Policies configured for AKS](#aks-15---enable-and-remediate-azure-policies-configured-for-aks) | Governance | Low | Preview | No |
+| [AKS-16 - Enable GitOps when using DevOps frameworks](#aks-16---enable-gitops-when-using-devops-frameworks) | Automation | Low | Preview | Yes |
+| [AKS-17 - Configure affinity or anti-affinity rules based on application requirements](#aks-17---configure-affinity-or-anti-affinity-rules-based-on-application-requirements) | Availability | High | Preview | No |
+| [AKS-18 - Configures Pods Liveness, Readiness, and Startup Probes](#aks-18---configures-pods-liveness-readiness-and-startup-probes) | Availability | High | Preview | No |
+| [AKS-19 - Configure Pod replica sets in production applications to guarantee availability](#aks-19---configure-pod-replica-sets-in-production-applications-to-guarantee-availability) | Availability | High | Preview | No |
+| [AKS-20 - Configure system nodepool count](#aks-20---configure-system-nodepool-count) | Availability | High | Preview | Yes |
+| [AKS-21 - Configure user nodepool count](#aks-21---configure-user-nodepool-count) | Availability | High | Preview | Yes |
+| [AKS-22 - Configure pod disruption budgets (PDBs)](#aks-22---configure-pod-disruption-budgets-pdbs) | Availability | Medium | Preview | No |
+| [AKS-23 - Nodepool subnet size needs to accommodate maximum auto-scale settings](#aks-23---nodepool-subnet-size-needs-to-accommodate-maximum-auto-scale-settings) | Availability | High | Preview | Yes |
+| [AKS-24 - Enforce resource quotas at the namespace level](#aks-24---enforce-resource-quotas-at-the-namespace-level) | Availability | High | Preview | No |
{{< /table >}}
@@ -68,7 +66,7 @@ By deploying resources such as aks clusters, virtual machines, storage, and data
- [AKS Availability Zones](https://learn.microsoft.com/en-us/azure/aks/availability-zones)
- [Zone Balancing](https://learn.microsoft.com/en-us/azure/virtual-machine-scale-sets/virtual-machine-scale-sets-use-availability-zones#zone-balancing)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -78,7 +76,7 @@ By deploying resources such as aks clusters, virtual machines, storage, and data
-### AKS-2 - Isolate system pods
+### AKS-2 - Isolate system and application pods
**Category: Governance**
@@ -94,7 +92,7 @@ To prevent misconfigured or rogue application pods from accidentally killing sys
- [System and user node pools](https://learn.microsoft.com/en-us/azure/aks/use-system-pools?tabs=azure-cli#system-and-user-node-pools)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -104,7 +102,7 @@ To prevent misconfigured or rogue application pods from accidentally killing sys
-### AKS-3 - Enable AKS-managed Azure AD integration
+### AKS-3 - Disable local accounts
**Category: Access & Security**
@@ -112,15 +110,15 @@ To prevent misconfigured or rogue application pods from accidentally killing sys
**Guidance**
-Enabling Azure AD integration on an AKS cluster provides several benefits for managing access to the cluster. By using Azure AD, you can centralize user and group management, enforce multi-factor authentication, and enable role-based access control (RBAC) for fine-grained access control to cluster resources. Additionally, Azure AD provides a secure and scalable authentication mechanism that can be integrated with other Azure services and third-party identity providers.
+Local Kubernetes accounts provide a legacy, non-auditable means of accessing an AKS cluster and are not recommended. Enabling Microsoft Entra integration on an AKS cluster provides several benefits for managing access to the cluster. By using Microsoft Entra, you can centralize user and group management, enforce multi-factor authentication, and enable role-based access control (RBAC) for fine-grained access to cluster resources. Additionally, Microsoft Entra provides a secure and scalable authentication mechanism that can be integrated with other Azure services and third-party identity providers.
**Resources**
-- [Azure AD integration](https://learn.microsoft.com/en-us/azure/aks/concepts-identity#azure-ad-integration)
+- [Microsoft Entra integration](https://learn.microsoft.com/en-us/azure/aks/concepts-identity#azure-ad-integration)
- [Use Azure role-based access control for AKS](https://learn.microsoft.com/en-us/azure/aks/manage-azure-rbac?source=recommendations)
- [Manage AKS local accounts](https://learn.microsoft.com/en-us/azure/aks/manage-local-accounts-managed-azure-ad?source=recommendations)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -145,7 +143,7 @@ The Azure CNI networking solution provides several benefits for managing IP addr
- [Configure Azure CNI networking](https://learn.microsoft.com/en-us/azure/aks/configure-azure-cni-dynamic-ip-allocation)
- [Configure Azure CNI Overlay networking](https://learn.microsoft.com/en-us/azure/aks/azure-cni-overlay)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -155,7 +153,7 @@ The Azure CNI networking solution provides several benefits for managing IP addr
-### AKS-5 - Enable the cluster autoscaler on an existing cluster
+### AKS-5 - Enable the cluster auto-scaler on an existing cluster
**Category: System Efficiency**
@@ -163,15 +161,22 @@ The Azure CNI networking solution provides several benefits for managing IP addr
**Guidance**
-AKS provides several options for scaling your cluster to meet changing demands. You can scale the number of nodes in a node pool manually or automatically based on metrics such as CPU utilization or custom metrics. You can also use virtual node scaling to add additional capacity to your cluster using Azure Container Instances. AKS also supports horizontal pod autoscaling, which automatically scales the number of pods in a deployment based on CPU utilization or custom metrics. Finally, AKS provides cluster autoscaling, which automatically scales the number of nodes in a node pool based on pod resource requests and the available capacity in the cluster. With these scaling options, you can ensure that your AKS cluster can handle varying workloads and optimize resource utilization.
+The cluster auto-scaler automatically scales the number of nodes in a node pool based on pod resource requests and the available capacity in the cluster. It helps ensure that the cluster can scale according to demand and prevent outages.
+
+If the cluster has availability zones enabled, the following configuration changes need to be verified or established:
+
+- Persistent Volumes - If the cluster is using persistent volumes backed by Azure Storage, ensure you have one nodepool per availability zone. Persistent volumes do not work across AZs and the auto-scaler could fail to create new pods if the nodepool cannot access the persistent volume.
+- Multiple Nodepools per Zone - If the cluster has multiple nodepools per AZ, enable the `balance-similar-node-groups` setting in the cluster auto-scaler profile. This setting detects similar nodepools and balances the number of nodes across them.
+
**Resources**
-- [Best practices for advanced scheduler features](https://learn.microsoft.com/en-us/azure/aks/operator-best-practices-advanced-scheduler)
-- [Node pool scaling considerations and best practices](https://learn.microsoft.com/en-us/azure/aks/operator-best-practices-run-at-scale#node-pool-scaling-considerations-and-best-practices)
-- [Best practices for basic scheduler features](https://learn.microsoft.com/en-us/azure/aks/operator-best-practices-scheduler)
+- [Use the Cluster Autoscaler on AKS](https://learn.microsoft.com/azure/aks/cluster-autoscaler?tabs=azure-cli)
+- [Best practices for advanced scheduler features](https://learn.microsoft.com/azure/aks/operator-best-practices-advanced-scheduler)
+- [Node pool scaling considerations and best practices](https://learn.microsoft.com/azure/aks/best-practices-performance-scale-large#node-pool-scaling)
+- [Best practices for basic scheduler features](https://learn.microsoft.com/azure/aks/operator-best-practices-scheduler)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -181,23 +186,22 @@ AKS provides several options for scaling your cluster to meet changing demands.
-### AKS-6 - Plan for multi-region deployment
+### AKS-6 - Back up Azure Kubernetes Service
**Category: Disaster Recovery**
-**Impact: High**
+**Impact: Low**
**Guidance**
-An AKS cluster is deployed into a single region. To protect your system from region failure, deploy your application into multiple AKS clusters across different regions. When deploying multiple Kubernetes clusters in highly available and geographically distributed configurations, it's essential to consider the sum of each Kubernetes cluster as a coupled unit. You might want to develop code-driven strategies for automated deployment and configuration to ensure that each Kubernetes instance is as identical as possible.
+AKS is increasingly being used for stateful applications that require a backup strategy. Azure Backup now allows you to back up AKS clusters (cluster resources and persistent volumes attached to the cluster) using a backup extension, which must be installed in the cluster. The Backup vault communicates with the cluster via this backup extension to perform backup and restore operations.
**Resources**
-- [Plan for multiregion deployment](https://learn.microsoft.com/en-us/azure/aks/operator-best-practices-multi-region#plan-for-multiregion-deployment)
-- [Cluster deployment and bootstrapping](https://learn.microsoft.com/en-us/azure/architecture/reference-architectures/containers/aks-multi-region/aks-multi-cluster#cluster-deployment-and-bootstrapping)
-- [AKS baseline for multiregion clusters](https://learn.microsoft.com/en-us/azure/architecture/reference-architectures/containers/aks-multi-region/aks-multi-cluster)
+- [AKS Backups](https://learn.microsoft.com/en-us/azure/backup/azure-kubernetes-service-cluster-backup)
+- [Best Practices for AKS Backups](https://learn.microsoft.com/en-us/azure/aks/operator-best-practices-storage)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -207,22 +211,24 @@ An AKS cluster is deployed into a single region. To protect your system from reg
-### AKS-7 - Back up Azure Kubernetes Service
+### AKS-7 - Plan an AKS version upgrade
-**Category: Disaster Recovery**
+**Category: Compliance**
-**Impact: Low**
+**Impact: High**
**Guidance**
-AKS is increasingly being used for stateful applications that require a backup strategy. Azure Backup now allows you to back up AKS clusters (cluster resources and persistent volumes attached to the cluster) using a backup extension, which must be installed in the cluster. Backup vault communicates with the cluster via this Backup Extension to perform backup and restore operations."
+Minor version releases include new features and improvements. Patch releases are more frequent (sometimes weekly) and are intended for critical bug fixes within a minor version. Patch releases include fixes for security vulnerabilities or major bugs.
+If you're running an unsupported Kubernetes version, you'll be asked to upgrade when requesting support for the cluster. Clusters running unsupported Kubernetes releases aren't covered by the AKS support policies.
**Resources**
-- [AKS Backups](https://learn.microsoft.com/en-us/azure/backup/azure-kubernetes-service-cluster-backup)
-- [Best Practices for AKS Backups](https://learn.microsoft.com/en-us/azure/aks/operator-best-practices-storage)
+- [Updating to the latest AKS version](https://learn.microsoft.com/azure/aks/operator-best-practices-cluster-security?tabs=azure-cli#regularly-update-to-the-latest-version-of-kubernetes)
+- [Upgrade cluster](https://learn.microsoft.com/azure/aks/upgrade-cluster?tabs=azure-cli)
+- [Auto-upgrading cluster](https://learn.microsoft.com/azure/aks/auto-upgrade-cluster)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -232,25 +238,33 @@ AKS is increasingly being used for stateful applications that require a backup s
-### AKS-8 - Plan an AKS version upgrade
+### AKS-8 - Use zone-redundant storage for persistent volumes when running multi-zone AKS
-**Category: Compliance**
+**Category: Availability**
-**Impact: High**
+**Impact: Low**
**Guidance**
-Minor version releases include new features and improvements. Patch releases are more frequent (sometimes weekly) and are intended for critical bug fixes within a minor version. Patch releases include fixes for security vulnerabilities or major bugs.
-If you're running an unsupported Kubernetes version, you'll be asked to upgrade when requesting support for the cluster. Clusters running unsupported Kubernetes releases aren't covered by the AKS support policies.
+For applications that need replication of data across availability zones to protect against zonal outages, customers should leverage zone-redundant storage (ZRS) with multi-zone AKS clusters. ZRS replicates data synchronously across three Azure availability zones in the primary region.
+
+- Azure Disks: Use ZRS disks by setting the disk SKU to StandardSSD_ZRS or Premium_ZRS in a storage class. Starting with AKS v1.29, multi-zone AKS clusters will have default storage classes that use ZRS disks.
+- Azure Container Storage: Customers can leverage ZRS disks in Azure Container Storage by creating a storage pool and specifying StandardSSD_ZRS or Premium_ZRS as the SKU. Customers can also create a multi-zone storage pool where the total storage capacity will be distributed across zones.
+- Azure Files: Use ZRS file shares by setting the SKU to Standard_ZRS or Premium_ZRS in a storage class.
+- Azure Blob: Use ZRS blob storage by setting the SKU to Standard_ZRS or Premium_ZRS in a storage class.
**Resources**
-- [Updating to the latest AKS version](https://learn.microsoft.com/azure/aks/operator-best-practices-cluster-security?tabs=azure-cli#regularly-update-to-the-latest-version-of-kubernetes)
-- [Upgrade cluster](https://learn.microsoft.com/azure/aks/upgrade-cluster?tabs=azure-cli)
-- [Auto-upgrading cluster](https://learn.microsoft.com/azure/aks/auto-upgrade-cluster)
+- [Availability zones overview](https://learn.microsoft.com/azure/reliability/availability-zones-overview?tabs=azure-cli)
+- [Zone-redundant storage](https://learn.microsoft.com/azure/storage/common/storage-redundancy#zone-redundant-storage)
+- [ZRS disks](https://learn.microsoft.com/azure/virtual-machines/disks-redundancy#zone-redundant-storage-for-managed-disks)
+- [Convert a disk from LRS to ZRS](https://learn.microsoft.com/azure/virtual-machines/disks-migrate-lrs-zrs)
+- [Enable multi-zone storage redundancy in Azure Container Storage](https://learn.microsoft.com/azure/storage/container-storage/enable-multi-zone-redundancy)
+- [ZRS files](https://learn.microsoft.com/azure/storage/files/files-redundancy#zone-redundant-storage)
+- [Change the redundancy configuration for a storage account](https://learn.microsoft.com/azure/storage/common/redundancy-migration)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -260,23 +274,22 @@ If you're running an unsupported Kubernetes version, you'll be asked to upgrade
-### AKS-9 - Remediate AKS non-compliant Azure Policies
+### AKS-9 - Upgrade Persistent Volumes using in-tree drivers to Azure CSI drivers
-**Category: Compliance**
+**Category: Storage**
-**Impact: Low**
+**Impact: High**
**Guidance**
-Azure Policy helps manage the compliance state of your Kubernetes clusters, enforces organizational standards, has built-in security policies and assesses compliance at-scale. To prevent outages due to deprecations of cluster or API versions or non-compliance issues that could lead to disabling of resources, you need to ensure that your AKS clusters are in compliance with all applicable Azure policies.
+From Kubernetes version 1.26 onward, the Azure Disk and Azure File in-tree drivers (persistent volume types with the provisioners kubernetes.io/azure-disk and kubernetes.io/azure-file) are no longer supported because the Kubernetes community deprecated in-tree storage drivers. Azure Storage is now provided by the Azure Disk and Azure File CSI drivers. While existing deployments that use the in-tree drivers are not expected to break, they are no longer tested, and customers should update them to use the CSI drivers. The CSI drivers are also required to use new storage capabilities (new SKUs, features, etc.).
**Resources**
-- [Policy for Kubernetes](https://learn.microsoft.com/azure/governance/policy/concepts/policy-for-kubernetes)
-- [Governance with Azure Policy](https://learn.microsoft.com/azure/aks/use-azure-policy?toc=%2Fazure%2Fgovernance%2Fpolicy%2Ftoc.json&bc=%2Fazure%2Fgovernance%2Fpolicy%2Fbreadcrumb%2Ftoc.json)
-
+- [CSI Storage Drivers](https://learn.microsoft.com/azure/aks/csi-storage-drivers)
+- [CSI Migrate in Tree Volumes](https://learn.microsoft.com/azure/aks/csi-migrate-in-tree-volumes)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -286,23 +299,21 @@ Azure Policy helps manage the compliance state of your Kubernetes clusters, enfo
-### AKS-10 - Deploy AKS across availability zones
+### AKS-10 - Implement Resource Quota to ensure that Kubernetes resources do not exceed hard resource limits
-**Category: High Availability**
+**Category: System Efficiency**
-**Impact: High**
+**Impact: Low**
**Guidance**
-When you create your cluster, use availability zones to protect your applications and data against unlikely data center failure.
+A resource quota, defined by a ResourceQuota object, provides constraints that limit aggregate resource consumption per namespace. It can limit the quantity of objects that can be created in a namespace by type, as well as the total amount of compute resources that may be consumed by resources in that namespace.
**Resources**
-- [Availability Zones](https://learn.microsoft.com/azure/aks/availability-zones)
-- [Best Practices for multi-region AKS](https://learn.microsoft.com/azure/aks/operator-best-practices-multi-region?source=recommendations)
-
+- [Resource Quotas](https://kubernetes.io/docs/concepts/policy/resource-quotas/)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -312,28 +323,26 @@ When you create your cluster, use availability zones to protect your application
-### AKS-11 - Ensure that Persistent Volumes in storage account are redundant for Pods with stateful applications
+### AKS-11 - Attach Virtual Nodes (ACI) to the AKS cluster
-
-**Category: High Availability**
+**Category: System Efficiency**
**Impact: Low**
**Guidance**
-Data in an Azure Storage account is always replicated three times in the primary region. Azure Storage for Persistent Volumes offers other options for how your data is replicated in the primary or paired region:
-- LRS synchronously replicates data 3 times in single physical location. It is least expensive replication but not recommended for apps with high availability and durability. LRS provides eleven 9 durability.
-- ZRS copies data synchronously across 3 availability zone in primary region. ZRS is recommended for apps requiring high availability across zones. ZRS provides twelve 9s durability.
+To rapidly scale application workloads in an AKS cluster, you can use virtual nodes. With virtual nodes, pods provision much faster than through the Kubernetes cluster auto-scaler.
+
+If the cluster has availability zones enabled, the following configuration changes need to be verified or established:
-In AKS Premium_ZRS and StandardSSD_ZRS disk types are supported. ZRS disk could be scheduled on the zone or non-zone node, without the restriction that disk volume should be co-located in the same zone as a given node.
+- Persistent Volumes - If the cluster is using persistent volumes backed by Azure Storage, ensure you have one nodepool per availability zone. Persistent volumes do not work across AZs and the auto-scaler could fail to create new pods if the nodepool cannot access the persistent volume.
**Resources**
-- [Azure Disk CSI Driver](https://learn.microsoft.com/azure/aks/azure-disk-csi#azure-disk-csi-driver-features)
-- [Virtual Machine Disk Redundancy](https://learn.microsoft.com/azure/virtual-machines/disks-redundancy)
-
+- [Virtual Nodes](https://learn.microsoft.com/azure/aks/virtual-nodes)
+- [Azure Container Instances](https://learn.microsoft.com/azure/container-instances/container-instances-overview)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -343,24 +352,22 @@ In AKS Premium_ZRS and StandardSSD_ZRS disk types are supported. ZRS disk could
-### AKS-12 - Disable Local Account Access to AKS
-
+### AKS-12 - Update AKS tier to Standard
-**Category: Identity**
+**Category: Availability**
**Impact: High**
**Guidance**
-Local accounts provide a legacy non-auditable means of accessing an AKS cluster and are not recommended for use.
+Production AKS clusters should be configured with the Standard tier. The AKS free service doesn't offer a financially backed SLA and node scalability is limited. To obtain that SLA, Standard tier must be selected.
**Resources**
-- [Manage Local Accounts with Azure AD](https://learn.microsoft.com/azure/aks/manage-local-accounts-managed-azure-ad)
-- [Managed Azure AD with AKS](https://learn.microsoft.com/azure/aks/managed-azure-ad)
-
+- [Pricing Tiers](https://learn.microsoft.com/en-us/azure/aks/free-standard-pricing-tiers)
+- [AKS Baseline Architecture](https://learn.microsoft.com/en-us/azure/architecture/reference-architectures/containers/aks/baseline-aks?toc=https%3A%2F%2Flearn.microsoft.com%2Fen-us%2Fazure%2Faks%2Ftoc.json&bc=https%3A%2F%2Flearn.microsoft.com%2Fen-us%2Fazure%2Fbread%2Ftoc.json#kubernetes-api-server-sla)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -370,8 +377,7 @@ Local accounts provide a legacy non-auditable means of accessing an AKS cluster
-### AKS-13 - Remediate Azure Advisor recommendations
-
+### AKS-13 - Enable AKS Monitoring
**Category: Monitoring**
@@ -379,13 +385,13 @@ Local accounts provide a legacy non-auditable means of accessing an AKS cluster
**Guidance**
-Azure Advisor can recommend solutions that will help improve the performance, high availability, and security of your AKS cluster by analyzing your AKS configuration and usage telemetry
+Azure Monitor collects events, captures container logs, and collects CPU and memory information from the Metrics API. It also allows you to visualize this data to validate the near-real-time health and performance of AKS environments. The visualization tool can be Azure Monitor Container Insights, Prometheus, Grafana, or others.
**Resources**
-- [Getting Started with Azure Advisor](https://learn.microsoft.com/en-us/azure/advisor/advisor-get-started)
+- [Monitor AKS](https://learn.microsoft.com/azure/aks/monitor-aks)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -395,23 +401,24 @@ Azure Advisor can recommend solutions that will help improve the performance, hi
-### AKS-14 - Upgrade Persistent Volumes with deprecated version to Azure CSI drivers
-
+### AKS-14 - Use Ephemeral OS disks on AKS clusters
-**Category: Storage**
+**Category: System Efficiency**
-**Impact: High**
+**Impact: Medium**
**Guidance**
-Starting with Kubernetes version 1.26, in-tree persistent volume types kubernetes.io/azure-disk and kubernetes.io/azure-file are deprecated and will no longer be supported. Removing these drivers following their deprecation is not planned, however you should migrate to the corresponding CSI drivers disks.csi.azure.com and file.csi.azure.com.
+Ephemeral disks are ideal as OS disks for stateless applications since they provide better performance and improved reliability by decreasing IO incidents. Additionally, customers won’t incur additional storage costs for the OS, and they get faster cluster operations such as scale or upgrade thanks to faster re-imaging and boot times. Unless customers explicitly request an Azure managed disk for the OS, AKS defaults to an ephemeral OS disk when one is available for the VM SKU selected for the node pool.
**Resources**
-- [CSI Storage Drivers](https://learn.microsoft.com/en-us/azure/aks/csi-storage-drivers)
-- [CSI Migrate in Tree Volumes](https://learn.microsoft.com/azure/aks/csi-migrate-in-tree-volumes)
+- [Ephemeral OS disk](https://learn.microsoft.com/azure/aks/concepts-storage#ephemeral-os-disk)
+- [Configure an AKS cluster](https://learn.microsoft.com/azure/aks/cluster-configuration)
+- [Everything you want to know about ephemeral OS disks and AKS](https://learn.microsoft.com/samples/azure-samples/aks-ephemeral-os-disk/aks-ephemeral-os-disk/)
-**Resource Graph Query/Scripts**
+
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -421,22 +428,21 @@ Starting with Kubernetes version 1.26, in-tree persistent volume types kubernete
-### AKS-15 - Implement Resource Quota to ensure that Kubernetes resources do not exceed hard resource limits
-
+### AKS-15 - Enable and remediate Azure Policies configured for AKS
-**Category: System Efficiency**
+**Category: Governance**
**Impact: Low**
**Guidance**
-
-A resource quota, defined by a ResourceQuota object, provides constraints that limit aggregate resource consumption per namespace. It can limit the quantity of objects that can be created in a namespace by type, as well as the total amount of compute resources that may be consumed by resources in that namespace.
+Azure Policies allow companies to enforce governance best practices in the AKS cluster in areas such as security, authentication, provisioning, and networking.
**Resources**
-- [Resource Quotas](https://kubernetes.io/docs/concepts/policy/resource-quotas/)
+- [AKS Baseline - Policy Management](https://learn.microsoft.com/en-us/azure/architecture/reference-architectures/containers/aks/baseline-aks?toc=https%3A%2F%2Flearn.microsoft.com%2Fen-us%2Fazure%2Faks%2Ftoc.json&bc=https%3A%2F%2Flearn.microsoft.com%2Fen-us%2Fazure%2Fbread%2Ftoc.json#policy-management)
+- [Built-in Policy Definitions for AKS](https://learn.microsoft.com/en-us/azure/aks/policy-reference)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -446,23 +452,22 @@ A resource quota, defined by a ResourceQuota object, provides constraints that l
-### AKS-16 - Attach Virtual Nodes (ACI) to the AKS cluster
+### AKS-16 - Enable GitOps when using DevOps frameworks
-
-**Category: Scalability**
+**Category: Automation**
**Impact: Low**
**Guidance**
-To rapidly scale application workloads in an AKS cluster, you can use virtual nodes. With virtual nodes, pods provision much faster than through the Kubernetes cluster auto-scaler.
+GitOps is an operating model for cloud-native applications that stores application and declarative infrastructure code in Git to be used as the source of truth for automated continuous delivery. With GitOps, you describe the desired state of your entire system in a Git repository, and a GitOps operator deploys it to your environment, which is often a Kubernetes cluster. GitOps helps keep the configuration of all AKS clusters aligned with the intended state, which helps prevent potential outages and unsuccessful failovers.
**Resources**
-- [Virtual Nodes](https://learn.microsoft.com/azure/aks/virtual-nodes)
-- [Azure Container Instances](https://learn.microsoft.com/azure/container-instances/container-instances-overview)
+- [GitOps with AKS](https://learn.microsoft.com/en-us/azure/architecture/guide/aks/aks-cicd-github-actions-and-gitops)
+- [GitOps for AKS - Reference Architecture](https://learn.microsoft.com/en-us/azure/architecture/example-scenario/gitops-aks/gitops-blueprint-aks)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -472,24 +477,22 @@ To rapidly scale application workloads in an AKS cluster, you can use virtual no
-### AKS-17 - Isolate application (User) pods
+### AKS-17 - Configure affinity or anti-affinity rules based on application requirements
+**Category: Availability**
-**Category: Governance**
-
-**Impact: Medium**
+**Impact: High**
**Guidance**
-Isolate critical system pods from your application pods to prevent misconfigured or rogue application pods from accidentally killing system pods.
-
+Configure Topology Spread Constraints to control how Pods are spread across your cluster among failure-domains such as regions, zones, nodes, and other user-defined topology domains. This can help to achieve high availability as well as efficient resource utilization.
**Resources**
-- [System and User nodepools](https://learn.microsoft.com/azure/aks/virtual-nodes)
-- [Using System nodepools](https://learn.microsoft.com/azure/aks/use-system-pools?tabs=azure-cli)
+- [Topology Spread Constraints](https://kubernetes.io/docs/concepts/scheduling-eviction/topology-spread-constraints/)
+- [Assign Pod Node](https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -499,22 +502,22 @@ Isolate critical system pods from your application pods to prevent misconfigured
-### AKS-18 - Enable AKS Monitor alerts
+### AKS-18 - Configures Pods Liveness, Readiness, and Startup Probes
-**Category: Monitoring**
+**Category: Availability**
**Impact: High**
**Guidance**
-Alerts help you detect and address issues before users notice them by proactively notifying you when Azure Monitor data indicates there might be a problem with your infrastructure or application. Set up monitoring and alerts for AKS health based on various metrics available.
+The AKS kubelet uses liveness probes to validate container and application health. Based on container health, the kubelet knows when to restart a container.
**Resources**
-- [AKS Monitor - AKS Alerts](https://learn.microsoft.com/azure/aks/monitor-aks#alerts)
-- [How to create a new alert rule](https://learn.microsoft.com/azure/azure-monitor/alerts/alerts-create-new-alert-rule?tabs=metric)
+- [Configure probes](https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/)
+- [Assign Pod Node](https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -524,22 +527,21 @@ Alerts help you detect and address issues before users notice them by proactivel
-### AKS-19 - Update AKS tier to Standard
+### AKS-19 - Configure pod replica sets in production applications to guarantee availability
-**Category: Resiliency**
+**Category: Availability**
**Impact: High**
**Guidance**
-Production AKS clusters should be configured with the Standard tier. The AKS free service doesn't offer a financially backed SLA and node scalability is limited. To obtain that SLA, Standard tier must be selected.
+Configure ReplicaSets in the Pod or Deployment manifests to maintain a stable set of replica Pods running at any given time. This feature will guarantee the availability of a specified number of identical Pods.
**Resources**
-- [Pricing Tiers](https://learn.microsoft.com/en-us/azure/aks/free-standard-pricing-tiers)
-- [AKS Baseline Architecture](https://learn.microsoft.com/en-us/azure/architecture/reference-architectures/containers/aks/baseline-aks?toc=https%3A%2F%2Flearn.microsoft.com%2Fen-us%2Fazure%2Faks%2Ftoc.json&bc=https%3A%2F%2Flearn.microsoft.com%2Fen-us%2Fazure%2Fbread%2Ftoc.json#kubernetes-api-server-sla)
+- [Replica Sets](https://kubernetes.io/docs/concepts/workloads/controllers/replicaset/)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -549,21 +551,21 @@ Production AKS clusters should be configured with the Standard tier. The AKS fre
-### AKS-20 - Enable AKS Monitoring
+### AKS-20 - Configure system nodepool count
-**Category: Monitoring**
+**Category: Availability**
**Impact: High**
**Guidance**
-Azure Monitor collects events, captures container logs, collects CPU/Memory information from Metrics API and allows the visualization of the data, to validate the near real time health and performance of AKS environments. The visualization tool can be Azure Monitor Container Insights, Prometheus, Grafana or others.
+The system node pool should be configured with a minimum node count of two to ensure critical system pods are resilient to node outages.
**Resources**
-- [Monitor AKS](https://learn.microsoft.com/azure/aks/monitor-aks)
+- [System nodepools](https://learn.microsoft.com/azure/aks/use-system-pools?tabs=azure-cli)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -573,21 +575,21 @@ Azure Monitor collects events, captures container logs, collects CPU/Memory info
-### AKS-21 - Use Ephemeral Disks on AKS clusters
+### AKS-21 - Configure user nodepool count
-**Category: Performance**
+**Category: Availability**
-**Impact: Medium**
+**Impact: High**
**Guidance**
-Ephemeral OS disks provide lower read/write latency on the OS disk of AKS agent nodes since the disk is locally attached, and it is not replicated as managed disks. You will also get faster cluster operations like scale or upgrade thanks to faster re-imaging and boot times.
+The user node pool should be configured with a minimum node count of two if the application requires high availability.
**Resources**
-- [AKS Ephemeral OS Disk](https://learn.microsoft.com/samples/azure-samples/aks-ephemeral-os-disk/aks-ephemeral-os-disk/)
+- [Azure Well-Architected Framework review for Azure Kubernetes Service (AKS)](https://learn.microsoft.com/azure/well-architected/service-guides/azure-kubernetes-service#design-checklist)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -597,22 +599,22 @@ Ephemeral OS disks provide lower read/write latency on the OS disk of AKS agent
-### AKS-22 - Enable Azure Policies configured for AKS
+### AKS-22 - Configure pod disruption budgets (PDBs)
-**Category: Governance**
+**Category: Availability**
-**Impact: Low**
+**Impact: Medium**
**Guidance**
-Azure Policies allow companies to enforce governance best practices in the AKS cluster around security, authentication, provisioning, networking and others.
+A Pod Disruption Budget (PDB) is a Kubernetes resource that sets the minimum number or percentage of pods that must remain available during voluntary disruptions, such as maintenance or scaling events. Define PDBs for your applications so that a minimum number of pods stays available in the cluster during these events.
**Resources**
-- [AKS Baseline - Policy Management](https://learn.microsoft.com/en-us/azure/architecture/reference-architectures/containers/aks/baseline-aks?toc=https%3A%2F%2Flearn.microsoft.com%2Fen-us%2Fazure%2Faks%2Ftoc.json&bc=https%3A%2F%2Flearn.microsoft.com%2Fen-us%2Fazure%2Fbread%2Ftoc.json#policy-management)
-- [Built-in Policy Definitions for AKS](https://learn.microsoft.com/en-us/azure/aks/policy-reference)
+- [Configure PDBs](https://kubernetes.io/docs/tasks/run-application/configure-pdb/)
+- [Plan availability using PDBs](https://learn.microsoft.com/azure/aks/operator-best-practices-scheduler#plan-for-availability-using-pod-disruption-budgets)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -622,22 +624,21 @@ Azure Policies allow companies to enforce governance best practices in the AKS c
-### AKS-23 - Enable GitOps when using DevOps frameworks
+### AKS-23 - Nodepool subnet size needs to accommodate maximum auto-scale settings
-**Category: Automation**
+**Category: Availability**
-**Impact: Low**
+**Impact: High**
**Guidance**
-GitOps is an operating model for cloud-native applications that stores application and declarative infrastructure code in Git to be used as the source of truth for automated continuous delivery. With GitOps, you describe the desired state of your entire system in a git repository, and a GitOps operator deploys it to your environment, which is often a Kubernetes cluster. To prevent potential outages or unsuccessful failover scenarios, GitOps helps maintain the configuration of all AKS clusters to the intended configuration.
+Nodepool subnets should be sized to accommodate maximum auto-scale settings. By properly sizing the subnet, AKS can efficiently scale out nodes to meet increased demand, reducing the risk of resource constraints and potential service disruptions.
**Resources**
-- [GitOps with AKS](https://learn.microsoft.com/en-us/azure/architecture/guide/aks/aks-cicd-github-actions-and-gitops)
-- [GitOps for AKS - Reference Architecture](https://learn.microsoft.com/en-us/azure/architecture/example-scenario/gitops-aks/gitops-blueprint-aks)
+- [AKS Networking](https://learn.microsoft.com/azure/aks/concepts-network)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -647,22 +648,21 @@ GitOps is an operating model for cloud-native applications that stores applicati
-### AKS-24 - Configure affinity or anti-affinity rules based on application requirements
+### AKS-24 - Enforce resource quotas at the namespace level
-**Category: High Availability**
+**Category: Availability**
**Impact: High**
**Guidance**
-Configure Topology Spread Constraints to control how Pods are spread across your cluster among failure-domains such as regions, zones, nodes, and other user-defined topology domains. This can help to achieve high availability as well as efficient resource utilization.
+Enforcing namespace-level resource quotas is crucial for ensuring reliability by preventing resource exhaustion and maintaining cluster stability. This helps prevent individual applications or users from monopolizing resources, which can lead to degraded performance or outages for other applications in the cluster.
**Resources**
-- [Topology Spread Constraints](https://kubernetes.io/docs/concepts/scheduling-eviction/topology-spread-constraints/)
-- [Assign Pod Node](https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/)
+- [Resource quotas](https://learn.microsoft.com/azure/aks/operator-best-practices-scheduler#enforce-resource-quotas)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -671,52 +671,3 @@ Configure Topology Spread Constraints to control how Pods are spread across your
{{< /collapse >}}
-
-### AKS-25 - Configures Pods Liveness, Readiness, and Startup Probes
-
-**Category: High Availability**
-
-**Impact: High**
-
-**Guidance**
-
-AKS kubelet controller uses liveness probes to validate containers and applications health. Based on containers health, kubelet will know when to restart a container.
-
-**Resources**
-
-- [Configure probes](https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/)
-- [Assign Pod Node](https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/)
-
-**Resource Graph Query/Scripts**
-
-{{< collapse title="Show/Hide Query/Script" >}}
-
-{{< code lang="sql" file="code/aks-25/aks-25.kql" >}} {{< /code >}}
-
-{{< /collapse >}}
-
-
-
-### AKS-26 - Configure pod replication in production applications to guarantee availability
-
-**Category: High Availability**
-
-**Impact: High**
-
-**Guidance**
-
-Configure ReplicaSets in the Pod or Deployment manifests to maintain a stable set of replica Pods running at any given time. This feature will guarantee the availability of a specified number of identical Pods.
-
-**Resources**
-
-- [Replica Sets](https://kubernetes.io/docs/concepts/workloads/controllers/replicaset/)
-
-**Resource Graph Query/Scripts**
-
-{{< collapse title="Show/Hide Query/Script" >}}
-
-{{< code lang="sql" file="code/aks-26/aks-26.kql" >}} {{< /code >}}
-
-{{< /collapse >}}
-
-
diff --git a/docs/content/services/container/aks/code/aks-1/aks-1.kql b/docs/content/services/container/aks/code/aks-1/aks-1.kql
index 9c47d9008..827104bae 100644
--- a/docs/content/services/container/aks/code/aks-1/aks-1.kql
+++ b/docs/content/services/container/aks/code/aks-1/aks-1.kql
@@ -1,7 +1,8 @@
// Azure Resource Graph Query
-// Query AKS clusters not using zones
+// Returns AKS clusters that do not have any availability zones enabled
resources
-| where type == "microsoft.containerservice/managedclusters"
-| extend zones = tostring(parse_json(properties.agentPoolProfiles[0].availabilityZones))
-| where isempty(zones)
-| project recommendationid="aks-1", name, id, tags, param1=strcat("zones: ", zones)
+| where type == 'microsoft.containerservice/managedclusters'
+| project id, name, location, properties.agentPoolProfiles
+| mv-expand properties_agentPoolProfiles
+| where isempty(array_length(properties_agentPoolProfiles.availabilityZones))
+| project recommendationId="aks-1", id, name, tags, param1=strcat("nodePoolName: ", properties_agentPoolProfiles.name), param2=strcat("orchestratorVersion: ", properties_agentPoolProfiles.orchestratorVersion), param3=strcat("currentOrchestratorVersion: ", properties_agentPoolProfiles.currentOrchestratorVersion), param4=strcat("numberOfZones: ", iff(isempty(array_length(properties_agentPoolProfiles.availabilityZones)), 0, array_length(properties_agentPoolProfiles.availabilityZones)))
diff --git a/docs/content/services/container/aks/code/aks-10/aks-10.kql b/docs/content/services/container/aks/code/aks-10/aks-10.kql
index 06df3631a..fa5cad258 100644
--- a/docs/content/services/container/aks/code/aks-10/aks-10.kql
+++ b/docs/content/services/container/aks/code/aks-10/aks-10.kql
@@ -1,8 +1 @@
-// Azure Resource Graph Query
-// Returns AKS clusters that do not have any availability zones enabled
-resources
-| where type == 'microsoft.containerservice/managedclusters'
-| project id, name, location, properties.agentPoolProfiles
-| mv-expand properties_agentPoolProfiles
-| where isempty(array_length(properties_agentPoolProfiles.availabilityZones))
-| project recommendationId="aks-10", id, name, param1=strcat("nodePoolName: ", properties_agentPoolProfiles.name), param2=strcat("orchestratorVersion: ", properties_agentPoolProfiles.orchestratorVersion), param3=strcat("currentOrchestratorVersion: ", properties_agentPoolProfiles.currentOrchestratorVersion), param4=strcat("numberOfZones: ", iff(isempty(array_length(properties_agentPoolProfiles.availabilityZones)), 0, array_length(properties_agentPoolProfiles.availabilityZones)))
+// cannot-be-validated-with-arg
diff --git a/docs/content/services/container/aks/code/aks-12/aks-12.kql b/docs/content/services/container/aks/code/aks-12/aks-12.kql
index 1254bd127..f8fce7f24 100644
--- a/docs/content/services/container/aks/code/aks-12/aks-12.kql
+++ b/docs/content/services/container/aks/code/aks-12/aks-12.kql
@@ -1,7 +1,6 @@
// Azure Resource Graph Query
-// Returns AKS clusters where local accounts are not disabled
+// Returns all AKS clusters not running on the Standard tier
resources
-| where type == "microsoft.containerservice/managedclusters"
-| extend disableLocalAccounts = tostring (parse_json(properties.disableLocalAccounts))
-| where disableLocalAccounts == "false"
-| project recommendationId="aks-12", id, name, param1=strcat("localAccountsDisabled: ", disableLocalAccounts)
+| where type == "microsoft.containerservice/managedclusters"
+| where sku.tier != "Standard"
+| project recommendationId="aks-12", id, name, tags, param1=strcat("skuName: ", sku.name), param2=strcat("skuTier: ", sku.tier)
diff --git a/docs/content/services/container/aks/code/aks-13/aks-13.kql b/docs/content/services/container/aks/code/aks-13/aks-13.kql
index fa5cad258..390a52e4f 100644
--- a/docs/content/services/container/aks/code/aks-13/aks-13.kql
+++ b/docs/content/services/container/aks/code/aks-13/aks-13.kql
@@ -1 +1,8 @@
-// cannot-be-validated-with-arg
+// Azure Resource Graph Query
+// Returns AKS clusters where either Azure Monitor is not enabled and/or Container Insights is not enabled
+resources
+| where type == "microsoft.containerservice/managedclusters"
+| extend azureMonitor = tostring(parse_json(properties.azureMonitorProfile.metrics.enabled))
+| extend insights = tostring(parse_json(properties.addonProfiles.omsagent.enabled))
+| where isempty(azureMonitor) or isempty(insights)
+| project recommendationId="aks-13",id, name, tags, param1=strcat("azureMonitorProfileEnabled: ", iff(isempty(azureMonitor), "false", azureMonitor)), param2=strcat("containerInsightsEnabled: ", iff(isempty(insights), "false", insights))
diff --git a/docs/content/services/container/aks/code/aks-16/aks-16.kql b/docs/content/services/container/aks/code/aks-16/aks-16.kql
index fa5cad258..386ef6a92 100644
--- a/docs/content/services/container/aks/code/aks-16/aks-16.kql
+++ b/docs/content/services/container/aks/code/aks-16/aks-16.kql
@@ -1 +1,7 @@
-// cannot-be-validated-with-arg
+// Azure Resource Graph Query
+// Returns AKS clusters where GitOps is not enabled
+resources
+| where type == "microsoft.containerservice/managedclusters"
+| extend gitops = tostring (parse_json(properties.addOnProfiles.gitops.enabled))
+| where isempty(gitops)
+| project recommendationId="aks-16", id, name, tags, param1=strcat("gitopsEnabled: ", "false")
diff --git a/docs/content/services/container/aks/code/aks-17/aks-17.kql b/docs/content/services/container/aks/code/aks-17/aks-17.kql
index b9ea31544..fa5cad258 100644
--- a/docs/content/services/container/aks/code/aks-17/aks-17.kql
+++ b/docs/content/services/container/aks/code/aks-17/aks-17.kql
@@ -1,9 +1 @@
-// Azure Resource Graph Query
-// Returns each AKS cluster with nodepools that do not have taints set
-resources
-| where type == "microsoft.containerservice/managedclusters"
-| mv-expand agentPoolProfile = properties.agentPoolProfiles
-| extend taint = tostring(parse_json(agentPoolProfile.nodeTaints))
-| extend nodePool = tostring(parse_json(agentPoolProfile.name))
-| where isempty(taint)
-| project recommendationid="aks-17", id, name, param1=strcat("nodepoolName: ", nodePool), param2=strcat("taint: ", iff(isempty(taint), "None", taint))
+// cannot-be-validated-with-arg
diff --git a/docs/content/services/container/aks/code/aks-19/aks-19.kql b/docs/content/services/container/aks/code/aks-19/aks-19.kql
index 349fd7c8d..fa5cad258 100644
--- a/docs/content/services/container/aks/code/aks-19/aks-19.kql
+++ b/docs/content/services/container/aks/code/aks-19/aks-19.kql
@@ -1,6 +1 @@
-// Azure Resource Graph Query
-// Returns all AKS clusters not running on the Standard tier
-resources
-| where type == "microsoft.containerservice/managedclusters"
-| where sku.tier != "Standard"
-| project recommendationId="aks-19", id, name, param1=strcat("skuName: ", sku.name), param2=strcat("skuTier: ", sku.tier)
+// cannot-be-validated-with-arg
diff --git a/docs/content/services/container/aks/code/aks-2/aks-2.kql b/docs/content/services/container/aks/code/aks-2/aks-2.kql
index 3e66d8262..5d12912fd 100644
--- a/docs/content/services/container/aks/code/aks-2/aks-2.kql
+++ b/docs/content/services/container/aks/code/aks-2/aks-2.kql
@@ -1,7 +1,9 @@
// Azure Resource Graph Query
-// Find AKS clusters not using taints.
+// Returns each AKS cluster with nodepools that do not have taints set
resources
| where type == "microsoft.containerservice/managedclusters"
-| extend taint = tostring(parse_json(properties.agentPoolProfiles[1].nodeTaints))
+| mv-expand agentPoolProfile = properties.agentPoolProfiles
+| extend taint = tostring(parse_json(agentPoolProfile.nodeTaints))
+| extend nodePool = tostring(parse_json(agentPoolProfile.name))
| where isempty(taint)
-| project recommendationid="aks-2", name, id, tags, param1=strcat("taint: ", taint)
+| project recommendationId="aks-2", id, name, tags, param1=strcat("nodepoolName: ", nodePool), param2=strcat("taint: ", iff(isempty(taint), "None", taint))
diff --git a/docs/content/services/container/aks/code/aks-20/aks-20.kql b/docs/content/services/container/aks/code/aks-20/aks-20.kql
index bd72121cc..68f48e8be 100644
--- a/docs/content/services/container/aks/code/aks-20/aks-20.kql
+++ b/docs/content/services/container/aks/code/aks-20/aks-20.kql
@@ -1,8 +1,9 @@
// Azure Resource Graph Query
-// Returns AKS clusters where either Azure Monitor is not enabled and/or Container Insights is not enabled
+// Returns each AKS cluster with system nodepools that have a minimum node count of less than 2
resources
-| where type == "microsoft.containerservice/managedclusters"
-| extend azureMonitor = tostring(parse_json(properties.azureMonitorProfile.metrics.enabled))
-| extend insights = tostring(parse_json(properties.addonProfiles.omsagent.enabled))
-| where isempty(azureMonitor) or isempty(insights)
-| project recommendationId="aks-20",id, name, param1=strcat("azureMonitorProfileEnabled: ", iff(isempty(azureMonitor), "false", azureMonitor)), param2=strcat("containerInsightsEnabled: ", iff(isempty(insights), "false", insights))
+| where type == "microsoft.containerservice/managedclusters"
+| mv-expand agentPoolProfile = properties.agentPoolProfiles
+| extend taints = tostring(parse_json(agentPoolProfile.nodeTaints))
+| extend nodePool = tostring(parse_json(agentPoolProfile.name))
+| where taints has "CriticalAddonsOnly=true:NoSchedule" and agentPoolProfile.minCount < 2
+| project recommendationId="aks-20", id, name, param1=strcat("nodePoolName: ", nodePool), param2=strcat("nodePoolMinNodeCount: ", agentPoolProfile.minCount)
diff --git a/docs/content/services/container/aks/code/aks-21/aks-21.kql b/docs/content/services/container/aks/code/aks-21/aks-21.kql
index fa5cad258..2c3fed22a 100644
--- a/docs/content/services/container/aks/code/aks-21/aks-21.kql
+++ b/docs/content/services/container/aks/code/aks-21/aks-21.kql
@@ -1 +1,9 @@
-// cannot-be-validated-with-arg
+// Azure Resource Graph Query
+// Returns each AKS cluster with user nodepools that have a minimum node count of less than 2
+resources
+| where type == "microsoft.containerservice/managedclusters"
+| mv-expand agentPoolProfile = properties.agentPoolProfiles
+| extend taints = tostring(parse_json(agentPoolProfile.nodeTaints))
+| extend nodePool = tostring(parse_json(agentPoolProfile.name))
+| where taints !has "CriticalAddonsOnly=true:NoSchedule" and agentPoolProfile.minCount < 2
+| project recommendationId="aks-21", id, name, param1=strcat("nodePoolName: ", nodePool), param2=strcat("nodePoolMinNodeCount: ", agentPoolProfile.minCount)
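The aks-20 and aks-21 queries above identify system nodepools by the `CriticalAddonsOnly=true:NoSchedule` taint and then compare `minCount` against 2. The following Python sketch mirrors that logic on a hypothetical, trimmed cluster payload, assuming `nodeTaints` is a list of strings. The function name and sample data are illustrative and not part of the PR.

```python
from typing import Any

# Taint string the aks-20/aks-21 queries look for to tell system pools apart.
SYSTEM_TAINT = "CriticalAddonsOnly=true:NoSchedule"


def undersized_pools(cluster: dict[str, Any], system: bool) -> list[tuple[str, int]]:
    """Return (poolName, minCount) for system or user pools with minCount < 2."""
    flagged = []
    # Iterating the profiles plays the role of mv-expand in the KQL.
    for pool in cluster["properties"]["agentPoolProfiles"]:
        taints = pool.get("nodeTaints") or []
        is_system = SYSTEM_TAINT in taints
        min_count = pool.get("minCount")
        # As in the KQL comparison, a pool without minCount never matches.
        if is_system == system and min_count is not None and min_count < 2:
            flagged.append((pool["name"], min_count))
    return flagged


# Hypothetical cluster payload, trimmed to the fields the queries read.
example_cluster = {
    "properties": {
        "agentPoolProfiles": [
            {"name": "system", "nodeTaints": [SYSTEM_TAINT], "minCount": 1},
            {"name": "apps", "nodeTaints": None, "minCount": 1},
            {"name": "batch", "minCount": 3},
            {"name": "fixed"},
        ]
    }
}

print(undersized_pools(example_cluster, system=True))
print(undersized_pools(example_cluster, system=False))
```

Because classification relies on the taint rather than the pool mode, a system pool without that taint would be evaluated as a user pool, and a pool with no `minCount` set is not reported by either check.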
diff --git a/docs/content/services/container/aks/code/aks-23/aks-23.kql b/docs/content/services/container/aks/code/aks-23/aks-23.kql
index ced1cf25f..e45ed8c75 100644
--- a/docs/content/services/container/aks/code/aks-23/aks-23.kql
+++ b/docs/content/services/container/aks/code/aks-23/aks-23.kql
@@ -1,7 +1,25 @@
// Azure Resource Graph Query
-// Returns AKS clusters where GitOps is not enabled
+// Returns each AKS cluster with autoscaling nodepools whose subnet is too small for the autoscaler's configured maximum node count
+// Usable addresses = 2^(32 - subnet mask) - 5: the network address, the broadcast address, and the 3 addresses Azure reserves within each subnet
+
resources
-| where type == "microsoft.containerservice/managedclusters"
-| extend gitops = tostring (parse_json(properties.addOnProfiles.gitops.enabled))
-| where isempty(gitops)
-| project recommendationId="aks-23", id, name, param1=strcat("gitopsEnabled: ", "false")
+| where type == "microsoft.containerservice/managedclusters"
+| extend nodePools = properties['agentPoolProfiles']
+| mv-expand nodePools = properties.agentPoolProfiles
+| where nodePools.enableAutoScaling == true
+| extend nodePoolName=nodePools.name, maxNodes = nodePools.maxCount, subnetId = tostring(nodePools.vnetSubnetID)
+| project clusterId = id, clusterName=name, nodePoolName=nodePools.name, toint(maxNodes), subnetId
+| join kind = leftouter (
+ resources
+ | where type == 'microsoft.network/virtualnetworks'
+ | extend subnets = properties.subnets
+ | mv-expand subnets
+ | project id = tostring(subnets.id), addressPrefix = tostring(subnets.properties['addressPrefix'])
+ | extend subnetmask = toint(substring(addressPrefix, indexof(addressPrefix, '/')+1, string_size(addressPrefix)))
+ | extend possibleMaxNodeCount = toint(exp2(32-subnetmask) - 5)
+) on $left.subnetId == $right.id
+| project-away id, subnetmask
+| where possibleMaxNodeCount <= maxNodes
+| extend param1 = strcat(nodePoolName, " autoscaler upper limit: ", maxNodes)
+| extend param2 = strcat("ip addresses on subnet: ", possibleMaxNodeCount)
+| project recommendationId="aks-23", name=clusterName, id=clusterId, param1, param2
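The capacity check in the aks-23 query comes down to simple arithmetic: `2^(32 - mask) - 5` usable addresses, compared against the autoscaler's `maxCount`. A minimal Python sketch of the same calculation follows, assuming IPv4 subnets only. The function names and sample prefixes are illustrative.

```python
import ipaddress

# Network address + broadcast address + 3 addresses Azure reserves per subnet.
AZURE_RESERVED = 5


def usable_addresses(prefix: str) -> int:
    """Usable IPv4 addresses in a subnet, as computed by the aks-23 query."""
    mask = ipaddress.ip_network(prefix, strict=False).prefixlen
    return 2 ** (32 - mask) - AZURE_RESERVED


def subnet_too_small(prefix: str, max_nodes: int) -> bool:
    """Mirror the query's filter: possibleMaxNodeCount <= maxNodes."""
    return usable_addresses(prefix) <= max_nodes


print(usable_addresses("10.0.0.0/24"))       # 256 - 5
print(subnet_too_small("10.0.1.0/27", 30))   # 27 usable addresses vs. 30 nodes
```

Note that the query uses `<=`, so a subnet whose usable address count exactly equals `maxCount` is also reported.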
diff --git a/docs/content/services/container/aks/code/aks-6/aks-6.kql b/docs/content/services/container/aks/code/aks-6/aks-6.kql
index fa5cad258..490fbc75b 100644
--- a/docs/content/services/container/aks/code/aks-6/aks-6.kql
+++ b/docs/content/services/container/aks/code/aks-6/aks-6.kql
@@ -1 +1,14 @@
-// cannot-be-validated-with-arg
+// Azure Resource Graph Query
+// Find AKS clusters that do not have backup enabled
+
+resources
+| where type =~ 'Microsoft.ContainerService/managedClusters'
+| extend lname = tolower(name)
+| join kind=leftouter(recoveryservicesresources
+ | where type =~ 'microsoft.dataprotection/backupvaults/backupinstances'
+ | extend lname = tolower(tostring(split(properties.dataSourceInfo.resourceID, '/')[8]))
+ | extend protectionState = properties.currentProtectionState
+ | project lname, protectionState) on lname
+| where protectionState != 'ProtectionConfigured'
+| extend param1 = iif(isnull(protectionState), 'Protection Not Configured', strcat('Protection State: ', protectionState))
+| project recommendationId = "aks-6", name, id, tags, param1
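The aks-6 query joins clusters to backup instances on the lowercased cluster name, which it extracts with `split(properties.dataSourceInfo.resourceID, '/')[8]`. The Python sketch below shows why index 8 is the cluster name, assuming the data source ID has the usual `/subscriptions/.../resourceGroups/.../providers/Microsoft.ContainerService/managedClusters/<name>` shape. The subscription ID and names are placeholders.

```python
def cluster_name_from_resource_id(resource_id: str) -> str:
    """Return the lowercased 9th path segment, or '' if the ID is too short."""
    # Splitting on '/' yields an empty first element because the ID starts with '/':
    # ['', 'subscriptions', <sub>, 'resourceGroups', <rg>, 'providers',
    #  'Microsoft.ContainerService', 'managedClusters', <name>]
    segments = resource_id.split("/")
    return segments[8].lower() if len(segments) > 8 else ""


# Placeholder resource ID for illustration only.
example_id = (
    "/subscriptions/00000000-0000-0000-0000-000000000000"
    "/resourceGroups/rg-demo/providers/Microsoft.ContainerService"
    "/managedClusters/AKS-Prod"
)

print(cluster_name_from_resource_id(example_id))
```

Since the join key is the name alone, two clusters with the same name in different resource groups or subscriptions would match the same backup instance, which is worth keeping in mind when reading the results.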
diff --git a/docs/content/services/container/aks/code/aks-7/aks-7.kql b/docs/content/services/container/aks/code/aks-7/aks-7.kql
index 614a7f9ca..fa5cad258 100644
--- a/docs/content/services/container/aks/code/aks-7/aks-7.kql
+++ b/docs/content/services/container/aks/code/aks-7/aks-7.kql
@@ -1 +1 @@
-// under-development
+// cannot-be-validated-with-arg
diff --git a/docs/content/services/networking/vpn-gateway/code/vpng-3/vpng-3.kql b/docs/content/services/container/aks/code/aks-9/aks-9.fix
similarity index 100%
rename from docs/content/services/networking/vpn-gateway/code/vpng-3/vpng-3.kql
rename to docs/content/services/container/aks/code/aks-9/aks-9.fix
diff --git a/docs/content/services/container/container-registry/_index.md b/docs/content/services/container/container-registry/_index.md
index 3265d525e..9bac03f6b 100644
--- a/docs/content/services/container/container-registry/_index.md
+++ b/docs/content/services/container/container-registry/_index.md
@@ -12,25 +12,22 @@ The presented resiliency recommendations in this guidance include Container Regi
## Summary of Recommendations
{{< table style="table-striped" >}}
-| Recommendation | Impact | State | ARG Query Available |
-| :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | :----: | :-----: | :-----------------: |
-| [CR-1 - Use Premium tier for critical production workloads](#cr-1---use-premium-tier-for-critical-production-workloads) | High | Preview | Yes |
-| [CR-2 - Enable zone redundancy](#cr-2---enable-zone-redundancy) | High | Preview | Yes |
-| [CR-3 - Enable geo-replication](#cr-3---enable-geo-replication) | High | Preview | Yes |
-| [CR-4 - Maximize pull performance](#cr-4---maximize-pull-performance) | High | Preview | No |
-| [CR-5 - Use Repository namespaces](#cr-5---use-repository-namespaces) | Low | Preview | No |
-| [CR-6 - Move Container Registry to a dedicated resource group](#cr-6---move-container-registry-to-a-dedicated-resource-group) | Low | Preview | No |
-| [CR-7 - Manage registry size](#cr-7---manage-registry-size) | Medium | Preview | No |
-| [CR-8 - Disable anonymous pull access](#cr-8---disable-anonymous-pull-access) | Medium | Preview | Yes |
-| [CR-9 - Use an Azure managed identity to authenticate to an Azure container registry](#cr-9---use-an-azure-managed-identity-to-authenticate-to-an-azure-container-registry) | Medium | Preview | No |
-| [CR-10 - Configure Diagnostic Settings for all Azure Container Registries](#cr-10---configure-diagnostic-settings-for-all-azure-container-registries) | Medium | Preview | No |
-| [CR-11 - Monitor Azure Container Registry with Azure Monitor](#cr-11---monitor-azure-container-registry-with-azure-monitor) | Medium | Preview | No |
-| [CR-12 - Enable soft delete policy](#cr-12---enable-soft-delete-policy) | Medium | Preview | Yes
+| Recommendation | Category | Impact | State | ARG Query Available |
+|:------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------:|:------:|:-------:|:-------------------:|
+| [CR-1 - Use Premium tier for critical production workloads](#cr-1---use-premium-tier-for-critical-production-workloads) | System Efficiency | High | Preview | Yes |
+| [CR-2 - Enable zone redundancy](#cr-2---enable-zone-redundancy) | Availability | High | Preview | Yes |
+| [CR-3 - Enable geo-replication](#cr-3---enable-geo-replication) | Disaster Recovery | High | Preview | Yes |
+| [CR-5 - Use Repository namespaces](#cr-5---use-repository-namespaces) | Access & Security | Low | Preview | No |
+| [CR-6 - Move Container Registry to a dedicated resource group](#cr-6---move-container-registry-to-a-dedicated-resource-group) | Governance | Low | Preview | Yes |
+| [CR-7 - Manage registry size](#cr-7---manage-registry-size) | System Efficiency | Medium | Preview | No |
+| [CR-8 - Disable anonymous pull access](#cr-8---disable-anonymous-pull-access) | Access & Security | Medium | Preview | Yes |
+| [CR-10 - Configure Diagnostic Settings for all Azure Container Registries](#cr-10---configure-diagnostic-settings-for-all-azure-container-registries) | Monitoring | Medium | Preview | No |
+| [CR-11 - Monitor Azure Container Registry with Azure Monitor](#cr-11---monitor-azure-container-registry-with-azure-monitor) | Monitoring | Medium | Preview | No |
+| [CR-12 - Enable soft delete policy](#cr-12---enable-soft-delete-policy) | Disaster Recovery | Medium | Preview | Yes |
{{< /table >}}
-{{< alert style="info" >}}
+{{< alert style="info" >}}
Definitions of states can be found [here]({{< ref "../../../_index.md#definitions-of-terms-used-in-aprl">}})
-
{{< /alert >}}
## Recommendations Details
@@ -49,7 +46,7 @@ Choose a service tier of Azure Container Registry that meets your performance ne
- [Container Registry Best Practices](https://learn.microsoft.com/en-us/azure/container-registry/container-registry-best-practices)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -73,7 +70,7 @@ Azure Container Registry supports optional zone redundancy. Zone redundancy prov
- [Registry best practices - Enable zone redundancy](https://review.learn.microsoft.com/en-us/azure/container-registry/zone-redundancy?toc=%2Fazure%2Freliability%2Ftoc.json&bc=%2Fazure%2Freliability%2Fbreadcrumb%2Ftoc.json&branch=main)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -100,7 +97,7 @@ Geo-replication is available with Premium registries.
- [Registry best practices - Enable geo-replication](https://learn.microsoft.com/en-us/azure/container-registry/container-registry-best-practices#geo-replicate-multi-region-deployments)
- [Geo-Replicate Container Registry](https://learn.microsoft.com/en-us/azure/container-registry/container-registry-geo-replication)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -110,34 +107,6 @@ Geo-replication is available with Premium registries.
-### CR-4 - Maximize pull performance
-
-**Category: System Efficiency**
-
-**Impact: High**
-
-**Guidance**
-
-Some characteristics of your images themselves can impact pull performance:
-
-- Image size - Minimize the sizes of your images by removing unnecessary layers or reducing the size of layers. One way to reduce image size is to use the multi-stage Docker build approach to include only the necessary runtime components. Also check whether your image can include a lighter base OS image. And if you use a deployment environment such as Azure Container Instances that caches certain base images, check whether you can swap an image layer for one of the cached images.
-
-- Number of layers - Balance the number of layers used. If you have too few, you don’t benefit from layer reuse and caching on the host. Too many, and your deployment environment spends more time pulling and decompressing. Five to 10 layers is optimal.
-
-**Resources**
-
-- [Registry authentication options - Azure Container Registry](https://learn.microsoft.com/en-us/azure/container-registry/container-registry-authentication?tabs=azure-cli#admin-account)
-
-**Resource Graph Query/Scripts**
-
-{{< collapse title="Show/Hide Query/Script" >}}
-
-{{< code lang="sql" file="code/cr-4/cr-4.kql" >}} {{< /code >}}
-
-{{< /collapse >}}
-
-
-
### CR-5 - Use Repository namespaces
**Category: Access & Security**
@@ -152,7 +121,7 @@ By using repository namespaces, you can allow sharing a single registry across m
- [Registry best practices - use repository namespaces](https://learn.microsoft.com/en-us/azure/container-registry/container-registry-best-practices#repository-namespaces)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -177,7 +146,7 @@ Although you might experiment with a specific host type, such as Azure Container
- [Registry best practices - Use dedicated resource group](https://learn.microsoft.com/en-us/azure/container-registry/container-registry-best-practices#dedicated-resource-group)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -202,7 +171,7 @@ The storage constraints of each container registry service tier are intended to
- [Registry best practices - Manage registry size](https://learn.microsoft.com/en-us/azure/container-registry/container-registry-best-practices#manage-registry-size)
- [Retention Policy](https://learn.microsoft.com/en-us/azure/container-registry/container-registry-retention-policy#about-the-retention-policy)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -226,7 +195,7 @@ By default, access to pull or push content from an Azure container registry is o
- [Enable anonymous pull access](https://learn.microsoft.com/en-us/azure/container-registry/anonymous-pull-access#about-anonymous-pull-access)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -236,33 +205,6 @@ By default, access to pull or push content from an Azure container registry is o
-### CR-9 - Use an Azure managed identity to authenticate to an Azure container registry
-
-**Category: Access & Security**
-
-**Impact: Medium**
-
-**Guidance**
-
-Each container registry includes an admin user account, which is disabled by default. The admin account is designed for a single user to access the registry, mainly for testing purposes. We do not recommend sharing the admin account credentials among multiple users. All users authenticating with the admin account appear as a single user with push and pull access to the registry. Changing or disabling this account disables registry access for all users who use its credentials.
-
-Use a managed identity for Azure resources to authenticate to an Azure container registry from another Azure resource, without needing to provide or manage registry credentials.
-
-**Resources**
-
-- [Registry authentication options - Azure Container Registry](https://learn.microsoft.com/en-us/azure/container-registry/container-registry-authentication?tabs=azure-cli#admin-account)
-- [Authenticate with managed identity - Azure Container Registry](https://learn.microsoft.com/en-us/azure/container-registry/container-registry-authentication-managed-identity?tabs=azure-cli)
-
-**Resource Graph Query/Scripts**
-
-{{< collapse title="Show/Hide Query/Script" >}}
-
-{{< code lang="sql" file="code/cr-9/cr-9.kql" >}} {{< /code >}}
-
-{{< /collapse >}}
-
-
-
### CR-10 - Configure Diagnostic Settings for all Azure Container Registries
**Category: Monitoring**
@@ -278,7 +220,7 @@ Resource Logs are not collected and stored until you create a diagnostic setting
- [Monitoring Azure Container Registry data reference - Resource Logs](https://learn.microsoft.com/en-us/azure/container-registry/monitor-service-reference#resource-logs)
- [Monitor Azure Container Registry - Enable diagnostic logs](https://learn.microsoft.com/en-us/azure/container-registry/monitor-service#collection-and-routing)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -302,7 +244,7 @@ When you have critical applications and business processes relying on Azure reso
- [Monitoring Azure Container Registry data reference](https://learn.microsoft.com/en-us/azure/container-registry/monitor-service-reference#metrics)
- [Monitor Azure Container Registry](https://learn.microsoft.com/en-us/azure/container-registry/monitor-service)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -326,7 +268,7 @@ Once you enable the soft delete policy, ACR manages the deleted artifacts as the
- [Enable soft delete policy](https://learn.microsoft.com/en-us/azure/container-registry/container-registry-soft-delete-policy)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
diff --git a/docs/content/services/container/container-registry/code/cr-1/cr-1.kql b/docs/content/services/container/container-registry/code/cr-1/cr-1.kql
index e49b69449..d7f1a08d8 100644
--- a/docs/content/services/container/container-registry/code/cr-1/cr-1.kql
+++ b/docs/content/services/container/container-registry/code/cr-1/cr-1.kql
@@ -3,5 +3,5 @@
resources
| where type =~ "microsoft.containerregistry/registries"
| where sku.name != "Premium"
-| project recommendationId = "cr-1", name, id, param1=strcat("SkuName: ", tostring(sku.name))
+| project recommendationId = "cr-1", name, id, tags, param1=strcat("SkuName: ", tostring(sku.name))
| order by id asc
diff --git a/docs/content/services/container/container-registry/code/cr-12/cr-12.kql b/docs/content/services/container/container-registry/code/cr-12/cr-12.kql
index f16d8ec01..a1478d5c1 100644
--- a/docs/content/services/container/container-registry/code/cr-12/cr-12.kql
+++ b/docs/content/services/container/container-registry/code/cr-12/cr-12.kql
@@ -3,5 +3,5 @@
resources
| where type =~ "microsoft.containerregistry/registries"
| where properties.policies.softDeletePolicy.status == "disabled"
-| project recommendationId = "cr-12", name, id
+| project recommendationId = "cr-12", name, id, tags
| order by id asc
diff --git a/docs/content/services/container/container-registry/code/cr-2/cr-2.kql b/docs/content/services/container/container-registry/code/cr-2/cr-2.kql
index 79880213b..4feef2c67 100644
--- a/docs/content/services/container/container-registry/code/cr-2/cr-2.kql
+++ b/docs/content/services/container/container-registry/code/cr-2/cr-2.kql
@@ -3,5 +3,5 @@
resources
| where type =~ "microsoft.containerregistry/registries"
| where properties.zoneRedundancy != "Enabled"
-| project recommendationId = "cr-2", name, id, param1=strcat("zoneRedundancy: ", tostring(properties.zoneRedundancy))
+| project recommendationId = "cr-2", name, id, tags, param1=strcat("zoneRedundancy: ", tostring(properties.zoneRedundancy))
| order by id asc
diff --git a/docs/content/services/container/container-registry/code/cr-3/cr-3.kql b/docs/content/services/container/container-registry/code/cr-3/cr-3.kql
index 842219044..19983dd59 100644
--- a/docs/content/services/container/container-registry/code/cr-3/cr-3.kql
+++ b/docs/content/services/container/container-registry/code/cr-3/cr-3.kql
@@ -2,7 +2,7 @@
// Find all Container Registries that do not have geo-replication enabled
resources
| where type =~ "microsoft.containerregistry/registries"
-| project registryName = name, registryId = id, primaryRegion = location
+| project registryName = name, registryId = id, tags, primaryRegion = location
| join kind=leftouter (
Resources
| where type =~ "microsoft.containerregistry/registries/replications"
@@ -11,5 +11,5 @@ resources
) on registryId
| project-away registryId1, replicationId
| where isempty(replicationRegion)
-| project recommendationId = "cr-3", name=registryName, id=registryId
+| project recommendationId = "cr-3", name=registryName, id=registryId, tags
| order by id asc
diff --git a/docs/content/services/container/container-registry/code/cr-6/cr-6.kql b/docs/content/services/container/container-registry/code/cr-6/cr-6.kql
index 614a7f9ca..0f2e4851b 100644
--- a/docs/content/services/container/container-registry/code/cr-6/cr-6.kql
+++ b/docs/content/services/container/container-registry/code/cr-6/cr-6.kql
@@ -1 +1,12 @@
-// under-development
+// Azure Resource Graph Query
+// List container registries that contain additional resources within the same resource group.
+resources
+| where type =~ "microsoft.containerregistry/registries"
+| project registryName=name, registryId=id, registryTags=tags, resourceGroupId=strcat('/subscriptions/', subscriptionId, '/resourceGroups/', resourceGroup), resourceGroup, subscriptionId
+| join kind=inner (
+ resources
+ | where not(type =~ "microsoft.containerregistry/registries")
+ | summarize resourceCount=count() by subscriptionId, resourceGroup
+ | where resourceCount != 0
+) on resourceGroup, subscriptionId
+| project recommendationId = "cr-6", name=registryName, id=registryId, tags=registryTags, param1=strcat('resourceGroupName:',resourceGroup), param2=strcat('resourceGroupId:',resourceGroupId)
diff --git a/docs/content/services/container/container-registry/code/cr-6/cr-6.kql.fix b/docs/content/services/container/container-registry/code/cr-6/cr-6.kql.fix
deleted file mode 100644
index 29e16300a..000000000
--- a/docs/content/services/container/container-registry/code/cr-6/cr-6.kql.fix
+++ /dev/null
@@ -1,8 +0,0 @@
-// Azure Resource Graph Query
-// Lists resource groups that contain resources with type microsoft.containerregistry/registries, and provides a list of all other types of resources within the same resource group.
-resources
-| project resourceGroup, resourceType = type
-| summarize resourceTypes = make_set(resourceType) by resourceGroup
-| where array_index_of(resourceTypes, "microsoft.containerregistry/registries") != -1
-| project recommendationId = "cr-6", name=strcat("resourceGroup: ",resourceGroup), id="", resourceTypes
-| order by name asc
diff --git a/docs/content/services/container/container-registry/code/cr-7/cr-7.kql b/docs/content/services/container/container-registry/code/cr-7/cr-7.kql
index 614a7f9ca..00a505d34 100644
--- a/docs/content/services/container/container-registry/code/cr-7/cr-7.kql
+++ b/docs/content/services/container/container-registry/code/cr-7/cr-7.kql
@@ -1 +1,7 @@
-// under-development
+// Azure Resource Graph Query
+// Find all Container Registries that have their retention policy disabled
+resources
+| where type =~ "microsoft.containerregistry/registries"
+| where properties.policies.retentionPolicy.status == "disabled"
+| project recommendationId = "cr-7", name, id, tags, param1='retentionPolicy:disabled'
+| order by id asc
diff --git a/docs/content/services/container/container-registry/code/cr-7/cr-7.kql.fix b/docs/content/services/container/container-registry/code/cr-7/cr-7.kql.fix
deleted file mode 100644
index 8874bc992..000000000
--- a/docs/content/services/container/container-registry/code/cr-7/cr-7.kql.fix
+++ /dev/null
@@ -1,7 +0,0 @@
-// Azure Resource Graph Query
-// Find all Container Registries that have their retention policy disabled
-resources
-| where type =~ "microsoft.containerregistry/registries"
-| where properties.policies.retentionPolicy.status == "disabled"
-| project recommendationId = "cr-7", name, id
-| order by id asc
diff --git a/docs/content/services/container/container-registry/code/cr-8/cr-8.kql b/docs/content/services/container/container-registry/code/cr-8/cr-8.kql
index 7752ec482..1af829e17 100644
--- a/docs/content/services/container/container-registry/code/cr-8/cr-8.kql
+++ b/docs/content/services/container/container-registry/code/cr-8/cr-8.kql
@@ -3,5 +3,5 @@
resources
| where type =~ "microsoft.containerregistry/registries"
| where properties.anonymousPullEnabled == "true"
-| project recommendationId = "cr-8", name, id
+| project recommendationId = "cr-8", name, id, tags
| order by id asc
diff --git a/docs/content/services/container/container-registry/code/cr-9/cr-9.kql.fix b/docs/content/services/container/container-registry/code/cr-9/cr-9.kql.fix
deleted file mode 100644
index 64c8673a6..000000000
--- a/docs/content/services/container/container-registry/code/cr-9/cr-9.kql.fix
+++ /dev/null
@@ -1,7 +0,0 @@
-// Azure Resource Graph Query
-// Find all Container Registries that have the admin user enabled
-resources
-| where type =~ "microsoft.containerregistry/registries"
-| where properties.adminUserEnabled == "true"
-| project recommendationId = "cr-9", name, id
-| order by id asc
diff --git a/docs/content/services/database/cosmosdb/_index.md b/docs/content/services/database/cosmosdb/_index.md
index 7c9262152..daccfcc02 100644
--- a/docs/content/services/database/cosmosdb/_index.md
+++ b/docs/content/services/database/cosmosdb/_index.md
@@ -1,7 +1,7 @@
+++
title = "Cosmos DB"
description = "Best practices and resiliency recommendations for Cosmos DB and associated resources and settings."
-date = "6/30/23"
+date = "3/5/2024"
author = "kovarikthomas"
msAuthor = "tokovari"
draft = false
@@ -12,17 +12,17 @@ The presented resiliency recommendations in this guidance include Cosmos DB and
## Summary of Recommendations
{{< table style="table-striped" >}}
-| Recommendation | Impact | State | ARG Query Available |
-| :------------------------------------------------ | :---------------------------------------------------------------------: | :------: | :-----------------: |
-| [COSMOS-1 – Configure at least two regions for high availability](#cosmos-1---configure-at-least-two-regions-for-high-availability) | High | Preview | Yes |
-| [COSMOS-2 – Enable service-managed failover for multi-region accounts with single write region](#cosmos-2---enable-service-managed-failover-for-multi-region-accounts-with-single-write-region) | High | Preview | No |
-| [COSMOS-3 – Evaluate multi-region write capability](#cosmos-3---evaluate-multi-region-write-capability) | High | Preview | Yes |
-| [COSMOS-4 – Choose appropriate consistency mode reflecting data durability requirements](#cosmos-4---choose-appropriate-consistency-mode-reflecting-data-durability-requirements) | High | Preview | No |
-| [COSMOS-5 – Configure continuous backup mode](#cosmos-5---configure-continuous-backup-mode) | High | Preview | Yes |
-| [COSMOS-6 – Ensure query results are fully drained](#cosmos-6---ensure-query-results-are-fully-drained) | High | Preview | No |
-| [COSMOS-7 – Maintain singleton pattern in your client](#cosmos-7---maintain-singleton-pattern-in-your-client) | Medium | Preview | No |
-| [COSMOS-8 – Implement retry logic in your client](#cosmos-8---implement-retry-logic-in-your-client) | Medium | Preview | No |
-| [COSMOS-9 – Monitor Cosmos DB health and set up alerts](#cosmos-9---monitor-cosmos-db-health-and-set-up-alerts) | Medium | Preview | No |
+| Recommendation | Category | Impact | State | ARG Query Available |
+|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:----------------------:|:------:|:-------:|:-------------------:|
+| [COSMOS-1 - Configure at least two regions for high availability](#cosmos-1---configure-at-least-two-regions-for-high-availability) | Availability | High | Verified | Yes |
+| [COSMOS-2 - Enable service-managed failover for multi-region accounts with single write region](#cosmos-2---enable-service-managed-failover-for-multi-region-accounts-with-single-write-region) | Disaster Recovery | High | Verified | Yes |
+| [COSMOS-3 - Evaluate multi-region write capability](#cosmos-3---evaluate-multi-region-write-capability) | Disaster Recovery | High | Verified | Yes |
+| [COSMOS-4 - Choose appropriate consistency mode reflecting data durability requirements](#cosmos-4---choose-appropriate-consistency-mode-reflecting-data-durability-requirements) | Disaster Recovery | High | Preview | No |
+| [COSMOS-5 - Configure continuous backup mode](#cosmos-5---configure-continuous-backup-mode) | Disaster Recovery | High | Verified | Yes |
+| [COSMOS-6 - Ensure query results are fully drained](#cosmos-6---ensure-query-results-are-fully-drained) | System Efficiency | High | Verified | No |
+| [COSMOS-7 - Maintain singleton pattern in your client](#cosmos-7---maintain-singleton-pattern-in-your-client) | System Efficiency | Medium | Verified | No |
+| [COSMOS-8 - Implement retry logic in your client](#cosmos-8---implement-retry-logic-in-your-client) | Application Resilience | Medium | Verified | No |
+| [COSMOS-9 - Monitor Cosmos DB health and set up alerts](#cosmos-9---monitor-cosmos-db-health-and-set-up-alerts) | Monitoring | Medium | Verified | No |
{{< /table >}}
{{< alert style="info" >}}
@@ -41,13 +41,14 @@ Definitions of states can be found [here]({{< ref "../../../_index.md#definition
**Guidance**
-Azure implements multi-tier isolation approach with rack, DC, zone, and region isolation levels. Cosmos DB is by default highly resilient by running four replicas, but it is still susceptible to failures or issues with entire regions or availability zones. As such, it is crucial to enable at least a secondary region on your Cosmos DB to achieve higher SLA. Doing so does not incur any downtime at all and it is as easy as selecting a pin on map.
+Azure implements a multi-tier isolation approach with rack, DC, zone, and region isolation levels. Cosmos DB is highly resilient by default because it runs four replicas, but it is still susceptible to failures or issues that affect entire regions or availability zones. As such, it is crucial to enable at least one secondary region on your Cosmos DB account to achieve a higher SLA. Doing so does not incur any downtime, and it is as easy as selecting a pin on a map. Cosmos DB accounts that use Strong consistency need at least three regions to retain write availability if one region fails.
**Resources**
- [Distribute data globally with Azure Cosmos DB | Microsoft Learn](https://learn.microsoft.com/en-us/azure/cosmos-db/distribute-data-globally)
+- [Tips for building highly available applications | Microsoft Learn](https://learn.microsoft.com/en-us/azure/cosmos-db/high-availability#tips-for-building-highly-available-applications)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -71,7 +72,7 @@ Cosmos DB is a battle-tested service with extremely high uptime and resiliency,
- [Manage an Azure Cosmos DB account by using the Azure portal | Microsoft Learn](https://learn.microsoft.com/en-us/azure/cosmos-db/how-to-manage-database-account#automatic-failover)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -96,7 +97,7 @@ Multi-region write capability enables you to design multi-region application tha
- [Distribute data globally with Azure Cosmos DB | Microsoft Learn](https://learn.microsoft.com/en-us/azure/cosmos-db/distribute-data-globally)
- [Conflict resolution types and resolution policies in Azure Cosmos DB | Microsoft Learn](https://learn.microsoft.com/en-us/azure/cosmos-db/conflict-resolution-policies)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -120,7 +121,7 @@ Within a globally distributed database environment, there is a direct relationsh
- [Consistency level choices - Azure Cosmos DB | Microsoft Learn](https://learn.microsoft.com/en-us/azure/cosmos-db/consistency-levels)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -144,7 +145,7 @@ Cosmos DB automatically backs up your data and there is no way to turn back ups
- [Continuous backup with point in time restore feature in Azure Cosmos DB | Microsoft Learn](https://learn.microsoft.com/en-us/azure/cosmos-db/continuous-backup-restore-introduction)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -168,7 +169,7 @@ Cosmos DB limits single response to 4 MB. If your query requests a large amount
- [Pagination in Azure Cosmos DB | Microsoft Learn](https://learn.microsoft.com/en-us/azure/cosmos-db/nosql/query/pagination#handling-multiple-pages-of-results)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -191,7 +192,7 @@ Not only is establishing a new database connection expensive, so is maintaining
**Resources**
- [Designing resilient applications with Azure Cosmos DB SDKs | Microsoft Learn](https://learn.microsoft.com/en-us/azure/cosmos-db/nosql/conceptual-resilient-sdk-applications)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -215,7 +216,7 @@ Cosmos DB SDKs by default handle large number of transient errors and automatica
- [Designing resilient applications with Azure Cosmos DB SDKs | Microsoft Learn](https://learn.microsoft.com/en-us/azure/cosmos-db/nosql/conceptual-resilient-sdk-applications)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -239,7 +240,7 @@ It is good practice to monitor the availability and responsiveness of your Azure
- [Create alerts for Azure Cosmos DB using Azure Monitor | Microsoft Learn](https://learn.microsoft.com/en-us/azure/cosmos-db/create-alerts)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
diff --git a/docs/content/services/database/cosmosdb/code/cosmos-1/cosmos-1.kql b/docs/content/services/database/cosmosdb/code/cosmos-1/cosmos-1.kql
index b30399b3e..6daab67ba 100644
--- a/docs/content/services/database/cosmosdb/code/cosmos-1/cosmos-1.kql
+++ b/docs/content/services/database/cosmosdb/code/cosmos-1/cosmos-1.kql
@@ -1,4 +1,6 @@
Resources
| where type =~ 'Microsoft.DocumentDb/databaseAccounts'
-| where array_length(properties.locations) < 2
-| project recommendationId='cosmos-1', name, id
+| where
+ array_length(properties.locations) < 2 or
+ (array_length(properties.locations) < 3 and properties.consistencyPolicy.defaultConsistencyLevel == 'Strong')
+| project recommendationId='cosmos-1', name, id, tags
diff --git a/docs/content/services/database/cosmosdb/code/cosmos-2/cosmos-2.kql b/docs/content/services/database/cosmosdb/code/cosmos-2/cosmos-2.kql
index 614a7f9ca..06079e38d 100644
--- a/docs/content/services/database/cosmosdb/code/cosmos-2/cosmos-2.kql
+++ b/docs/content/services/database/cosmosdb/code/cosmos-2/cosmos-2.kql
@@ -1 +1,7 @@
-// under-development
+Resources
+| where type =~ 'Microsoft.DocumentDb/databaseAccounts'
+| where
+ array_length(properties.locations) > 1 and
+ tobool(properties.enableAutomaticFailover) == false and
+ tobool(properties.enableMultipleWriteLocations) == false
+| project recommendationId='cosmos-2', name, id, tags
diff --git a/docs/content/services/database/cosmosdb/code/cosmos-2/cosmos-2.kql.fix b/docs/content/services/database/cosmosdb/code/cosmos-2/cosmos-2.kql.fix
deleted file mode 100644
index 9f25dcf7c..000000000
--- a/docs/content/services/database/cosmosdb/code/cosmos-2/cosmos-2.kql.fix
+++ /dev/null
@@ -1,6 +0,0 @@
-resources
-| where type =~ 'Microsoft.DocumentDb/databaseAccounts'
-| where
- array_length(properties.locations) > 1 and
- properties.enableAutomaticFailover == false
-| project recommendationId='cosmos-2', name, id
diff --git a/docs/content/services/database/cosmosdb/code/cosmos-3/cosmos-3.kql b/docs/content/services/database/cosmosdb/code/cosmos-3/cosmos-3.kql
index c1ec6aa9f..fb7c6d8bc 100644
--- a/docs/content/services/database/cosmosdb/code/cosmos-3/cosmos-3.kql
+++ b/docs/content/services/database/cosmosdb/code/cosmos-3/cosmos-3.kql
@@ -3,4 +3,4 @@ Resources
| where
array_length(properties.locations) > 1 and
properties.enableMultipleWriteLocations == false
-| project recommendationId='cosmos-3', name, id
+| project recommendationId='cosmos-3', name, id, tags
diff --git a/docs/content/services/database/cosmosdb/code/cosmos-5/cosmos-5.kql b/docs/content/services/database/cosmosdb/code/cosmos-5/cosmos-5.kql
index b87eec27f..7ef84ed68 100644
--- a/docs/content/services/database/cosmosdb/code/cosmos-5/cosmos-5.kql
+++ b/docs/content/services/database/cosmosdb/code/cosmos-5/cosmos-5.kql
@@ -4,4 +4,4 @@ Resources
properties.backupPolicy.type == 'Periodic' and
properties.enableMultipleWriteLocations == false and
properties.enableAnalyticalStorage == false
-| project recommendationId='cosmos-5', name, id
+| project recommendationId='cosmos-5', name, id, tags
diff --git a/docs/content/services/database/db-for-mysql/_index.md b/docs/content/services/database/db-for-mysql/_index.md
new file mode 100644
index 000000000..ba9843d1f
--- /dev/null
+++ b/docs/content/services/database/db-for-mysql/_index.md
@@ -0,0 +1,75 @@
++++
+title = "DB for MySQL"
+description = "Best practices and resiliency recommendations for DB for MySQL and associated resources and settings."
+date = "2/26/24"
+author = "ejhenry"
+msAuthor = "erhenry"
+draft = false
++++
+
+The presented resiliency recommendations in this guidance include DB for MySQL and associated resources and settings.
+
+## Summary of Recommendations
+
+{{< table style="table-striped" >}}
+| Recommendation | Category | Impact | State | ARG Query Available |
+|:--------------------------------------------------|:-----------------------------------------------------------------------:|:---------------:|:----------------:|:-------------------:|
+| [MYSQL-1 - Enable HA with zone redundancy](#mysql-1---enable-ha-with-zone-redundancy) | Availability | High | Verified | Yes |
+| [MYSQL-2 - Enable custom maintenance schedule](#mysql-2---enable-custom-maintenance-schedule) | System Efficiency | High | Verified | Yes |
+{{< /table >}}
+
+{{< alert style="info" >}}
+
+Definitions of states can be found [here]({{< ref "../../../_index.md#definitions-of-terms-used-in-aprl">}})
+
+{{< /alert >}}
+
+## Recommendations Details
+
+### MYSQL-1 - Enable HA with zone redundancy
+
+**Category: Availability**
+
+**Impact: High**
+
+**Guidance**
+
+Enable HA with zone redundancy on flexible server instances. Zone-redundant high availability deploys a standby replica in a different zone with automatic failover capability.
+
+**Resources**
+
+- [High availability concepts in Azure Database for MySQL - Flexible Server](https://learn.microsoft.com/azure/mysql/flexible-server/concepts-high-availability)
+
+**Resource Graph Query**
+
+{{< collapse title="Show/Hide Query/Script" >}}
+
+{{< code lang="sql" file="code/mysql-1/mysql-1.kql" >}} {{< /code >}}
+
+{{< /collapse >}}
+
+
+
+### MYSQL-2 - Enable custom maintenance schedule
+
+**Category: System Efficiency**
+
+**Impact: High**
+
+**Guidance**
+
+Use a custom maintenance schedule on flexible server instances to select a preferred time for service updates to be applied.
+
+**Resources**
+
+- [Scheduled maintenance in Azure Database for MySQL - Flexible Server](https://learn.microsoft.com/azure/mysql/flexible-server/concepts-maintenance)
+
+**Resource Graph Query**
+
+{{< collapse title="Show/Hide Query/Script" >}}
+
+{{< code lang="sql" file="code/mysql-2/mysql-2.kql" >}} {{< /code >}}
+
+{{< /collapse >}}
+
+
diff --git a/docs/content/services/database/db-for-mysql/code/mysql-1/mysql-1.kql b/docs/content/services/database/db-for-mysql/code/mysql-1/mysql-1.kql
new file mode 100644
index 000000000..4bebbdace
--- /dev/null
+++ b/docs/content/services/database/db-for-mysql/code/mysql-1/mysql-1.kql
@@ -0,0 +1,6 @@
+// Azure Resource Graph Query
+// Find Database for MySQL instances that are not zone redundant
+resources
+| where type == "microsoft.dbformysql/flexibleservers"
+| where properties.highAvailability.mode != "ZoneRedundant"
+| project recommendationId = "mysql-1", name, id, tags, param1 = "ZoneRedundant: False"
diff --git a/docs/content/services/database/db-for-mysql/code/mysql-2/mysql-2.kql b/docs/content/services/database/db-for-mysql/code/mysql-2/mysql-2.kql
new file mode 100644
index 000000000..c19cc6486
--- /dev/null
+++ b/docs/content/services/database/db-for-mysql/code/mysql-2/mysql-2.kql
@@ -0,0 +1,6 @@
+// Azure Resource Graph Query
+// Find Database for MySQL instances that do not have a custom maintenance window
+resources
+| where type == "microsoft.dbformysql/flexibleservers"
+| where properties.maintenanceWindow.customWindow != "Enabled"
+| project recommendationId = "mysql-2", name, id, tags, param1 = strcat("customWindow:", properties['maintenanceWindow']['customWindow'])
diff --git a/docs/content/services/database/db-for-postgresql/_index.md b/docs/content/services/database/db-for-postgresql/_index.md
index ab3908edf..217cc0beb 100644
--- a/docs/content/services/database/db-for-postgresql/_index.md
+++ b/docs/content/services/database/db-for-postgresql/_index.md
@@ -14,7 +14,8 @@ The presented resiliency recommendations in this guidance include Database for P
{{< table style="table-striped" >}}
| Recommendation | Category | Impact | State | ARG Query Available |
| :------------------------------------------------ | :---------------------------------------------------------------------: | :------: | :------: | :-----------------: |
-| [PSQL-1 - Enable HA with zone redundancy](#psql-1---enable-ha-with-zone-redundancy) | High Availability | High | Preview | Yes |
+| [PSQL-1 - Enable HA with zone redundancy](#psql-1---enable-ha-with-zone-redundancy) | Availability | High | Verified | Yes |
+| [PSQL-2 - Enable custom maintenance schedule](#psql-2---enable-custom-maintenance-schedule) | System Efficiency | High | Verified | Yes |
{{< /table >}}
{{< alert style="info" >}}
@@ -48,3 +49,27 @@ Enable HA with zone redundancy on flexible server instances. Zone redundant high
{{< /collapse >}}
+
+### PSQL-2 - Enable custom maintenance schedule
+
+**Category: System Efficiency**
+
+**Impact: High**
+
+**Recommendation**
+
+Use a custom maintenance schedule on flexible server instances to select a preferred time for service updates to be applied.
+
+**Resources**
+
+- [Scheduled maintenance in Azure Database for PostgreSQL - Flexible Server](https://learn.microsoft.com/azure/postgresql/flexible-server/concepts-maintenance)
+
+**Resource Graph Query**
+
+{{< collapse title="Show/Hide Query/Script" >}}
+
+{{< code lang="sql" file="code/psql-2/psql-2.kql" >}} {{< /code >}}
+
+{{< /collapse >}}
+
+
diff --git a/docs/content/services/database/db-for-postgresql/code/psql-1/psql-1.azcli b/docs/content/services/database/db-for-postgresql/code/psql-1/psql-1.azcli
deleted file mode 100644
index 3e449c7e1..000000000
--- a/docs/content/services/database/db-for-postgresql/code/psql-1/psql-1.azcli
+++ /dev/null
@@ -1 +0,0 @@
-:: under-development
diff --git a/docs/content/services/database/db-for-postgresql/code/psql-1/psql-1.kql b/docs/content/services/database/db-for-postgresql/code/psql-1/psql-1.kql
index a57b62857..522ad55e4 100644
--- a/docs/content/services/database/db-for-postgresql/code/psql-1/psql-1.kql
+++ b/docs/content/services/database/db-for-postgresql/code/psql-1/psql-1.kql
@@ -3,4 +3,4 @@
resources
| where type == "microsoft.dbforpostgresql/flexibleservers"
| where properties.highAvailability.mode != "ZoneRedundant"
-| project recommendationId = "psql-1", name, id, param1 = "ZoneRedundant: False"
+| project recommendationId = "psql-1", name, id, tags, param1 = "ZoneRedundant: False"
diff --git a/docs/content/services/database/db-for-postgresql/code/psql-1/psql-1.ps1 b/docs/content/services/database/db-for-postgresql/code/psql-1/psql-1.ps1
deleted file mode 100644
index 133b22465..000000000
--- a/docs/content/services/database/db-for-postgresql/code/psql-1/psql-1.ps1
+++ /dev/null
@@ -1 +0,0 @@
-# under-development
diff --git a/docs/content/services/database/db-for-postgresql/code/psql-2/psql-2.kql b/docs/content/services/database/db-for-postgresql/code/psql-2/psql-2.kql
new file mode 100644
index 000000000..af7d6e089
--- /dev/null
+++ b/docs/content/services/database/db-for-postgresql/code/psql-2/psql-2.kql
@@ -0,0 +1,6 @@
+// Azure Resource Graph Query
+// Find Database for PostgreSQL instances that do not have a custom maintenance window
+resources
+| where type == "microsoft.dbforpostgresql/flexibleservers"
+| where properties.maintenanceWindow.customWindow != "Enabled"
+| project recommendationId = "psql-2", name, id, tags, param1 = strcat("customWindow:", properties['maintenanceWindow']['customWindow'])
diff --git a/docs/content/services/database/redis-cache/_index.md b/docs/content/services/database/redis-cache/_index.md
index 72ff70664..d194225b6 100644
--- a/docs/content/services/database/redis-cache/_index.md
+++ b/docs/content/services/database/redis-cache/_index.md
@@ -39,7 +39,7 @@ Azure Cache for Redis supports zone redundancy in its Premium and Enterprise tie
- [Enable zone redundancy for Azure Cache for Redis](https://learn.microsoft.com/azure/azure-cache-for-redis/cache-how-to-zone-redundancy)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
diff --git a/docs/content/services/database/redis-cache/code/redis-1/redis-1.azcli b/docs/content/services/database/redis-cache/code/redis-1/redis-1.azcli
deleted file mode 100644
index 3e449c7e1..000000000
--- a/docs/content/services/database/redis-cache/code/redis-1/redis-1.azcli
+++ /dev/null
@@ -1 +0,0 @@
-:: under-development
diff --git a/docs/content/services/database/redis-cache/code/redis-1/redis-1.kql b/docs/content/services/database/redis-cache/code/redis-1/redis-1.kql
index 171f317e9..d34189ccf 100644
--- a/docs/content/services/database/redis-cache/code/redis-1/redis-1.kql
+++ b/docs/content/services/database/redis-cache/code/redis-1/redis-1.kql
@@ -3,5 +3,5 @@
resources
| where type == "microsoft.cache/redis"
| where array_length(zones) <= 1 or isnull(zones)
-| project recommendationId = "redis-1", name, id, param1 = "AvailabilityZones: Single Zone"
+| project recommendationId = "redis-1", name, id, tags, param1 = "AvailabilityZones: Single Zone"
| order by id asc
diff --git a/docs/content/services/database/redis-cache/code/redis-1/redis-1.ps1 b/docs/content/services/database/redis-cache/code/redis-1/redis-1.ps1
deleted file mode 100644
index 133b22465..000000000
--- a/docs/content/services/database/redis-cache/code/redis-1/redis-1.ps1
+++ /dev/null
@@ -1 +0,0 @@
-# under-development
diff --git a/docs/content/services/database/sqldb/_index.md b/docs/content/services/database/sqldb/_index.md
index 467d1e2c5..a50c5401c 100644
--- a/docs/content/services/database/sqldb/_index.md
+++ b/docs/content/services/database/sqldb/_index.md
@@ -13,14 +13,14 @@ The presented resiliency recommendations in this guidance include Azure Database
## Summary of Recommendations
{{< table style="table-striped" >}}
-| Recommendation | Impact | State | ARG Query Available |
-| :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | :----: | :-----: | :-----------------: |
-| [SQLDB-1 - Use Active Geo Replication to Create a Readable Secondary in Another Region](#sqldb-1---use-active-geo-replication-to-create-a-readable-secondary-in-another-region) | High | Preview | No |
-| [SQLDB-2 - Use Auto Failover Groups that can include one or multiple databases, typically used by the same application](#sqldb-2---use-auto-failover-groups-that-can-include-one-or-multiple-databases-typically-used-by-the-same-application) | High | Preview | No |
-| [SQLDB-3 - Use a Zone-Redundant database](#sqldb-3---use-a-zone-redundant-database) | Medium | Preview | Yes |
-| [SQLDB-4 - Implement Retry Logic](#sqldb-4---implement-retry-logic) | High | Preview | No |
-| [SQLDB-5 - Monitor your Azure SQL Database in near-real time to detect reliability incidents](#sqldb-5---monitor-your-azure-sql-database-in-near-real-time-to-detect-reliability-incidents) | High | Preview | No |
-| [SQLDB-6 - Back up your keys](#sqldb-6---back-up-your-keys) | Medium | Preview | No |
+| Recommendation | Category | Impact | State | ARG Query Available |
+|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:----------------------:|:------:|:-------:|:-------------------:|
+| [SQLDB-1 - Use Active Geo Replication to Create a Readable Secondary in Another Region](#sqldb-1---use-active-geo-replication-to-create-a-readable-secondary-in-another-region) | Disaster Recovery | High | Preview | Yes |
+| [SQLDB-2 - Use Auto Failover Groups that can include one or multiple databases, typically used by the same application](#sqldb-2---use-auto-failover-groups-that-can-include-one-or-multiple-databases-typically-used-by-the-same-application) | Disaster Recovery | High | Preview | Yes |
+| [SQLDB-3 - Use a Zone-Redundant database](#sqldb-3---use-a-zone-redundant-database) | Availability | Medium | Preview | Yes |
+| [SQLDB-4 - Implement Retry Logic](#sqldb-4---implement-retry-logic) | Application Resilience | High | Preview | Yes |
+| [SQLDB-5 - Monitor your Azure SQL Database in near-real time to detect reliability incidents](#sqldb-5---monitor-your-azure-sql-database-in-near-real-time-to-detect-reliability-incidents) | Monitoring | High | Preview | Yes |
+| [SQLDB-6 - Back up your keys](#sqldb-6---back-up-your-keys) | Disaster Recovery | Medium | Preview | Yes |
{{< /table >}}
{{< alert style="info" >}}
@@ -45,7 +45,7 @@ If your primary database fails, perform a manual failover to the secondary datab
- [Active Geo Replication](https://learn.microsoft.com/en-us/azure/azure-sql/database/active-geo-replication-overview)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -70,7 +70,7 @@ You can use the readable secondary databases to offload read-only query workload
- [AutoFailover Groups](https://learn.microsoft.com/en-us/azure/azure-sql/database/auto-failover-group-overview?tabs=azure-powershell)
- [DR Design](https://learn.microsoft.com/en-us/azure/azure-sql/database/designing-cloud-solutions-for-disaster-recovery)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -94,7 +94,7 @@ By default, the cluster of nodes for the premium availability model is created i
-[Zone Redundant Databases](https://learn.microsoft.com/en-us/azure/azure-sql/database/high-availability-sla)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -118,7 +118,7 @@ Although Azure SQL Database is resilient when it concerns transitive infrastruct
- [How to Implement Retry Logic](https://learn.microsoft.com/en-us/azure/azure-sql/database/troubleshoot-common-connectivity-issues)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -132,7 +132,7 @@ Although Azure SQL Database is resilient when it concerns transitive infrastruct
**Category: Monitoring**
-**Impact: Medium**
+**Impact: High**
**Guidance**
@@ -144,7 +144,7 @@ Use one of the available solutions to monitor SQL DB to detect potential reliabi
- [Azure SQL Database Monitoring](https://learn.microsoft.com/en-us/azure/azure-sql/database/monitoring-sql-database-azure-monitor)
- [Monitoring SQL Database Reference](https://learn.microsoft.com/en-us/azure/azure-sql/database/monitoring-sql-database-azure-monitor-reference)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -169,7 +169,7 @@ It is highly recommended to use Azure Key Vault (AKV) to store encryption keys r
- [Azure Key Vault](https://learn.microsoft.com/en-us/azure/key-vault/general/overview)
- [Getting Started with Always Encrypted](https://learn.microsoft.com/en-us/azure/azure-sql/database/always-encrypted-landing?view=azuresql)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
diff --git a/docs/content/services/database/sqldb/code/sqldb-1/sqldb-1.kql b/docs/content/services/database/sqldb/code/sqldb-1/sqldb-1.kql
index 614a7f9ca..2791c6848 100644
--- a/docs/content/services/database/sqldb/code/sqldb-1/sqldb-1.kql
+++ b/docs/content/services/database/sqldb/code/sqldb-1/sqldb-1.kql
@@ -1 +1,6 @@
-// under-development
+// Azure Resource Graph Query
+// Provides a list of SQL databases that are not configured for Geo-redundant storage.
+resources
+| where type == "microsoft.sql/servers/databases"
+| where (properties['currentBackupStorageRedundancy'] ) <> 'Geo'
+| project recommendationId = "sqldb-1", name, id, tags, param1=strcat("CurrentGeoRedundancy=", properties['currentBackupStorageRedundancy'] )
diff --git a/docs/content/services/database/sqldb/code/sqldb-1/sqldb-1.kql.fix b/docs/content/services/database/sqldb/code/sqldb-1/sqldb-1.kql.fix
deleted file mode 100644
index 53f02d01f..000000000
--- a/docs/content/services/database/sqldb/code/sqldb-1/sqldb-1.kql.fix
+++ /dev/null
@@ -1,4 +0,0 @@
-resources
-| where type == "microsoft.sql/servers/databases"
-| where (properties['currentBackupStorageRedundancy'] ) <> 'Geo'
-| project "recommendationId=sqldb-1", id, name, param1=strcat("CurrentGeoRedudancy:", properties['currentBackupStorageRedundancy'] )
diff --git a/docs/content/services/database/sqldb/code/sqldb-2/sqldb-2.kql b/docs/content/services/database/sqldb/code/sqldb-2/sqldb-2.kql
index 614a7f9ca..94e593d00 100644
--- a/docs/content/services/database/sqldb/code/sqldb-2/sqldb-2.kql
+++ b/docs/content/services/database/sqldb/code/sqldb-2/sqldb-2.kql
@@ -1 +1,6 @@
-// under-development
+// Azure Resource Graph Query
+// Provides a list of SQL databases that are not configured to use a failover-group.
+resources
+| where type =~'microsoft.sql/servers/databases'
+| where isnull(properties['failoverGroupId'])
+| project recommendationId = "sqldb-2", name, id, tags, param1= strcat("databaseId=", properties['databaseId'])
diff --git a/docs/content/services/database/sqldb/code/sqldb-2/sqldb-2.kql.fix b/docs/content/services/database/sqldb/code/sqldb-2/sqldb-2.kql.fix
deleted file mode 100644
index 9fd70c220..000000000
--- a/docs/content/services/database/sqldb/code/sqldb-2/sqldb-2.kql.fix
+++ /dev/null
@@ -1,5 +0,0 @@
-resources
-| where type =~'microsoft.sql/servers/databases'
-| where isnull(properties['failoverGroupId'])
-| project "recommendationid=SQLDB-2",id, name, param1= strcat("databaseid:", properties['databaseId'])
-
diff --git a/docs/content/services/database/sqldb/code/sqldb-3/sqldb-3.kql b/docs/content/services/database/sqldb/code/sqldb-3/sqldb-3.kql
index 4fd3e1c73..8f377e972 100644
--- a/docs/content/services/database/sqldb/code/sqldb-3/sqldb-3.kql
+++ b/docs/content/services/database/sqldb/code/sqldb-3/sqldb-3.kql
@@ -1,5 +1,5 @@
Resources
| where type =~ 'microsoft.sql/servers/databases'
| where tolower(tostring(properties.zoneRedundant))=~'false'
-|project recommendationId = "SQLDB-3", name, id
+|project recommendationId = "sqldb-3", name, id, tags
diff --git a/docs/content/services/database/sqldb/code/sqldb-5/sqldb-5.kql b/docs/content/services/database/sqldb/code/sqldb-5/sqldb-5.kql
index 614a7f9ca..52d9070d9 100644
--- a/docs/content/services/database/sqldb/code/sqldb-5/sqldb-5.kql
+++ b/docs/content/services/database/sqldb/code/sqldb-5/sqldb-5.kql
@@ -1 +1,16 @@
-// under-development
+// Azure Resource Graph Query
+// Provides a list of SQL databases that are not configured for monitoring.
+resources
+| where type == "microsoft.insights/metricalerts"
+| mv-expand properties.scopes
+| mv-expand properties.criteria.allOf
+| project databaseid = properties_scopes, monitoredMetric = properties_criteria_allOf.metricName
+| where databaseid contains 'databases'
+| summarize monitoredMetrics=make_list(monitoredMetric) by tostring(databaseid)
+| join kind=fullouter (
+ resources
+ | where type =~ 'microsoft.sql/servers/databases'
+ | project databaseid = tolower(id), name, tags
+) on databaseid
+|where isnull(monitoredMetrics)
+|project recommendationId = "sqldb-5", name, id=databaseid1, tags, param1=strcat("MonitoringMetrics=false" )
diff --git a/docs/content/services/database/sqldb/code/sqldb-5/sqldb-5.kql.fix b/docs/content/services/database/sqldb/code/sqldb-5/sqldb-5.kql.fix
deleted file mode 100644
index 16cfa0d9f..000000000
--- a/docs/content/services/database/sqldb/code/sqldb-5/sqldb-5.kql.fix
+++ /dev/null
@@ -1,14 +0,0 @@
-resources
-| where type == "microsoft.insights/metricalerts"
-| mv-expand properties.scopes
-| mv-expand properties.criteria.allOf
-| project databaseid = properties_scopes, monitoredMetric = properties_criteria_allOf.metricName
-| where databaseid contains 'databases'
-| summarize monitoredMetrics=make_list(monitoredMetric) by tostring(databaseid)
-| join kind=fullouter (
- resources
- | where type =~ 'microsoft.sql/servers/databases'
- | project databaseid = tolower(id), name
-) on databaseid
-|where isnull(monitoredMetrics)
-|project recommendationId="SQLDB-5", id=databaseid1, name
diff --git a/docs/content/services/integration/api-management/_index.md b/docs/content/services/integration/api-management/_index.md
index aa43ee2c5..382f11b1d 100644
--- a/docs/content/services/integration/api-management/_index.md
+++ b/docs/content/services/integration/api-management/_index.md
@@ -12,10 +12,11 @@ The presented resiliency recommendations in this guidance include Api Management
## Summary of Recommendations
{{< table style="table-striped" >}}
-| Recommendation | Impact | State | ARG Query Available |
-| :------------------------------------------------ | :------: | :------: | :-----------------: |
-| [APIM-1 - Migrate API Management services to Premium SKU to support Availability Zones](#apim-1---migrate-api-management-services-to-premium-sku-to-support-availability-zones) | High | Preview | Yes |
-| [APIM-2 - Enable Availability Zones on Premium API Management instances](#apim-2---enable-availability-zones-on-premium-api-management-instances) | High | Preview | Yes |
+| Recommendation | Category | Impact | State | ARG Query Available |
+|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:------------:|:------:|:-------:|:-------------------:|
+| [APIM-1 - Migrate API Management services to Premium SKU to support Availability Zones](#apim-1---migrate-api-management-services-to-premium-sku-to-support-availability-zones) | Availability | High | Preview | Yes |
+| [APIM-2 - Enable Availability Zones on Premium API Management instances](#apim-2---enable-availability-zones-on-premium-api-management-instances) | Availability | High | Preview | Yes |
+| [APIM-3 - Upgrade to platform version stv2](#apim-3---upgrade-to-platform-version-stv2) | Availability | High | Preview | Yes |
{{< /table >}}
{{< alert style="info" >}}
@@ -28,6 +29,8 @@ Definitions of states can be found [here]({{< ref "../../../_index.md#definition
### APIM-1 - Migrate API Management services to Premium SKU to support Availability Zones
+**Category: Availability**
+
**Impact: High**
**Guidance**
@@ -39,7 +42,7 @@ Upgrade the API Management instance to the Premium SKU to add support for Availa
- [Change your API Management service tier](https://learn.microsoft.com/en-us/azure/api-management/upgrade-and-scale#change-your-api-management-service-tier)
- [Migrate Azure API Management to availability zone support](https://learn.microsoft.com/en-us/azure/reliability/migrate-api-mgt)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -51,6 +54,8 @@ Upgrade the API Management instance to the Premium SKU to add support for Availa
### APIM-2 - Enable Availability Zones on Premium API Management instances
+**Category: Availability**
+
**Impact: High**
**Guidance**
@@ -62,7 +67,7 @@ Enable zone redundancy for APIM instances. With zone redundancy, the gateway and
- [Ensure API Management availability and reliability](https://learn.microsoft.com/en-us/azure/api-management/high-availability#availability-zones)
- [Migrate Azure API Management to availability zone support](https://learn.microsoft.com/en-us/azure/reliability/migrate-api-mgt)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -71,3 +76,28 @@ Enable zone redundancy for APIM instances. With zone redundancy, the gateway and
{{< /collapse >}}
+
+### APIM-3 - Upgrade to platform version stv2
+
+**Category: Availability**
+
+**Impact: High**
+
+**Guidance**
+
+Upgrade to platform version stv2. The infrastructure associated with the API Management stv1 compute platform version will be retired effective 31 August 2024. A more current compute platform version (stv2) is already available and provides enhanced service capabilities.
+
+**Resources**
+
+- [Azure API Management - stv1 platform retirement (August 2024)](https://learn.microsoft.com/en-us/azure/api-management/breaking-changes/stv1-platform-retirement-august-2024)
+- [Azure API Management compute platform](https://learn.microsoft.com/en-us/azure/api-management/compute-infrastructure)
+
+**Resource Graph Query**
+
+{{< collapse title="Show/Hide Query/Script" >}}
+
+{{< code lang="sql" file="code/apim-3/apim-3.kql" >}} {{< /code >}}
+
+{{< /collapse >}}
+
+
diff --git a/docs/content/services/integration/api-management/code/apim-1/apim-1.kql b/docs/content/services/integration/api-management/code/apim-1/apim-1.kql
index 8a800530f..1204d4bcd 100644
--- a/docs/content/services/integration/api-management/code/apim-1/apim-1.kql
+++ b/docs/content/services/integration/api-management/code/apim-1/apim-1.kql
@@ -4,4 +4,4 @@ resources
| where type =~ 'Microsoft.ApiManagement/service'
| extend skuName = sku.name
| where tolower(skuName) != tolower('premium')
-| project recommendationId = "apim-1", name, id, param1=strcat("SKU: ", skuName)
+| project recommendationId = "apim-1", name, id, tags, param1=strcat("SKU: ", skuName)
diff --git a/docs/content/services/integration/api-management/code/apim-2/apim-2.kql b/docs/content/services/integration/api-management/code/apim-2/apim-2.kql
index b16321f35..6a6d47303 100644
--- a/docs/content/services/integration/api-management/code/apim-2/apim-2.kql
+++ b/docs/content/services/integration/api-management/code/apim-2/apim-2.kql
@@ -6,4 +6,4 @@ resources
| where tolower(skuName) == tolower('premium')
| where isnull(zones) or array_length(zones) < 2
| extend zoneValue = iff((isnull(zones)), "null", zones)
-| project recommendationId = "apim-2", name, id, param1="Zones: No Zone or Zonal", param2=strcat("Zones value: ", zoneValue )
+| project recommendationId = "apim-2", name, id, tags, param1="Zones: No Zone or Zonal", param2=strcat("Zones value: ", zoneValue )
diff --git a/docs/content/services/integration/api-management/code/apim-3/apim-3.kql b/docs/content/services/integration/api-management/code/apim-3/apim-3.kql
new file mode 100644
index 000000000..ffee4ee92
--- /dev/null
+++ b/docs/content/services/integration/api-management/code/apim-3/apim-3.kql
@@ -0,0 +1,8 @@
+// Azure Resource Graph Query
+// Find all API Management instances that aren't upgraded to platform version stv2
+resources
+| where type =~ 'Microsoft.ApiManagement/service'
+| extend plat_version = properties.platformVersion
+| extend skuName = sku.name
+| where tolower(plat_version) != tolower('stv2')
+| project recommendationId = "apim-3", name, id, tags, param1=strcat("Platform Version: ", plat_version) , param2=strcat("SKU: ", skuName)
diff --git a/docs/content/services/integration/event-grid/_index.md b/docs/content/services/integration/event-grid/_index.md
index a19aa7415..b82de8f63 100644
--- a/docs/content/services/integration/event-grid/_index.md
+++ b/docs/content/services/integration/event-grid/_index.md
@@ -41,11 +41,11 @@ Enabling diagnostic settings allow you to capture and view diagnostic informatio
- [Azure Event Grid - Enable diagnostic logs for Event Grid resources](https://learn.microsoft.com/en-us/azure/event-grid/enable-diagnostic-logs-topic)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
-{{< code lang="sql" file="code/evg-1/evg-1.ps1" >}} {{< /code >}}
+{{< code lang="sql" file="code/evg-1/evg-1.kql" >}} {{< /code >}}
{{< /collapse >}}
@@ -65,11 +65,11 @@ When Event Grid can't deliver an event within a certain time period or after try
- [Azure Event Grid delivery and retry](https://learn.microsoft.com/en-us/azure/event-grid/delivery-and-retry#dead-letter-events)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
-{{< code lang="sql" file="code/evg-2/evg-2.ps1" >}} {{< /code >}}
+{{< code lang="sql" file="code/evg-2/evg-2.kql" >}} {{< /code >}}
{{< /collapse >}}
@@ -89,7 +89,7 @@ You can use private endpoints to allow ingress of events directly from your virt
- [Configure private endpoints for Azure Event Grid topics or domains](https://learn.microsoft.com/en-us/azure/event-grid/configure-private-endpoints)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
diff --git a/docs/content/services/integration/event-grid/code/evg-1/evg-1.kql b/docs/content/services/integration/event-grid/code/evg-1/evg-1.kql
new file mode 100644
index 000000000..614a7f9ca
--- /dev/null
+++ b/docs/content/services/integration/event-grid/code/evg-1/evg-1.kql
@@ -0,0 +1 @@
+// under-development
diff --git a/docs/content/services/integration/event-grid/code/evg-2/evg-2.kql b/docs/content/services/integration/event-grid/code/evg-2/evg-2.kql
new file mode 100644
index 000000000..614a7f9ca
--- /dev/null
+++ b/docs/content/services/integration/event-grid/code/evg-2/evg-2.kql
@@ -0,0 +1 @@
+// under-development
diff --git a/docs/content/services/integration/event-grid/code/evg-3/evg-3.kql b/docs/content/services/integration/event-grid/code/evg-3/evg-3.kql
index cfa55284a..7653e568e 100644
--- a/docs/content/services/integration/event-grid/code/evg-3/evg-3.kql
+++ b/docs/content/services/integration/event-grid/code/evg-3/evg-3.kql
@@ -3,5 +3,5 @@
Resources
| where type contains "eventgrid"
| where properties['publicNetworkAccess'] == "Enabled"
-| project recommendationId = "evg-3", name, id
+| project recommendationId = "evg-3", name, id, tags
| order by id asc
diff --git a/docs/content/services/integration/event-hub/_index.md b/docs/content/services/integration/event-hub/_index.md
index c1d5aba1e..f0d581535 100644
--- a/docs/content/services/integration/event-hub/_index.md
+++ b/docs/content/services/integration/event-hub/_index.md
@@ -15,6 +15,7 @@ The presented resiliency recommendations in this guidance include Event Hub and
| Recommendation | Category | Impact | State | ARG Query Available |
| :------------------------------------------------ | :---------------------------------------------------------------------: | :------: | :------: | :-----------------: |
| [EVHNS-1 - Enable zone redundancy for Event Hub namespace](#evhns-1---enable-zone-redundancy-for-event-hub-namespace) | High Availability | High | Preview | Yes |
+| [EVHNS-2 - Enable auto-inflate on Event Hub Standard tier](#evhns-2---enable-auto-inflate-on-event-hub-standard-tier) | System Efficiency | High | Preview | Yes |
{{< /table >}}
{{< alert style="info" >}}
@@ -39,7 +40,7 @@ Event Hubs supports Availability Zones, providing fault-isolated locations withi
- [Azure Event Hubs - Geo-disaster recovery](https://learn.microsoft.com/azure/event-hubs/event-hubs-geo-dr?tabs=portal#availability-zones)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -48,3 +49,27 @@ Event Hubs supports Availability Zones, providing fault-isolated locations withi
{{< /collapse >}}
+
+### EVHNS-2 - Enable auto-inflate on Event Hub Standard tier
+
+**Category: System Efficiency**
+
+**Impact: High**
+
+**Recommendation**
+
+Enable auto-inflate on Event Hub Standard tier namespaces. The auto-inflate feature of Event Hubs automatically scales up by increasing the number of throughput units (TUs) to meet usage needs. Increasing TUs prevents throttling scenarios where data ingress or data egress rates exceed the rates allowed by the TUs assigned to the namespace.
+
+**Resources**
+
+- [Azure Event Hubs - Automatically scale throughput units](https://learn.microsoft.com/azure/event-hubs/event-hubs-auto-inflate)
+
+**Resource Graph Query**
+
+{{< collapse title="Show/Hide Query/Script" >}}
+
+{{< code lang="sql" file="code/evhns-2/evhns-2.kql" >}} {{< /code >}}
+
+{{< /collapse >}}
+
+
diff --git a/docs/content/services/integration/event-hub/code/evhns-1/evhns-1.azcli b/docs/content/services/integration/event-hub/code/evhns-1/evhns-1.azcli
deleted file mode 100644
index 3e449c7e1..000000000
--- a/docs/content/services/integration/event-hub/code/evhns-1/evhns-1.azcli
+++ /dev/null
@@ -1 +0,0 @@
-:: under-development
diff --git a/docs/content/services/integration/event-hub/code/evhns-1/evhns-1.kql b/docs/content/services/integration/event-hub/code/evhns-1/evhns-1.kql
index 497ffdcdb..699279653 100644
--- a/docs/content/services/integration/event-hub/code/evhns-1/evhns-1.kql
+++ b/docs/content/services/integration/event-hub/code/evhns-1/evhns-1.kql
@@ -3,5 +3,5 @@
resources
| where type == "microsoft.eventhub/namespaces"
| where properties.zoneRedundant == false
-| project recommendationId = "evhns-1", name, id, param1 = "ZoneRedundant: False"
+| project recommendationId = "evhns-1", name, id, tags, param1 = "ZoneRedundant: False"
| order by id asc
diff --git a/docs/content/services/integration/event-hub/code/evhns-1/evhns-1.ps1 b/docs/content/services/integration/event-hub/code/evhns-1/evhns-1.ps1
deleted file mode 100644
index 133b22465..000000000
--- a/docs/content/services/integration/event-hub/code/evhns-1/evhns-1.ps1
+++ /dev/null
@@ -1 +0,0 @@
-# under-development
diff --git a/docs/content/services/integration/event-hub/code/evhns-2/evhns-2.kql b/docs/content/services/integration/event-hub/code/evhns-2/evhns-2.kql
new file mode 100644
index 000000000..679bb4a59
--- /dev/null
+++ b/docs/content/services/integration/event-hub/code/evhns-2/evhns-2.kql
@@ -0,0 +1,7 @@
+// Azure Resource Graph Query
+// Find Event Hub namespace instances that are Standard tier and do not have Auto Inflate enabled
+resources
+| where type == "microsoft.eventhub/namespaces"
+| where sku.tier == "Standard"
+| where properties.isAutoInflateEnabled == "false"
+| project recommendationId = "evhns-2", name, id, tags, param1 = "AutoInflateEnabled: False"
diff --git a/docs/content/services/integration/service-bus/_index.md b/docs/content/services/integration/service-bus/_index.md
new file mode 100644
index 000000000..19f2b738f
--- /dev/null
+++ b/docs/content/services/integration/service-bus/_index.md
@@ -0,0 +1,52 @@
++++
+title = "Service Bus"
+description = "Best practices and resiliency recommendations for Service Bus and associated resources and settings."
+date = "2/13/24"
+author = "DaFitRobsta"
+msAuthor = "rolightn"
+draft = false
++++
+
+The presented resiliency recommendations in this guidance include Service Bus and associated resources and settings.
+
+## Summary of Recommendations
+
+{{< table style="table-striped" >}}
+| Recommendation | Category | Impact | State | ARG Query Available |
+|:--------------------------------------------------|:-----------------------------------------------------------------------:|:---------------:|:----------------:|:-------------------:|
+| [SBNS-1 - Enable Availability Zones for Service Bus namespaces](#sbns-1---enable-availability-zones-for-service-bus-namespaces) | Availability | High | Preview | Yes |
+{{< /table >}}
+
+{{< alert style="info" >}}
+
+Definitions of states can be found [here]({{< ref "../../../_index.md#definitions-of-terms-used-in-aprl">}})
+
+{{< /alert >}}
+
+## Recommendations Details
+
+### SBNS-1 - Enable Availability Zones for Service Bus namespaces
+
+**Category: Availability**
+
+**Impact: High**
+
+**Guidance**
+
+Use Service Bus with zone redundancy for production workloads. The Service Bus Premium SKU supports availability zones, providing fault-isolated locations within the same Azure region. Service Bus manages three copies of the messaging store (1 primary and 2 secondary). Service Bus keeps all three copies in sync for data and management operations. If the primary copy fails, one of the secondary copies is promoted to primary with no perceived downtime. If the applications see transient disconnects from Service Bus, the retry logic in the SDK will automatically reconnect to Service Bus.
+
+**Resources**
+
+- [Service Bus and reliability](https://learn.microsoft.com/en-us/azure/well-architected/services/messaging/service-bus/reliability)
+- [Azure Service Bus Geo-disaster recovery](https://learn.microsoft.com/en-us/azure/service-bus-messaging/service-bus-geo-dr#availability-zones)
+- [Insulate Azure Service Bus applications against outages and disasters](https://learn.microsoft.com/en-us/azure/service-bus-messaging/service-bus-outages-disasters)
+
+**Resource Graph Query**
+
+{{< collapse title="Show/Hide Query/Script" >}}
+
+{{< code lang="sql" file="code/sbns-1/sbns-1.kql" >}} {{< /code >}}
+
+{{< /collapse >}}
+
+
diff --git a/docs/content/services/integration/service-bus/code/sbns-1/sbns-1.kql b/docs/content/services/integration/service-bus/code/sbns-1/sbns-1.kql
new file mode 100644
index 000000000..2be5b9efb
--- /dev/null
+++ b/docs/content/services/integration/service-bus/code/sbns-1/sbns-1.kql
@@ -0,0 +1,6 @@
+// Azure Resource Graph Query
+// Returns Service Bus namespaces that do not have any availability zones enabled
+resources
+| where type =~ 'Microsoft.ServiceBus/namespaces'
+| where properties.zoneRedundant == 'false'
+| project recommendationId = "sbns-1", name, id, tags, param1=strcat("zoneRedundant: ", properties.zoneRedundant), param2=strcat("SKU: ", sku.name), param3=iff(tolower(sku.name) == 'premium', 'Move Service Bus namespace to a region that supports Availability Zones', 'Migrate to Premium SKU in a region that supports Availability Zones')
diff --git a/docs/content/services/iot/iot-hub/_index.md b/docs/content/services/iot/iot-hub/_index.md
index 6b26c63fa..7e06defe1 100644
--- a/docs/content/services/iot/iot-hub/_index.md
+++ b/docs/content/services/iot/iot-hub/_index.md
@@ -15,11 +15,11 @@ The presented resiliency recommendations in this guidance include IoT Hub and as
| Recommendation | Category | Impact | State | ARG Query Available |
| :------------------------------------------------ | :---------------------------------------------------------------------: | :------: | :------: | :-----------------: |
| [IOTH-1 - Device Identities are exported to a secondary region](#ioth-1---device-identities-are-exported-to-a-secondary-region) | Disaster Recovery | High | Preview | No |
-| [IOTH-2 - Do not use free tier](#ioth-2---do-not-use-free-tier) | Availability | High | Preview | No |
+| [IOTH-2 - Do not use free tier](#ioth-2---do-not-use-free-tier) | Availability | High | Preview | Yes |
| [IOTH-3 - Use Availability Zones](#ioth-3---use-availability-zones) | Availability | High | Preview | No |
-| [IOTH-4 - Use Device Provisioning Service](#ioth-4---use-device-provisioning-service) | Scalability | Critical | Preview | No |
+| [IOTH-4 - Use Device Provisioning Service](#ioth-4---use-device-provisioning-service) | System Efficiency | High | Preview | Yes |
| [IOTH-5 - Define Failover Guidelines](#ioth-5---define-failover-guidelines) | Availability | High | Preview | No |
-| [IOTH-6 - Disabled Fallback Route](#ioth-6---disabled-fallback-route) | Monitoring | Low | Preview | No |
+| [IOTH-6 - Disabled Fallback Route](#ioth-6---disabled-fallback-route) | Monitoring | Low | Preview | Yes |
{{< /table >}}
@@ -48,7 +48,7 @@ Manual Failover of IoT Hub to another region is faster (RTO) and can be used for
- [Import and export IoT Hub device identities in bulk](https://learn.microsoft.com/en-us/azure/iot-hub/iot-hub-bulk-identity-mgmt)
- [IoT Hub high availability and disaster recovery](https://learn.microsoft.com/en-us/azure/iot-hub/iot-hub-ha-dr#manual-failover)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -72,7 +72,7 @@ In a production scenario the IoT Hub tier should not be Free, as the Free tier d
- [Choose the right IoT Hub tier and size for your solution](https://learn.microsoft.com/en-us/azure/iot-hub/iot-hub-scaling)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -96,7 +96,7 @@ In a region that supports Availability Zones for IoT Hub, these Zones should be
- [Azure IoT Hub high availability and disaster recovery](https://learn.microsoft.com/en-us/azure/iot-hub/iot-hub-ha-dr#availability-zones)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -108,9 +108,9 @@ In a region that supports Availability Zones for IoT Hub, these Zones should be
### IOTH-4 - Use Device Provisioning Service
-**Category: Scalability**
+**Category: System Efficiency**
-**Impact: Critical**
+**Impact: High**
**Recommendation**
@@ -124,7 +124,7 @@ Even IoT Hubs that are associated to a Device Provisioning Service need to be ch
- [Best practices for large-scale IoT device deployments](https://learn.microsoft.com/en-us/azure/iot-dps/concepts-deploy-at-scale)
- [IoT Hub Device Provisioning Service high availability and disaster recovery](https://learn.microsoft.com/en-us/azure/iot-dps/iot-dps-ha-dr)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -151,7 +151,7 @@ In case of a regional failure, an IoT Hub can failover to a second region. This
- [IoT Hub high availability and disaster recovery](https://learn.microsoft.com/en-us/azure/iot-hub/iot-hub-ha-dr)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -175,7 +175,7 @@ If message routing is used to route messages to custom endpoints, it can happen
- [Use message routing - Fallback route](https://learn.microsoft.com/en-us/azure/iot-hub/iot-hub-devguide-messages-d2c#fallback-route)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
diff --git a/docs/content/services/iot/iot-hub/code/ioth-2/ioth-2.kql b/docs/content/services/iot/iot-hub/code/ioth-2/ioth-2.kql
index 614a7f9ca..cba9cf902 100644
--- a/docs/content/services/iot/iot-hub/code/ioth-2/ioth-2.kql
+++ b/docs/content/services/iot/iot-hub/code/ioth-2/ioth-2.kql
@@ -1 +1,6 @@
-// under-development
+// Azure Resource Graph Query
+// list all IoT Hubs that are using the Free tier
+resources
+| where type =~ "microsoft.devices/iothubs" and
+ tostring(sku.tier) =~ 'Free'
+| project recommendationId="ioth-2", name, id, tags, param1=strcat("tier:", tostring(sku.tier))
diff --git a/docs/content/services/iot/iot-hub/code/ioth-2/ioth-2.kql.fix b/docs/content/services/iot/iot-hub/code/ioth-2/ioth-2.kql.fix
deleted file mode 100644
index 59e12e2ab..000000000
--- a/docs/content/services/iot/iot-hub/code/ioth-2/ioth-2.kql.fix
+++ /dev/null
@@ -1,5 +0,0 @@
-Resources
-| where type == "microsoft.devices/iothubs"
-| project recommendationId="ioth-2", name, id, param1=strcat("Resource group: ", resourceGroup), param2=strcat("tier: ", tostring(sku.tier))
-| where param2 =~ 'Free'
-| order by name
diff --git a/docs/content/services/iot/iot-hub/code/ioth-4/ioth-4.kql b/docs/content/services/iot/iot-hub/code/ioth-4/ioth-4.kql
index 614a7f9ca..2ee93effb 100644
--- a/docs/content/services/iot/iot-hub/code/ioth-4/ioth-4.kql
+++ b/docs/content/services/iot/iot-hub/code/ioth-4/ioth-4.kql
@@ -1 +1,13 @@
-// under-development
+// Azure Resource Graph Query
+// list all IoT Hubs that do not have a linked IoT Hub Device Provisioning Service (DPS)
+resources
+| where type =~ "microsoft.devices/iothubs"
+| project id, iotHubName=tostring(properties.hostName), tags, resourceGroup
+| join kind=fullouter (
+ resources
+ | where type == "microsoft.devices/provisioningservices"
+ | mv-expand iotHubs=properties.iotHubs
+ | project iotHubName = tostring(iotHubs.name), dpsName = name, name=iotHubs.name
+) on iotHubName
+| where dpsName == ''
+| project recommendationId="ioth-4", name=iotHubName, id, tags, param1='DPS:none'
diff --git a/docs/content/services/iot/iot-hub/code/ioth-4/ioth-4.kql.fix b/docs/content/services/iot/iot-hub/code/ioth-4/ioth-4.kql.fix
deleted file mode 100644
index b9d65d672..000000000
--- a/docs/content/services/iot/iot-hub/code/ioth-4/ioth-4.kql.fix
+++ /dev/null
@@ -1,6 +0,0 @@
-resources
-| where type == "microsoft.devices/iothubs"
-| project id, iotHubName=tostring(properties.hostName), resourceGroup
-| join kind=fullouter (resources | where type == "microsoft.devices/provisioningservices" | mv-expand iotHubs=properties.iotHubs | project iotHubName = tostring(iotHubs.name), dpsName = name, name=iotHubs.name) on iotHubName
-| where iotHubName != ''
-| project recommendationId="ioth-4", name=iotHubName, id, param1=strcat("DPS: ", dpsName)
diff --git a/docs/content/services/iot/iot-hub/code/ioth-6/ioth-6.kql b/docs/content/services/iot/iot-hub/code/ioth-6/ioth-6.kql
index 614a7f9ca..33727c540 100644
--- a/docs/content/services/iot/iot-hub/code/ioth-6/ioth-6.kql
+++ b/docs/content/services/iot/iot-hub/code/ioth-6/ioth-6.kql
@@ -1 +1,7 @@
-// under-development
+// Azure Resource Graph Query
+// list all IoT Hubs that have the fallback route disabled
+resources
+| where type == "microsoft.devices/iothubs"
+| extend fallbackEnabled=properties.routing.fallbackRoute.isEnabled
+| where fallbackEnabled == false
+| project recommendationId="ioth-6", name, id, tags, param1='FallbackRouteEnabled:false'
diff --git a/docs/content/services/iot/iot-hub/code/ioth-6/ioth-6.kql.fix b/docs/content/services/iot/iot-hub/code/ioth-6/ioth-6.kql.fix
deleted file mode 100644
index e1076c132..000000000
--- a/docs/content/services/iot/iot-hub/code/ioth-6/ioth-6.kql.fix
+++ /dev/null
@@ -1,5 +0,0 @@
-Resources
-| where type == "microsoft.devices/iothubs"
-| extend fallbackEnabled=properties.routing.fallbackRoute.isEnabled
-| project recommendationId="ioth-6", id, name, param1=strcat("Resource group: ", resourceGroup), param2=strcat("Fallback enabled: ", fallbackEnabled)
-| where param2 == false
diff --git a/docs/content/services/management/automation-account/_index.md b/docs/content/services/management/automation-account/_index.md
index f69c6bdff..3118b9c4c 100644
--- a/docs/content/services/management/automation-account/_index.md
+++ b/docs/content/services/management/automation-account/_index.md
@@ -43,7 +43,7 @@ Definitions of states can be found [here]({{< ref "../../../_index.md#definition
- [Disaster recovery for Automation accounts](https://learn.microsoft.com/en-us/azure/automation/automation-disaster-recovery?tabs=win-hrw%2Cps-script%2Coption-one)
- [Disaster recovery scenarios for cloud and hybrid jobs](https://learn.microsoft.com/en-us/azure/automation/automation-disaster-recovery?tabs=win-hrw%2Cps-script%2Coption-one#scenarios-for-cloud-and-hybrid-jobs)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
diff --git a/docs/content/services/management/management-group/_index.md b/docs/content/services/management/management-group/_index.md
index d0f763c0f..5e3e516db 100644
--- a/docs/content/services/management/management-group/_index.md
+++ b/docs/content/services/management/management-group/_index.md
@@ -12,10 +12,9 @@ The presented resiliency recommendations in this guidance include Management Gro
## Summary of Recommendations
{{< table style="table-striped" >}}
-| Recommendation | Impact | State | ARG Query Available |
-| :------------------------------------------------ | :------: | :------: | :-----------------: |
-| [MG-1 - Subscriptions should not be placed under the Tenant Root Management Group](#mg-1---subscriptions-should-not-be-placed-under-the-tenant-root-management-group) | Medium | Preview | Yes |
-
+| Recommendation | Category | Impact | State | ARG Query Available |
+|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------|:----------:|:------:|:-------:|:-------------------:|
+| [MG-1 - Subscriptions should not be placed under the Tenant Root Management Group](#mg-1---subscriptions-should-not-be-placed-under-the-tenant-root-management-group) | Governance | Medium | Preview | Yes |
{{< /table >}}
{{< alert style="info" >}}
@@ -39,13 +38,12 @@ Create management groups under your root-level management group to represent the
These groups are based on the security, compliance, connectivity, and feature needs of the workloads. With this grouping structure, you can have a set of Azure policies applied at the management group level. This grouping structure is for all workloads that require the same security, compliance, connectivity, and feature settings.
-
**Resources**
- [Management group recommendations](https://learn.microsoft.com/en-us/azure/cloud-adoption-framework/ready/landing-zone/design-area/resource-org-management-groups#management-group-recommendations)
- [Root management group for each directory](https://learn.microsoft.com/en-us/azure/governance/management-groups/overview#root-management-group-for-each-directory)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
diff --git a/docs/content/services/management/management-group/code/mg-1/mg-1.kql b/docs/content/services/management/management-group/code/mg-1/mg-1.kql
index 767307b80..de0cb3064 100644
--- a/docs/content/services/management/management-group/code/mg-1/mg-1.kql
+++ b/docs/content/services/management/management-group/code/mg-1/mg-1.kql
@@ -1,7 +1,7 @@
// Azure Resource Graph Query
// Provides a list of Azure Subscriptions that are placed under the Tenant Root Management Group
-ResourceContainers
+resourcecontainers
| where type == 'microsoft.resources/subscriptions'
| extend mgParentSize = array_length(properties.managementGroupAncestorsChain)
| where mgParentSize == 1
-| project recommendationId="mg-1", name, id
+| project recommendationId="mg-1", name, id, tags
diff --git a/docs/content/services/management/resource-group/_index.md b/docs/content/services/management/resource-group/_index.md
new file mode 100644
index 000000000..98a9f1916
--- /dev/null
+++ b/docs/content/services/management/resource-group/_index.md
@@ -0,0 +1,51 @@
++++
+title = "Resource Groups"
+description = "Best practices and resiliency recommendations for Resource Groups and associated resources and settings."
+date = "2/22/24"
+author = "edknox"
+msAuthor = "edknox"
+draft = false
++++
+
+The presented resiliency recommendations in this guidance include Resource Groups and their associated settings.
+
+## Summary of Recommendations
+
+{{< table style="table-striped" >}}
+| Recommendation | Category | Impact | State | ARG Query Available |
+|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------|:----------:|:------:|:-------:|:-------------------:|
+| [RG-1 - Ensure Resource Group and its Resources are located in the same Region](#rg-1---ensure-resource-group-and-its-resources-are-located-in-the-same-region) | Disaster Recovery | High | Preview | Yes |
+
+{{< /table >}}
+
+{{< alert style="info" >}}
+
+Definitions of states can be found [here]({{< ref "../../../_index.md#definitions-of-terms-used-in-aprl">}})
+
+{{< /alert >}}
+
+## Recommendations Details
+
+### RG-1 - Ensure Resource Group and its Resources are located in the same Region
+
+**Category: Disaster Recovery**
+
+**Impact: High**
+
+**Guidance**
+
+Ensure each resource is located in the same region as its containing resource group. This ensures that, in the event of a regional outage, you will still be able to manage your resources. Azure Resource Manager (ARM) stores metadata about a resource group's resources in the resource group's region. If that region is unavailable, updates to this metadata could fail, making the resources effectively read-only.
+
+**Resources**
+
+- [Azure Resource Manager Overview](https://learn.microsoft.com/en-us/azure/azure-resource-manager/management/overview#resource-group-location-alignment)
+
+**Resource Graph Query**
+
+{{< collapse title="Show/Hide Query/Script" >}}
+
+{{< code lang="sql" file="code/rg-1/rg-1.kql" >}} {{< /code >}}
+
+{{< /collapse >}}
+
+
diff --git a/docs/content/services/management/resource-group/code/rg-1/rg-1.kql b/docs/content/services/management/resource-group/code/rg-1/rg-1.kql
new file mode 100644
index 000000000..a3bbae00c
--- /dev/null
+++ b/docs/content/services/management/resource-group/code/rg-1/rg-1.kql
@@ -0,0 +1,12 @@
+// Azure Resource Graph Query
+// Provides a list of resources that are deployed in a region different from the region of their Resource Group
+resources
+| project id, name, tags, resourceGroup, location
+| where location != "global" // exclude global resources
+| where resourceGroup != "networkwatcherrg" // exclude networkwatcherrg
+| where split(id, "/", 3)[0] =~ "resourceGroups" // resource is in a resource group
+| extend resourceGroupId = strcat_array(array_slice(split(id, "/"),0,4), "/") // create resource group resource id
+| join (resourcecontainers | project containerid=id, containerlocation=location ) on $left.resourceGroupId == $right.['containerid'] // join to resourcecontainers table
+| where location != containerlocation
+| project recommendationId="rg-1", name, id, tags
+| order by id asc
diff --git a/docs/content/services/management/subscription/_index.md b/docs/content/services/management/subscription/_index.md
new file mode 100644
index 000000000..8b8c9ea6a
--- /dev/null
+++ b/docs/content/services/management/subscription/_index.md
@@ -0,0 +1,57 @@
++++
+title = "Subscription"
+description = "Best practices and resiliency recommendations for Subscription and associated resources and settings."
+date = "3/20/24"
+author = "davenewman777"
+msAuthor = "davenew"
+draft = false
++++
+
+The presented resiliency recommendations in this guidance include Subscription and associated resources and settings. They cover checks where the resource being assessed is a subscription, or a collection of subscriptions, rather than an individual resource.
+
+## Summary of Recommendations
+
+{{< table style="table-striped" >}}
+| Recommendation | Category | Impact | State | ARG Query Available |
+|:--------------------------------------------------|:-----------------------------------------------------------------------:|:---------------:|:----------------:|:-------------------:|
+| [MS-1 - Do not create more than 2000 Citrix VDA servers per subscription](#ms-1---do-not-create-more-than-2000-citrix-vda-servers-per-subscription) | Application Resilience | High | Preview | Yes |
+
+{{< /table >}}
+
+{{< alert style="info" >}}
+
+Definitions of states can be found [here]({{< ref "../../../_index.md#definitions-of-terms-used-in-aprl">}})
+
+{{< /alert >}}
+
+## Recommendations Details
+
+### MS-1 - Do not create more than 2000 Citrix VDA servers per subscription
+
+**Category: Application Resilience**
+
+**Impact: High**
+
+**Guidance**
+
+A Citrix Managed Azure subscription supports the number of machines indicated in Limits. (In this context, machines refers to VMs that have a Citrix VDA installed. These machines deliver apps and desktops to users. It does not include other machines in a resource location, such as Cloud Connectors.)
+
+If your Citrix Managed Azure subscription is likely to reach its limit soon, and you have enough Citrix licenses, you can request another Citrix Managed Azure subscription. The dashboard contains a notification when you’re close to the limit.
+
+You can’t create a catalog (or add machines to a catalog) if the total number of machines for all catalogs that use that Citrix Managed Azure subscription would exceed the value indicated in Limits.
+
+This recommendation flags subscriptions with more than 2000 Citrix VDA machines, which is 80% of the published Citrix limit of 2500, so that you can act before the limit is reached.
+
+**Resources**
+
+- [Citrix Limits](https://docs.citrix.com/en-us/citrix-daas-azure/limits)
+
+**Resource Graph Query**
+
+{{< collapse title="Show/Hide Query/Script" >}}
+
+{{< code lang="sql" file="code/ms-1/ms-1.kql" >}} {{< /code >}}
+
+{{< /collapse >}}
+
+
diff --git a/docs/content/services/management/subscription/code/ms-1/ms-1.kql b/docs/content/services/management/subscription/code/ms-1/ms-1.kql
new file mode 100644
index 000000000..654d7ee9a
--- /dev/null
+++ b/docs/content/services/management/subscription/code/ms-1/ms-1.kql
@@ -0,0 +1,11 @@
+// Azure Resource Graph Query
+// Counts, per subscription, the VMs whose tags contain "Citrix VDA" and returns the subscriptions where that count exceeds 2000.
+// The published Citrix limit is 2500 machines per subscription; 2000 is 80% of that limit.
+
+resources
+| where type == 'microsoft.compute/virtualmachines'
+| where tags contains 'Citrix VDA'
+| summarize VMs=count() by subscriptionId
+| where VMs > 2000
+| join (resourcecontainers | where type == 'microsoft.resources/subscriptions' | project subname=name, subscriptionId) on subscriptionId
+| project recommendationId='ms-1', name=subname, id=subscriptionId, param1='Too many instances.', param2=VMs
diff --git a/docs/content/services/migration/azure-backup/_index.md b/docs/content/services/migration/azure-backup/_index.md
index 691c330e2..c7f2a6450 100644
--- a/docs/content/services/migration/azure-backup/_index.md
+++ b/docs/content/services/migration/azure-backup/_index.md
@@ -12,10 +12,11 @@ The presented resiliency recommendations in this guidance include Backup and ass
## Summary of Recommendations
{{< table style="table-striped" >}}
-| Recommendation | Impact | State | ARG Query Available |
-| :------------------------------------------------ | :------: | :------: | :-----------------: |
-| [BK-1 - Migrate from classic alerts to built-in Azure Monitor alerts for Recovery services vaults](#bk-1---migrate-from-classic-alerts-to-built-in-azure-monitor-alerts-for-recovery-services-vaults) | Medium | Preview | Yes |
-
+| Recommendation | Category | Impact | State | ARG Query Available |
+|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------:|:------:|:--------:|:-------------------:|
+| [BK-1 - Migrate from classic alerts to built-in Azure Monitor alerts for Azure Recovery Services Vaults](#bk-1---migrate-from-classic-alerts-to-built-in-azure-monitor-alerts-for-azure-recovery-services-vaults) | Monitoring | Medium | Verified | Yes |
+| [BK-2 - Opt-in to Cross Region Restore for all Geo-Redundant Storage (GRS) Azure Recovery Services vaults](#bk-2---opt-in-to-cross-region-restore-for-all-geo-redundant-storage-grs-azure-recovery-services-vaults) | Disaster Recovery | Medium | Verified | Yes |
+
{{< /table >}}
{{< alert style="info" >}}
@@ -26,7 +27,7 @@ Definitions of states can be found [here]({{< ref "../../../_index.md#definition
## Recommendations Details
-### BK-1 - Migrate from classic alerts to built-in Azure Monitor alerts for Recovery services vaults
+### BK-1 - Migrate from classic alerts to built-in Azure Monitor alerts for Azure Recovery Services Vaults
**Category: Monitoring**
@@ -45,10 +46,10 @@ Using Azure Monitor Alerts you can:
**Resources**
-- [Move to Azure monitor Alerts](https://learn.microsoft.com/en-us/azure/backup/move-to-azure-monitor-alerts)
-- [Classic alerts retirement announcement](https://azure.microsoft.com/en-us/updates/transition-to-builtin-azure-monitor-alerts-for-recovery-services-vaults-in-azure-backup-by-31-march-2026/)
+- [Move to Azure Monitor alerts](https://learn.microsoft.com/azure/backup/move-to-azure-monitor-alerts)
+- [Classic alerts retirement announcement](https://azure.microsoft.com/updates/transition-to-builtin-azure-monitor-alerts-for-recovery-services-vaults-in-azure-backup-by-31-march-2026/)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -56,4 +57,29 @@ Using Azure Monitor Alerts you can:
{{< /collapse >}}
+### BK-2 - Opt-in to Cross Region Restore for all Geo-Redundant Storage (GRS) Azure Recovery Services vaults
+
+**Category: Disaster Recovery**
+
+**Impact: Medium**
+
+**Guidance**
+
+Cross Region Restore (CRR) allows you to restore Azure VMs in a secondary region, which is an Azure paired region. This option allows you to conduct drills to meet audit or compliance requirements, and to restore the VM or its disk if there's a disaster in the primary region. CRR is an opt-in feature that is available only for GRS vaults.
+
+**Resources**
+
+- [Set Cross Region Restore](https://learn.microsoft.com/azure/backup/backup-create-recovery-services-vault#set-cross-region-restore)
+- [Azure Backup Best Practices](https://learn.microsoft.com/azure/backup/guidance-best-practices)
+- [Minimum Role Requirements for Cross Region Restore](https://learn.microsoft.com/azure/backup/backup-rbac-rs-vault#minimum-role-requirements-for-azure-vm-backup)
+- [Recovery Services Vault](https://learn.microsoft.com/azure/backup/backup-azure-arm-vms-prepare)
+
+**Resource Graph Query**
+
+{{< collapse title="Show/Hide Query/Script" >}}
+
+{{< code lang="sql" file="code/bk-2/bk-2.kql" >}} {{< /code >}}
+
+{{< /collapse >}}
+
diff --git a/docs/content/services/migration/azure-backup/code/bk-1/bk-1.kql b/docs/content/services/migration/azure-backup/code/bk-1/bk-1.kql
index 64cfd4efe..e8791b477 100644
--- a/docs/content/services/migration/azure-backup/code/bk-1/bk-1.kql
+++ b/docs/content/services/migration/azure-backup/code/bk-1/bk-1.kql
@@ -1,8 +1,8 @@
// This Resource Graph query will return all Recovery services vault with Classic alerts enabled.
-Resources
+resources
| where type in~ ('microsoft.recoveryservices/vaults')
| extend monitoringSettings = parse_json(properties).monitoringSettings
-| extend isUsingClassicAlerts = case(isnull(monitoringSettings), 'Enabled',monitoringSettings.classicAlertSettings.alertsForCriticalOperations)
+| extend isUsingClassicAlerts = case(isnull(monitoringSettings),'Enabled',monitoringSettings.classicAlertSettings.alertsForCriticalOperations)
| extend isUsingJobsAlerts = case(isnull(monitoringSettings), 'Enabled', monitoringSettings.azureMonitorAlertSettings.alertsForAllJobFailures)
| where isUsingClassicAlerts == 'Enabled'
-| project recommendationId = "bk-1",name,id,tags,param1=strcat("isUsingClassicAlerts: ", isUsingClassicAlerts),param2=strcat("isUsingJobsAlerts: ", isUsingJobsAlerts)
+| project recommendationId = "bk-1", name, id, tags, param1=strcat("isUsingClassicAlerts: ", isUsingClassicAlerts), param2=strcat("isUsingJobsAlerts: ", isUsingJobsAlerts)
diff --git a/docs/content/services/migration/azure-backup/code/bk-2/bk-2.kql b/docs/content/services/migration/azure-backup/code/bk-2/bk-2.kql
new file mode 100644
index 000000000..0e4f69ad1
--- /dev/null
+++ b/docs/content/services/migration/azure-backup/code/bk-2/bk-2.kql
@@ -0,0 +1,7 @@
+// Azure Resource Graph Query
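+// Displays all geo-redundant (GRS) Recovery Services vaults that do not have Cross Region Restore enabled
+
+resources
+| where type =~ "microsoft.recoveryservices/vaults" and properties.redundancySettings.standardTierStorageRedundancy =~ "GeoRedundant"
+| where properties.redundancySettings.crossRegionRestore !~ "Enabled"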
+| project recommendationId = "bk-2", name, id, tags
diff --git a/docs/content/services/monitoring/application-insights/_index.md b/docs/content/services/monitoring/application-insights/_index.md
index 0ae577923..581f65ad2 100644
--- a/docs/content/services/monitoring/application-insights/_index.md
+++ b/docs/content/services/monitoring/application-insights/_index.md
@@ -41,7 +41,7 @@ Classic Application Insights will be retired in February 2024. To minimize disru
- [Migrate an Application Insights classic resource to a workspace-based resource](https://learn.microsoft.com/en-us/azure/azure-monitor/app/convert-classic-resource)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
diff --git a/docs/content/services/monitoring/application-insights/code/appi-1/appi-1.kql b/docs/content/services/monitoring/application-insights/code/appi-1/appi-1.kql
index c07bc3944..db9cc276c 100644
--- a/docs/content/services/monitoring/application-insights/code/appi-1/appi-1.kql
+++ b/docs/content/services/monitoring/application-insights/code/appi-1/appi-1.kql
@@ -2,4 +2,4 @@ resources
| where type =~ "microsoft.insights/components"
| extend IngestionMode = properties.IngestionMode
| where IngestionMode =~ 'ApplicationInsights'
-| project recommendationId= "APPI-1", name, id, param1="ApplicationInsightsDeploymentType: Classic"
+| project recommendationId= "appi-1", name, id, tags, param1="ApplicationInsightsDeploymentType: Classic"
diff --git a/docs/content/services/monitoring/log-analytics/_index.md b/docs/content/services/monitoring/log-analytics/_index.md
index 9610afa75..0b5afb7de 100644
--- a/docs/content/services/monitoring/log-analytics/_index.md
+++ b/docs/content/services/monitoring/log-analytics/_index.md
@@ -14,13 +14,11 @@ The presented resiliency recommendations in this guidance include Log Analytics
The below table shows the list of resiliency recommendations for Log Analytics and associated resources.
{{< table style="table-striped" >}}
-| Recommendation | Impact | State | ARG Query Available |
-| :------------------------------------------------ | :------: | :------: | :-----------------: |
-| [LOG-1 - Enable Log Analytics data export to GRS or GZRS](#log-1---enable-log-analytics-data-export-to-grs-or-gzrs) | Medium | Preview | No |
-| [LOG-2 - Link Log Analytics Workspace to an Availability Zone enabled dedicated cluster](#log-2---link-log-analytics-workspace-to-an-availability-zone-enabled-dedicated-cluster) | Medium | Preview | No |
-| [LOG-3 - Configure data collection to send critical data to multiple workspaces in different regions](#log-3---configure-data-collection-to-send-critical-data-to-multiple-workspaces-in-different-regions) | Medium | Preview | No |
-| [LOG-4 - Create a health status alert rule for your Log Analytics workspace](#log-4---create-a-health-status-alert-rule-for-your-log-analytics-workspace) | Low | Preview | No |
-| [LOG-5 - Configure minimal logging and retention of logs](#log-5---configure-minimal-logging-and-retention-of-logs) | Low | Preview | No |
+| Recommendation | Category | Impact | State | ARG Query Available |
+|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------:|:------:|:-------:|:-------------------:|
+| [LOG-1 - Enable Log Analytics data export to GRS or GZRS](#log-1---enable-log-analytics-data-export-to-grs-or-gzrs) | Governance | Medium | Verified | No |
+| [LOG-4 - Create a health status alert rule for your Log Analytics workspace](#log-4---create-a-health-status-alert-rule-for-your-log-analytics-workspace) | Monitoring | Low | Verified | No |
+| [LOG-5 - Configure minimal logging and retention of logs](#log-5---configure-minimal-logging-and-retention-of-logs) | Governance | Low | Verified | No |
{{< /table >}}
{{< alert style="info" >}}
@@ -33,20 +31,20 @@ Definitions of states can be found [here]({{< ref "../../../_index.md#definition
### LOG-1 - Enable Log Analytics data export to GRS or GZRS
-**Category: Disaster Recovery**
+**Category: Governance**
**Impact: Medium**
**Guidance**
-Data export in a Log Analytics workspace lets you continuously export data to an Azure Storage account. Protect your Log Analytics workspace data from the unlikely event of a regional failure by continuously exporting to a geo-redundant storage (GRS) or geo-zone-redundant storage (GZRS) account.
+Data export in a Log Analytics workspace lets you continuously export data to an Azure Storage account. Protect your Log Analytics workspace data from the unlikely event of a regional failure by continuously exporting to a geo-redundant storage (GRS) or geo-zone-redundant storage (GZRS) account. This recommendation primarily addresses data-retention compliance, but the exported data can also be integrated with other Azure services and tools.
**Resources**
- [Log Analytics workspace data export in Azure Monitor](https://learn.microsoft.com/azure/azure-monitor/logs/logs-data-export)
- [Azure Monitor configuration recommendations](https://learn.microsoft.com/azure/azure-monitor/best-practices-logs#configuration-recommendations)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -56,56 +54,6 @@ Data export in a Log Analytics workspace lets you continuously export data to an
-### LOG-2 - Link Log Analytics Workspace to an Availability Zone enabled dedicated cluster
-
-**Category: Availability**
-
-**Impact: Medium**
-
-**Guidance**
-
-Link your Log Analytics workspace to an availability zone enabled dedicated cluster to increase the resilience of Azure Monitor features that rely on your Log Analytics workspace and to protect your Log Analytics data against the unlikely event of a datacenter failure.
-
-**Resources**
-
-- [Enhance data and service resilience in Azure Monitor Logs with availability zones](https://learn.microsoft.com/azure/azure-monitor/logs/availability-zones)
-- [Create and manage a dedicated cluster in Azure Monitor Logs](https://learn.microsoft.com/azure/azure-monitor/logs/logs-dedicated-clusters)
-- [Azure Monitor configuration recommendations](https://learn.microsoft.com/azure/azure-monitor/best-practices-logs#configuration-recommendations)
-
-**Resource Graph Query/Scripts**
-
-{{< collapse title="Show/Hide Query/Script" >}}
-
-{{< code lang="sql" file="code/log-2/log-2.kql" >}} {{< /code >}}
-
-{{< /collapse >}}
-
-
-
-### LOG-3 - Configure data collection to send critical data to multiple workspaces in different regions
-
-**Category: Disaster Recovery**
-
-**Impact: Medium**
-
-**Guidance**
-
-If you require a workspace to be available in the unlikely scenario of a regional failure then configure data collection to send critical data to multiple workspaces in different regions.
-
-**Resources**
-
-- [Azure Monitor configuration recommendations](https://learn.microsoft.com/azure/azure-monitor/best-practices-logs#configuration-recommendations)
-
-**Resource Graph Query/Scripts**
-
-{{< collapse title="Show/Hide Query/Script" >}}
-
-{{< code lang="sql" file="code/log-3/log-3.kql" >}} {{< /code >}}
-
-{{< /collapse >}}
-
-
-
### LOG-4 - Create a health status alert rule for your Log Analytics workspace
**Category: Monitoring**
@@ -121,7 +69,7 @@ A health status alert will proactively notify you if a workspace becomes unavail
- [Monitor Log Analytics workspace health](https://learn.microsoft.com/azure/azure-monitor/logs/log-analytics-workspace-health)
- [Azure Monitor configuration recommendations](https://learn.microsoft.com/azure/azure-monitor/best-practices-logs#configuration-recommendations)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -133,15 +81,15 @@ A health status alert will proactively notify you if a workspace becomes unavail
### LOG-5 - Configure minimal logging and retention of logs
-**Category: Monitoring**
+**Category: Governance**
**Impact: Low**
**Guidance**
- Azure Monitor Logs automatically retains log data for a specific period of time depending on the data type (for example, 31 days for platform logs and metrics). However, you may need to retain your data for longer periods for compliance or business reasons. You can configure the data retention settings based on your requirements.
+Azure Monitor Logs automatically retains log data for a default period that depends on the data type (for example, 30 days for platform logs and metrics). However, you may need to retain your data for longer periods for compliance or business reasons. You can configure the data retention settings based on your requirements.
- For long-term storage, it might be necessary to move logs from Azure Monitor to a more cost-effective storage solution, such as Azure Blob Storage. This allows you to keep logs for an extended period of time without incurring high costs.
+Use Azure Monitor archive settings to keep older, less frequently used data in your workspace at a reduced cost. You can access archived data by using search jobs and restore, and you can keep data in the archived state for up to 12 years.
**Resources**
@@ -149,7 +97,7 @@ A health status alert will proactively notify you if a workspace becomes unavail
- [Run search jobs in Azure Monitor](https://learn.microsoft.com/en-us/azure/azure-monitor/logs/search-jobs?tabs=portal-1%2Cportal-2)
- [Restore logs in Azure Monitor](https://learn.microsoft.com/en-us/azure/azure-monitor/logs/restore?tabs=api-1)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
diff --git a/docs/content/services/monitoring/log-analytics/code/log-1/log-1.kql b/docs/content/services/monitoring/log-analytics/code/log-1/log-1.kql
index 614a7f9ca..fa5cad258 100644
--- a/docs/content/services/monitoring/log-analytics/code/log-1/log-1.kql
+++ b/docs/content/services/monitoring/log-analytics/code/log-1/log-1.kql
@@ -1 +1 @@
-// under-development
+// cannot-be-validated-with-arg
diff --git a/docs/content/services/monitoring/log-analytics/code/log-2/log-2.azcli b/docs/content/services/monitoring/log-analytics/code/log-2/log-2.azcli
deleted file mode 100644
index ab2dc9f39..000000000
--- a/docs/content/services/monitoring/log-analytics/code/log-2/log-2.azcli
+++ /dev/null
@@ -1,3 +0,0 @@
-#Get LAW linked dedicated cluster ID and then determine if it is AZ enabled
-clusterResourceId=$(az monitor log-analytics workspace show --resource-group "resource group name" --workspace-name "log anaalytics workspace name" --query "features.clusterResourceId" --output tsv)
-az monitor log-analytics cluster show --ids $clusterResourceId --query 'isAvailabilityZonesEnabled'
diff --git a/docs/content/services/monitoring/log-analytics/code/log-4/log-4.kql b/docs/content/services/monitoring/log-analytics/code/log-4/log-4.kql
index 614a7f9ca..fa5cad258 100644
--- a/docs/content/services/monitoring/log-analytics/code/log-4/log-4.kql
+++ b/docs/content/services/monitoring/log-analytics/code/log-4/log-4.kql
@@ -1 +1 @@
-// under-development
+// cannot-be-validated-with-arg
diff --git a/docs/content/services/monitoring/log-analytics/code/log-5/log-5.kql b/docs/content/services/monitoring/log-analytics/code/log-5/log-5.kql
index 614a7f9ca..fa5cad258 100644
--- a/docs/content/services/monitoring/log-analytics/code/log-5/log-5.kql
+++ b/docs/content/services/monitoring/log-analytics/code/log-5/log-5.kql
@@ -1 +1 @@
-// under-development
+// cannot-be-validated-with-arg
diff --git a/docs/content/services/monitoring/resource-health-alerts/_index.md b/docs/content/services/monitoring/resource-health-alerts/_index.md
index 3bef4a069..501dd18d3 100644
--- a/docs/content/services/monitoring/resource-health-alerts/_index.md
+++ b/docs/content/services/monitoring/resource-health-alerts/_index.md
@@ -14,9 +14,9 @@ The presented resiliency recommendations in this guidance include Resources Heal
The below table shows the list of resiliency recommendations for Resources Health Alerts and associated resources.
{{< table style="table-striped" >}}
-| Recommendation | Category | Impact | State | ARG Query Available |
-| :------------------------------------------------ | :------------: | :--------:| :--------:| :------------------:|
-| [MSR-1 - Configure Resource Health Alerts](#msr-1---configure-resource-health-alerts) | Monitoring | Low | Preview | No |
+| Recommendation | Category | Impact | State | ARG Query Available |
+|:--------------------------------------------------------------------------------------|:----------:|:------:|:-------:|:-------------------:|
+| [MSR-1 - Configure Resource Health Alerts](#msr-1---configure-resource-health-alerts) | Monitoring | Low | Preview | No |
{{< /table >}}
{{< alert style="info" >}}
@@ -40,10 +40,10 @@ Configure Resource Health Alerts for all applicable resources. Azure Resource He
**Resources**
- [Resource Health](https://learn.microsoft.com/en-us/azure/service-health/resource-health-overview)
-- [Configure Resource Health alerts in the Azure portal](https://learn.microsoft.com/en-us/azure/service-health/resource-health-alert-monitor-guide#create-a-resource-health-alert-rule-in-the-azure-portal )
+- [Configure Resource Health alerts in the Azure portal](https://learn.microsoft.com/en-us/azure/service-health/resource-health-alert-monitor-guide#create-a-resource-health-alert-rule-in-the-azure-portal)
- [Alerts Health](https://learn.microsoft.com/en-us/azure/service-health/alerts-activity-log-service-notifications-portal)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
diff --git a/docs/content/services/monitoring/resource-health-alerts/code/msr-1.kql b/docs/content/services/monitoring/resource-health-alerts/code/msr-1/msr-1.kql
similarity index 100%
rename from docs/content/services/monitoring/resource-health-alerts/code/msr-1.kql
rename to docs/content/services/monitoring/resource-health-alerts/code/msr-1/msr-1.kql
diff --git a/docs/content/services/monitoring/service-health-alerts/_index.md b/docs/content/services/monitoring/service-health-alerts/_index.md
new file mode 100644
index 000000000..ecac452e9
--- /dev/null
+++ b/docs/content/services/monitoring/service-health-alerts/_index.md
@@ -0,0 +1,49 @@
++++
+title = "Service Health Alerts"
+description = "Best practices and resiliency recommendations for Service Health Alerts and associated resources and settings."
+date = "2/23/24"
+author = "ejhenry"
+msAuthor = "erhenry"
+draft = false
++++
+
+The presented resiliency recommendations in this guidance include Service Health Alerts and associated resources and settings.
+
+## Summary of Recommendations
+
+{{< table style="table-striped" >}}
+| Recommendation | Category | Impact | State | ARG Query Available |
+|:--------------------------------------------------|:-----------------------------------------------------------------------:|:---------------:|:----------------:|:-------------------:|
+| [ALA-1 - Configure Service Health Alerts](#ala-1---configure-service-health-alerts) | Monitoring | High | Preview | Yes |
+{{< /table >}}
+
+{{< alert style="info" >}}
+
+Definitions of states can be found [here]({{< ref "../../../_index.md#definitions-of-terms-used-in-aprl">}})
+
+{{< /alert >}}
+
+## Recommendations Details
+
+### ALA-1 - Configure Service Health Alerts
+
+**Category: Monitoring**
+
+**Impact: High**
+
+**Guidance**
+
+Service Health provides a personalized view of the health of the Azure services and regions you're using. It is the best place to look for service-impacting communications about outages, planned maintenance activities, and other health advisories, because the authenticated Service Health experience knows which services and resources you currently use. The best way to use Service Health is to set up Service Health alerts that notify you through your preferred communication channels when service issues, planned maintenance, or other changes may affect the Azure services and regions you use.
+
+**Resources**
+
+- [What is Azure Service Health?](https://learn.microsoft.com/azure/service-health/overview)
+- [Configure alerts for service health events](https://learn.microsoft.com/azure/service-health/alerts-activity-log-service-notifications-portal)
+
+**Resource Graph Query**
+
+{{< collapse title="Show/Hide Query/Script" >}}
+
+{{< code lang="sql" file="code/ala-1/ala-1.kql" >}} {{< /code >}}
+
+{{< /collapse >}}
diff --git a/docs/content/services/monitoring/service-health-alerts/code/ala-1/ala-1.kql b/docs/content/services/monitoring/service-health-alerts/code/ala-1/ala-1.kql
new file mode 100644
index 000000000..0e3d995ba
--- /dev/null
+++ b/docs/content/services/monitoring/service-health-alerts/code/ala-1/ala-1.kql
@@ -0,0 +1,17 @@
+// Azure Resource Graph Query
+// This resource graph query will return all subscriptions without Service Health alerts configured.
+
+resourcecontainers
+| where type == 'microsoft.resources/subscriptions'
+| project subscriptionAlerts=tostring(id),name,tags
+| join kind=leftouter (
+ resources
+ | where type == 'microsoft.insights/activitylogalerts' and properties.condition contains "ServiceHealth"
+ | extend subscriptions = properties.scopes
+ | project subscriptions
+ | mv-expand subscriptions
+ | project subscriptionAlerts = tostring(subscriptions)
+) on subscriptionAlerts
+| where isempty(subscriptionAlerts1)
+| project-away subscriptionAlerts1
+| project recommendationId = "ala-1", id = subscriptionAlerts, name, tags
diff --git a/docs/content/services/networking/application-gateway/_index.md b/docs/content/services/networking/application-gateway/_index.md
index a66e62791..fbb9ccc0d 100644
--- a/docs/content/services/networking/application-gateway/_index.md
+++ b/docs/content/services/networking/application-gateway/_index.md
@@ -12,17 +12,17 @@ The presented resiliency recommendations in this guidance include Application Ga
## Summary of Recommendations
{{< table style="table-striped" >}}
-| Recommendation | Impact | State | ARG Query Available |
-| :------------------------------------------------------------------------------------------------------------------------------------------ | :------: | :------: | :-----------------: |
-| [AGW-1 - Set a minimum instance count of 2](#agw-1---set-a-minimum-instance-count-of-2) | High | Preview | Yes |
-| [AGW-2 - Secure all incoming connections with SSL](#agw-2---secure-all-incoming-connections-with-ssl) | High | Preview | No |
-| [AGW-3 - Enable WAF policies](#agw-3---enable-web-application-firewall-policies) | High | Preview | Yes |
-| [AGW-4 - Use Application GW V2 instead of V1](#agw-4---use-application-gw-v2-instead-of-v1) | High | Preview | No |
-| [AGW-5 - Monitor and Log the configurations and traffic](#agw-5---monitor-and-log-the-configurations-and-traffic) | Medium | Preview | No |
-| [AGW-6 - Use Health Probes to detect backend availability](#agw-6---use-health-probes-to-detect-backend-availability) | Medium | Preview | No |
-| [AGW-7 - Deploy backends in a zone-redundant configuration](#agw-7---deploy-backends-in-a-zone-redundant-configuration) | High | Preview | No |
-| [AGW-8 - Plan for backend maintenance by using connection draining](#agw-8---plan-for-backend-maintenance-by-using-connection-draining) | Medium | Preview | No |
-| [AGW-9 - Ensure Application Gateway Subnet is using a /24 subnet mask](#agw-9---ensure-application-gateway-subnet-is-using-a-24-subnet-mask) | High | Preview | No |
+| Recommendation | Category | Impact | State | ARG Query Available |
+|:---------------------------------------------------------------------------------------------------------------------------------------------|:-----------------:|:------:|:-------:|:-------------------:|
+| [AGW-1 - Set a minimum instance count of 2](#agw-1---set-a-minimum-instance-count-of-2) | System Efficiency | High | Preview | Yes |
+| [AGW-2 - Secure all incoming connections with SSL](#agw-2---secure-all-incoming-connections-with-ssl) | Access & Security | High | Preview | Yes |
+| [AGW-3 - Enable WAF policies](#agw-3---enable-web-application-firewall-policies) | Access & Security | High | Preview | Yes |
+| [AGW-4 - Use Application GW V2 instead of V1](#agw-4---use-application-gw-v2-instead-of-v1) | System Efficiency | High | Preview | Yes |
+| [AGW-5 - Monitor and Log the configurations and traffic](#agw-5---monitor-and-log-the-configurations-and-traffic) | Monitoring | Medium | Preview | No |
+| [AGW-6 - Use Health Probes to detect backend availability](#agw-6---use-health-probes-to-detect-backend-availability) | Monitoring | Medium | Preview | Yes |
+| [AGW-7 - Deploy Application Gateway in a zone-redundant configuration](#agw-7---deploy-application-gateway-in-a-zone-redundant-configuration)| Availability | High | Preview | Yes |
+| [AGW-8 - Plan for backend maintenance by using connection draining](#agw-8---plan-for-backend-maintenance-by-using-connection-draining) | Governance | Medium | Preview | No |
+| [AGW-9 - Ensure Application Gateway Subnet is using a /24 subnet mask](#agw-9---ensure-application-gateway-subnet-is-using-a-24-subnet-mask) | Networking | High | Preview | Yes |
{{< /table >}}
{{< alert style="info" >}}
@@ -47,7 +47,7 @@ Azure Application Gateways v2 are always deployed in a highly available fashion,
- [Application Gateway Autoscaling Zone-Redundant](https://learn.microsoft.com/azure/application-gateway/application-gateway-autoscaling-zone-redundant#autoscaling-and-high-availability)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -65,7 +65,7 @@ Azure Application Gateways v2 are always deployed in a highly available fashion,
**Guidance**
-Ensure that all incoming connections are using HTTP/s for production services. Using end to end SSL/TLS or SSL/TLS termination to ensure the security of all incoming connections to the Application Gateway allows you and your users to be safe from possible attacks as it ensures that all data passed between the web server and browsers remain private and encrypted.
+Ensure that all incoming connections use HTTPS for production services. Using end-to-end SSL/TLS or SSL/TLS termination to secure all incoming connections to the Application Gateway protects you and your users from possible attacks, because it ensures that all data passed between the web server and browsers remains private and encrypted.
**Resources**
@@ -75,7 +75,7 @@ Ensure that all incoming connections are using HTTP/s for production services.
- [Application Gateway KeyVault Certs](https://learn.microsoft.com/azure/application-gateway/key-vault-certs)
- [Application Gateway SSL Cert Management](https://learn.microsoft.com/azure/application-gateway/ssl-certificate-management)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -100,7 +100,7 @@ Use Application Gateway with Web Application Firewall (WAF) within an applicatio
- [Well-Architected Framework Application Gateway Overview](https://learn.microsoft.com/azure/well-architected/services/networking/azure-application-gateway)
- [Application Gateway - Web Application Firewall](https://learn.microsoft.com/azure/application-gateway/features#web-application-firewall)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -118,7 +118,7 @@ Use Application Gateway with Web Application Firewall (WAF) within an applicatio
**Guidance**
-You should use Application Gateway v2 unless there is a compelling reason for using v1. V2 has many more built in features such as autoscaling, static VIPs, Azure KeyVault integration for certificate management and many more features listed in our comparison charts. Leveraging this updated version allows for better performance and control of how your traffic routed and the ability to make changes to the traffic.
+You should use Application Gateway v2 unless there is a compelling reason for using v1. V2 has many more built-in features, such as autoscaling, static VIPs, and Azure Key Vault integration for certificate management, along with the other features listed in our comparison charts. Using this updated version gives you better performance, more control over how your traffic is routed, and the ability to make changes to the traffic.
**Resources**
@@ -126,7 +126,7 @@ You should use Application Gateway v2 unless there is a compelling reason for us
- [Application Gateway Feature Comparison Between V1 and V2](https://learn.microsoft.com/azure/application-gateway/overview-v2#feature-comparison-between-v1-sku-and-v2-sku)
- [Application Gateway V1 Retirement](https://azure.microsoft.com/updates/application-gateway-v1-will-be-retired-on-28-april-2026-transition-to-application-gateway-v2/)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -144,14 +144,14 @@ You should use Application Gateway v2 unless there is a compelling reason for us
**Guidance**
-Enable logs that can be stored in storage accounts, Log Analytics, and other monitoring services. If NSGs are applied NSG flow logs can be enabled and stored for traffic audit and to provide insights into the traffic flowing into your Azure Cloud.
+Enable logs that can be stored in storage accounts, Log Analytics, and other monitoring services. If NSGs are applied, NSG flow logs can be enabled and stored for traffic auditing and to provide insights into the traffic flowing into your Azure environment.
**Resources**
- [Application Gateway Metrics](https://learn.microsoft.com/azure/application-gateway/application-gateway-metrics)
- [Application Gateway Diagnostics](https://learn.microsoft.com/azure/application-gateway/application-gateway-diagnostics)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -176,7 +176,7 @@ Using custom health probes can help with understand the availability of your bac
- [Application Gateway Probe Overview](https://learn.microsoft.com/azure/application-gateway/application-gateway-probe-overview)
- [Well-Architected Framework Application Gateway Overview](https://learn.microsoft.com/azure/well-architected/services/networking/azure-application-gateway)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -186,7 +186,7 @@ Using custom health probes can help with understand the availability of your bac
-### AGW-7 - Deploy backends in a zone-redundant configuration
+### AGW-7 - Deploy Application Gateway in a zone-redundant configuration
**Category: Availability**
@@ -194,14 +194,14 @@ Using custom health probes can help with understand the availability of your bac
**Guidance**
-Deploying your backend services in a zone-aware configurations ensures that if a specific zone goes down that customers will still have access to the services as the other services located in other zones will still be available.
+Deploying your Application Gateway in a zone-aware configuration ensures that if a specific zone goes down, customers will still have access to the services, because the gateway instances located in other zones will still be available.
**Resources**
- [Well-Architected Framework Application Gateway Reliability](https://learn.microsoft.com/azure/well-architected/services/networking/azure-application-gateway#reliability)
- [Application Gateway V2 Overview](https://learn.microsoft.com/azure/application-gateway/overview-v2)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -226,7 +226,7 @@ Plan for backend maintenance by using connection draining. Connection draining h
- [Application Gateway Connection Draining](https://learn.microsoft.com/azure/application-gateway/features#connection-draining)
- [Application Gateway Connection Draining HTTP Settings](https://learn.microsoft.com/azure/application-gateway/configuration-http-settings#connection-draining)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
{{< code lang="sql" file="code/agw-8/agw-8.kql" >}} {{< /code >}}
@@ -237,6 +237,8 @@ Plan for backend maintenance by using connection draining. Connection draining h
### AGW-9 - Ensure Application Gateway Subnet is using a /24 subnet mask
+**Category: Networking**
+
**Impact: High**
**Recommendation/Guidance**
@@ -247,7 +249,7 @@ Application Gateway (Standard_v2 or WAF_v2 SKU) can support up to 125 instances.
- [Azure Application Gateway infrastructure configuration | Microsoft Learn](https://learn.microsoft.com/en-us/azure/application-gateway/configuration-infrastructure#size-of-the-subnet)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
{{< code lang="sql" file="code/agw-9/agw-9.kql" >}} {{< /code >}}
diff --git a/docs/content/services/networking/application-gateway/code/agw-1/agw-1.kql b/docs/content/services/networking/application-gateway/code/agw-1/agw-1.kql
index 9cebe63a8..14f74a3ea 100644
--- a/docs/content/services/networking/application-gateway/code/agw-1/agw-1.kql
+++ b/docs/content/services/networking/application-gateway/code/agw-1/agw-1.kql
@@ -3,6 +3,6 @@
resources
| where type =~ "microsoft.network/applicationGateways"
| where isnull(properties.autoscaleConfiguration) or properties.autoscaleConfiguration.minCapacity <= 1
-| project recommendationId = "agw-1", name, id, param1 = "autoScaleConfiguration: isNull or MinCapacity <= 1"
+| project recommendationId = "agw-1", name, id, tags, param1 = "autoScaleConfiguration: isNull or MinCapacity <= 1"
| order by id asc
diff --git a/docs/content/services/networking/application-gateway/code/agw-2/agw-2.kql b/docs/content/services/networking/application-gateway/code/agw-2/agw-2.kql
index 614a7f9ca..45eff22aa 100644
--- a/docs/content/services/networking/application-gateway/code/agw-2/agw-2.kql
+++ b/docs/content/services/networking/application-gateway/code/agw-2/agw-2.kql
@@ -1 +1,8 @@
-// under-development
+// Azure Resource Graph Query
+// Lists Application Gateway listeners that do not have an SSL certificate attached
+resources
+| where type =~ "microsoft.network/applicationGateways"
+| mv-expand frontendPorts = properties.frontendPorts
+| mv-expand httpListeners = properties.httpListeners
+| where tostring(httpListeners.properties.frontendPort.id) =~ tostring(frontendPorts.id) and isnull(parse_json(httpListeners.properties.sslCertificate))
+| project recommendationId="agw-2", name, id, tags, param1=strcat("frontendPort: ", frontendPorts.properties.port), param2="tls: false"
diff --git a/docs/content/services/networking/application-gateway/code/agw-2/agw-2.kql.fix b/docs/content/services/networking/application-gateway/code/agw-2/agw-2.kql.fix
deleted file mode 100644
index c5a8db12f..000000000
--- a/docs/content/services/networking/application-gateway/code/agw-2/agw-2.kql.fix
+++ /dev/null
@@ -1,7 +0,0 @@
-// Azure Resource Graph Query
-// You can use the following Azure Resource Graph query to check if an HTTP rule is using an SSL certificate or is using Azure Key Vault to store the certificates
-resources
-| where type =~ "microsoft.network/applicationGateways"
-| extend ssl_enabled = tobool(isnotnull(properties.sslCertificates[0].keyVaultSecretId) or isnotnull(properties.sslCertificates[0].keyVaultSecretUrl))
-| where properties.frontendPorts[0].port == 443 and ssl_enabled == "true"
-| project recommendationId="agw-2",name,id, param1=strcat("ssl_enabled: ", ssl_enabled)
diff --git a/docs/content/services/networking/application-gateway/code/agw-3/agw-3.kql b/docs/content/services/networking/application-gateway/code/agw-3/agw-3.kql
index db7c7e424..20b019fa3 100644
--- a/docs/content/services/networking/application-gateway/code/agw-3/agw-3.kql
+++ b/docs/content/services/networking/application-gateway/code/agw-3/agw-3.kql
@@ -3,6 +3,6 @@
Resources
| where type =~ "microsoft.network/applicationGateways"
| where properties.firewallpolicy != ""
-| project recommendationId = "agw-3", name, id, param1 = "webApplicationFirewallConfiguration: isNull"
+| project recommendationId = "agw-3", name, id, tags, param1 = "webApplicationFirewallConfiguration: isNull"
| order by id asc
diff --git a/docs/content/services/networking/application-gateway/code/agw-4/agw-4.kql b/docs/content/services/networking/application-gateway/code/agw-4/agw-4.kql
index 614a7f9ca..a5955469e 100644
--- a/docs/content/services/networking/application-gateway/code/agw-4/agw-4.kql
+++ b/docs/content/services/networking/application-gateway/code/agw-4/agw-4.kql
@@ -1 +1,7 @@
-// under-development
+// Azure Resource Graph Query
+// Get all Application Gateways that are using the deprecated V1 SKU
+resources
+| where type =~ 'microsoft.network/applicationgateways'
+| extend tier = properties.sku.tier
+| where tier == 'Standard' or tier == 'WAF'
+| project recommendationId = "agw-4", name, id, tags
diff --git a/docs/content/services/networking/application-gateway/code/agw-4/agw-4.kql.fix b/docs/content/services/networking/application-gateway/code/agw-4/agw-4.kql.fix
deleted file mode 100644
index 7b975d946..000000000
--- a/docs/content/services/networking/application-gateway/code/agw-4/agw-4.kql.fix
+++ /dev/null
@@ -1,9 +0,0 @@
-// Azure Resource Graph Query
-// This query will return all Application Gateways in your Azure environment and will identify if they are v1 or v2
-resources
-| where type =~ "microsoft.network/applicationGateways"
-| extend sku = tolower(tostring(properties.sku.name))
-| where sku != "waf_v2" and sku != "standard_v2"
-| project recommendationId = "agw-4", name, id, param1 = "sku: v1"
-| order by id asc
-
diff --git a/docs/content/services/networking/application-gateway/code/agw-6/agw-6.kql b/docs/content/services/networking/application-gateway/code/agw-6/agw-6.kql
index 614a7f9ca..8c4dba18a 100644
--- a/docs/content/services/networking/application-gateway/code/agw-6/agw-6.kql
+++ b/docs/content/services/networking/application-gateway/code/agw-6/agw-6.kql
@@ -1 +1,6 @@
-// under-development
+// Azure Resource Graph Query
+// List Application Gateways that are not using custom health probes to monitor the availability of the backend systems
+resources
+| where type =~ "microsoft.network/applicationGateways"
+| where array_length(properties.probes) == 0
+| project recommendationId="agw-6", name, id, tags, param1="customHealthProbeUsed: false"
diff --git a/docs/content/services/networking/application-gateway/code/agw-6/agw-6.kql.fix b/docs/content/services/networking/application-gateway/code/agw-6/agw-6.kql.fix
deleted file mode 100644
index 25be1ff32..000000000
--- a/docs/content/services/networking/application-gateway/code/agw-6/agw-6.kql.fix
+++ /dev/null
@@ -1,11 +0,0 @@
-// Azure Resource Graph Query
-// You can use the following Azure Resource Graph query to check which App GWs are not using SSL certs
-//under development
-Resources
-| where type == "microsoft.network/applicationGateways"
-| extend appGatewayResourceId = tostring(id)
-| mvexpand probeConfig = properties.probes
-| where probeConfig.probeName != "GatewaySslCertificate"
-| where iif(isnotempty(probeConfig.pickHostName), "Yes", "No")
-| project recommendationId="agw-6",name, id, param1=strcat("appGatewayResourceId: ", appGatewayResourceId), param2=strcat("customHealthProbeUsed :", customHealthProbeUsed)
-
diff --git a/docs/content/services/networking/application-gateway/code/agw-7/agw-7.kql b/docs/content/services/networking/application-gateway/code/agw-7/agw-7.kql
index 614a7f9ca..46f1ceff1 100644
--- a/docs/content/services/networking/application-gateway/code/agw-7/agw-7.kql
+++ b/docs/content/services/networking/application-gateway/code/agw-7/agw-7.kql
@@ -1 +1,7 @@
-// under-development
+// Azure Resource Graph Query
+// List Application Gateways that are not configured to use at least 2 Availability Zones
+resources
+| where type =~ "microsoft.network/applicationGateways"
+| where isnull(zones) or array_length(zones) < 2
+| extend zoneValue = iff(isnull(zones), "null", tostring(zones))
+| project recommendationId = "agw-7", name, id, tags, param1="Zones: No Zone or Zonal", param2=strcat("Zones value: ", zoneValue )
diff --git a/docs/content/services/networking/application-gateway/code/agw-7/agw-7.kql.fix b/docs/content/services/networking/application-gateway/code/agw-7/agw-7.kql.fix
deleted file mode 100644
index fbe5a14a3..000000000
--- a/docs/content/services/networking/application-gateway/code/agw-7/agw-7.kql.fix
+++ /dev/null
@@ -1,8 +0,0 @@
-// Azure Resource Graph Query
-// You can use the following Azure Resource Graph Query to see if the Application Gateway is zone redundant
-Resources
-| where type =~ "microsoft.network/applicationGateways"
-| extend appGatewayResourceId = tostring(id)
-| extend zoneRedundant = tostring(properties.enableZoneRedundancy)
-| project appGatewayResourceId, zoneRedundant
-
diff --git a/docs/content/services/networking/application-gateway/code/agw-9/agw-9.kql b/docs/content/services/networking/application-gateway/code/agw-9/agw-9.kql
index fa5cad258..819f8c14f 100644
--- a/docs/content/services/networking/application-gateway/code/agw-9/agw-9.kql
+++ b/docs/content/services/networking/application-gateway/code/agw-9/agw-9.kql
@@ -1 +1,14 @@
-// cannot-be-validated-with-arg
+// Azure Resource Graph Query
+// This query returns Application Gateways whose subnet address prefix does not end with /24
+
+resources
+| where type =~ 'Microsoft.Network/applicationGateways'
+| extend subnetid = tostring(properties.gatewayIPConfigurations[0].properties.subnet.id)
+| join kind=leftouter(resources
+ | where type == "microsoft.network/virtualnetworks"
+ | mv-expand properties.subnets
+ | extend subnetid = tostring(properties_subnets.id)
+ | extend addressprefix = tostring(properties_subnets.properties.addressPrefix)
+ | project subnetid, addressprefix) on subnetid
+| where addressprefix !endswith '/24'
+| project recommendationId = "agw-9", name, id, tags, param1 = strcat('AppGW subnet prefix: ', addressprefix)
diff --git a/docs/content/services/networking/ddos-protection-plans/_index.md b/docs/content/services/networking/ddos-protection-plans/_index.md
new file mode 100644
index 000000000..25c324409
--- /dev/null
+++ b/docs/content/services/networking/ddos-protection-plans/_index.md
@@ -0,0 +1,55 @@
++++
+title = "DDoS Protection Plans"
+description = "Best practices and resiliency recommendations for DDoS Protection Plans and associated resources and settings."
+date = "3/7/2024"
+author = "rodrigosantosms"
+msAuthor = "rodrigosantosms"
+draft = false
++++
+
+The presented resiliency recommendations in this guidance include DDoS Protection Plans and associated resources and settings.
+
+## Summary of Recommendations
+
+{{< table style="table-striped" >}}
+| Recommendation | Category | Impact | State | ARG Query Available |
+| :------------------------------------------------------------------------------------------------------------------------------------------------------------------ | :-------------: | :------: | :------: | :-----------------: |
+| [DDOS-1 - Monitor Azure DDoS Protection Plan metrics](#ddos-1---monitor-azure-ddos-protection-plan-metrics) | Access & Security | Medium | Preview | No |
+
+{{< /table >}}
+
+{{< alert style="info" >}}
+
+Definitions of states can be found [here]({{< ref "../../../_index.md#definitions-of-terms-used-in-aprl">}})
+
+{{< /alert >}}
+
+## Recommendations Details
+
+### DDOS-1 - Monitor Azure DDoS Protection Plan metrics
+
+**Category: Access & Security**
+
+**Impact: Medium**
+
+**Guidance**
+
+Monitor the metrics exposed by Azure DDoS Protection. The metric names cover different packet types, measured in bytes and in packets, and each metric follows a basic tag-name construct:
+
+- Dropped tag name (for example, Inbound Packets Dropped DDoS): The number of packets dropped/scrubbed by the DDoS protection system.
+- Forwarded tag name (for example, Inbound Packets Forwarded DDoS): The number of packets forwarded by the DDoS system to the destination VIP, that is, traffic that wasn't filtered.
+- No tag name (for example, Inbound Packets DDoS): The total number of packets that came into the scrubbing system, representing the sum of the packets dropped and forwarded.
+
+**Resources**
+
+- [Monitoring Azure DDoS Protection](https://learn.microsoft.com/en-us/azure/ddos-protection/monitor-ddos-protection-reference)
+
+**Resource Graph Query**
+
+{{< collapse title="Show/Hide Query/Script" >}}
+
+{{< code lang="sql" file="code/ddos-1/ddos-1.kql" >}} {{< /code >}}
+
+{{< /collapse >}}
+
+
diff --git a/docs/content/services/networking/ddos-protection-plans/code/ddos-1/ddos-1.kql b/docs/content/services/networking/ddos-protection-plans/code/ddos-1/ddos-1.kql
new file mode 100644
index 000000000..614a7f9ca
--- /dev/null
+++ b/docs/content/services/networking/ddos-protection-plans/code/ddos-1/ddos-1.kql
@@ -0,0 +1 @@
+// under-development
diff --git a/docs/content/services/networking/expressroute-circuits/_index.md b/docs/content/services/networking/expressroute-circuits/_index.md
index fec5408e0..44722a657 100644
--- a/docs/content/services/networking/expressroute-circuits/_index.md
+++ b/docs/content/services/networking/expressroute-circuits/_index.md
@@ -1,30 +1,28 @@
+++
title = "ExpressRoute Circuits"
-description = "Best practices and resiliency recommendations for ExpressRoute Circuits and associated resources."
-date = "10/31/23"
+description = "Best practices and resiliency recommendations for ExpressRoute circuits and associated resources."
+date = "01/31/2024"
author = "ehaslett"
msAuthor = "ethaslet"
draft = false
+++
-The presented resiliency recommendations in this guidance include ExpressRoute Circuits and associated ExpressRoute Circuits settings.
+The presented resiliency recommendations in this guidance include ExpressRoute circuits and associated ExpressRoute circuit settings.
## Summary of Recommendations
-The below table shows the list of resiliency recommendations for ExpressRoute Circuits and associated resources.
+The below table shows the list of resiliency recommendations for ExpressRoute circuits and associated resources.
{{< table style="table-striped" >}}
-| Recommendation | Impact | State | ARG Query Available |
-| :------------------------------------------------------------------------------------------------------------------------------------------ | :------: | :------: | :-----------------: |
-| [ERC-1 - Ensure both connections of an ExpressRoute circuit are configured and active](#erc-1---ensure-both-connections-of-an-expressroute-circuit-are-configured-and-active) | High | Preview | Yes |
-| [ERC-2 - Physical layer diversity](#erc-2---physical-layer-diversity) | High | Preview | No |
-| [ERC-3 - Diversify primary and secondary connections to customer end routers](#erc-3---diversify-primary-and-secondary-connections-to-customer-end-routers) | High | Preview | No |
-| [ERC-4 - Diversify primary and secondary connections to customer end ports](#erc-4---diversify-primary-and-secondary-connections-to-customer-end-ports) | High | Preview | No |
-| [ERC-5 - Monitor ExpressRoute using Azure Monitor](#erc-5---monitor-expressroute-using-azure-monitor) | Medium | Preview | No |
-| [ERC-6 - Configure service health to receive ExpressRoute circuit maintenance notification](#erc-6---configure-service-health-to-receive-expressroute-circuit-maintenance-notification) | Medium | Preview | No |
-| [ERC-7 - Ensure Bidirectional Forwarding Detection is enabled and configured on customer equipment](#erc-7---ensure-bidirectional-forwarding-detection-is-enabled-and-configured) | High | Preview | No |
-| [ERC-8 - Implement multiple geo-redundant ExpressRoute circuits](#erc-8---implement-multiple-geo-redundant-expressroute-circuits) | Medium | Preview | No |
-| [ERC-9 - Configure site-to-site VPN as a backup to ExpressRoute private peering](#erc-9---configure-site-to-site-vpn-as-a-backup-to-expressroute-private-peering) | Medium | Preview | No |
+| Recommendation | Category | Impact | State | ARG Query Available |
+|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------:|:------:|:-------:|:-------------------:|
+| [ERC-1 - Connect your on-premises network to critical workloads in Azure through two or more ExpressRoute circuits in different peering locations](#erc-1---connect-your-on-premises-network-to-critical-workloads-in-azure-through-two-or-more-expressroute-circuits-in-different-peering-locations) | Availability | High | Verified | No |
+| [ERC-2 - Ensure the two physical links of your ExpressRoute circuit are connected to two distinct edge devices in your network](#erc-2---ensure-the-two-physical-links-of-your-expressroute-circuit-are-connected-to-two-distinct-edge-devices-in-your-network) | Availability | High | Verified | No |
+| [ERC-3 - Ensure both connections of an ExpressRoute circuit are configured in active-active mode](#erc-3---ensure-both-connections-of-an-expressroute-circuit-are-configured-in-active-active-mode) | Availability | High | Verified | Yes |
+| [ERC-4 - Ensure Bidirectional Forwarding Detection is enabled and configured on customer or provider edge routing devices](#erc-4---ensure-bidirectional-forwarding-detection-is-enabled-and-configured-on-customer-or-provider-edge-routing-devices) | Availability | High | Verified | No |
+| [ERC-5 - Configure monitoring and alerting for ExpressRoute circuits](#erc-5---configure-monitoring-and-alerting-for-expressroute-circuits) | Monitoring | Medium | Verified | No |
+| [ERC-6 - Configure service health to receive ExpressRoute circuit maintenance notification](#erc-6---configure-service-health-to-receive-expressroute-circuit-maintenance-notification) | Monitoring | Medium | Verified | No |
+| [ERC-7 - Use a site-to-site VPN as an interim backup solution for a single ExpressRoute circuit](#erc-7---use-a-site-to-site-vpn-as-an-interim-backup-solution-for-a-single-expressroute-circuit) | Disaster Recovery | Medium | Verified | No |
{{< /table >}}
{{< alert style="info" >}}
@@ -35,19 +33,20 @@ Definitions of states can be found [here]({{< ref "../../../_index.md#definition
## Recommendations Details
-### ERC-1 - Ensure both connections of an ExpressRoute circuit are configured and active
+### ERC-1 - Connect your on-premises network to critical workloads in Azure through two or more ExpressRoute circuits in different peering locations
+
+**Category: Availability**
**Impact: High**
**Guidance**
-To improve high availability, it's recommended to operate both the connections of an ExpressRoute circuit in active-active mode. If you let the connections operate in active-active mode, Microsoft network will load balance the traffic across the connections on per-flow basis.
-
+Connect each ExpressRoute Gateway to a minimum of two circuits instantiated in different peering locations.
**Resources**
-- [Designing for high availability with ExpressRoute - Active-active connections](https://learn.microsoft.com/azure/expressroute/designing-for-high-availability-with-expressroute#active-active-connections)
+- [Designing for disaster recovery with ExpressRoute private peering](https://learn.microsoft.com/azure/expressroute/designing-for-disaster-recovery-with-expressroute-privatepeering)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -57,19 +56,22 @@ To improve high availability, it's recommended to operate both the connections o
-### ERC-2 - Physical layer diversity
+### ERC-2 - Ensure the two physical links of your ExpressRoute circuit are connected to two distinct edge devices in your network
+
+**Category: Availability**
**Impact: High**
**Guidance**
-For better resiliency, plan to have multiple paths between the on-premises edge and the peering locations (provider/Microsoft edge locations). This configuration can be achieved by going through different service provider or through a different location from the on-premises network.
+Microsoft (in the ExpressRoute Direct model) or the ExpressRoute provider (in the provider-based model) always offers a physically redundant service. Make sure that the same level of physical redundancy (two physical devices, two physical links) is used across the entire path from the ExpressRoute peering location to your network.
**Resources**
+- [Designing for high availability with ExpressRoute](https://learn.microsoft.com/en-us/azure/expressroute/designing-for-high-availability-with-expressroute)
- [Azure Well-Architected Framework review - Azure ExpressRoute - Design Checklist](https://learn.microsoft.com/azure/well-architected/services/networking/azure-expressroute#recommendations)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -79,19 +81,21 @@ For better resiliency, plan to have multiple paths between the on-premises edge
-### ERC-3 - Diversify primary and secondary connections to customer end routers
+### ERC-3 - Ensure both connections of an ExpressRoute circuit are configured in active-active mode
+
+**Category: Availability**
**Impact: High**
**Guidance**
-Never terminate primary and secondary connections on the same customer end router. This creates a single point of failure.
+To improve high availability, it's recommended that you operate both the connections of an ExpressRoute circuit in active-active mode. If you configure the connections to operate in active-active mode, the Microsoft network will load balance the traffic across the connections on a per-flow basis.
**Resources**
-- [Designing for high availability with ExpressRoute - First mile physical layer design considerations](https://learn.microsoft.com/azure/expressroute/designing-for-high-availability-with-expressroute#first-mile-physical-layer-design-considerations)
+- [Designing for high availability with ExpressRoute - Active-active connections](https://learn.microsoft.com/azure/expressroute/designing-for-high-availability-with-expressroute#active-active-connections)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -101,19 +105,21 @@ Never terminate primary and secondary connections on the same customer end route
-### ERC-4 - Diversify primary and secondary connections to customer end ports
+### ERC-4 - Ensure Bidirectional Forwarding Detection is enabled and configured on customer or provider edge routing devices
+
+**Category: Availability**
**Impact: High**
**Guidance**
-Don’t configure both Primary and secondary connections via same port. This creates a single point of failure.
+When you enable Bidirectional Forwarding Detection (BFD) over ExpressRoute, you can speed up link failure detection between Microsoft Enterprise Edge (MSEE) devices and the routers on which your ExpressRoute circuit is configured (CE/PE). You can configure ExpressRoute over your edge routing devices or your Partner Edge routing devices (if you use a managed Layer 3 connection service).
**Resources**
-- [Designing for high availability with ExpressRoute - First mile physical layer design considerations](https://learn.microsoft.com/azure/expressroute/designing-for-high-availability-with-expressroute#first-mile-physical-layer-design-considerations)
+- [Configure BFD over ExpressRoute](https://learn.microsoft.com/azure/expressroute/expressroute-bfd)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -123,19 +129,27 @@ Don’t configure both Primary and secondary connections via same port. This cre
-### ERC-5 - Monitor ExpressRoute using Azure Monitor
+### ERC-5 - Configure monitoring and alerting for ExpressRoute circuits
+
+**Category: Monitoring**
**Impact: Medium**
**Guidance**
-ExpressRoute monitor provides end-to-end monitoring capabilities including: Loss, latency, and performance from on-premises to Azure and Azure to on-premises
+Configure monitoring using Network Insights for ExpressRoute circuit availability, circuit QoS, and throughput. Configure alerts for availability metrics and circuit QoS metrics according to [ExpressRoute Circuits | Azure Monitor Baseline Alerts](https://azure.github.io/azure-monitor-baseline-alerts/services/Network/expressRouteCircuits/), and throughput metrics when bits/sec exceed a threshold appropriate for the ExpressRoute circuit SKU and customer usage.
+
+Configure Connection Monitor for ExpressRoute, which uses a Log Analytics workspace and Network Watcher. Create alerts for when ChecksFailedPercent exceeds 5% and when RoundTripTimeMs exceeds a pre-tested average appropriate to the environment.
+
+For ExpressRoute Direct, configure ExpressRoute Traffic Collector to send flow logs to a Log Analytics workspace.
**Resources**
+- [Azure ExpressRoute Insights using Network Insights | Microsoft Learn](https://learn.microsoft.com/en-us/azure/expressroute/expressroute-network-insights)
- [Monitoring Azure ExpressRoute](https://learn.microsoft.com/azure/expressroute/monitor-expressroute)
+- [Configure Traffic Collector for ExpressRoute Direct - Azure ExpressRoute | Microsoft Learn](https://learn.microsoft.com/en-us/azure/expressroute/how-to-configure-traffic-collector#deploy-expressroute-traffic-collector)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -147,6 +161,8 @@ ExpressRoute monitor provides end-to-end monitoring capabilities including: Loss
### ERC-6 - Configure service health to receive ExpressRoute circuit maintenance notification
+**Category: Monitoring**
+
**Impact: Medium**
**Guidance**
@@ -157,7 +173,7 @@ ExpressRoute uses service health to notify about planned and unplanned maintenan
- [How to view and configure alerts for Azure ExpressRoute circuit maintenance](https://learn.microsoft.com/azure/expressroute/maintenance-alerts)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -167,67 +183,25 @@ ExpressRoute uses service health to notify about planned and unplanned maintenan
-### ERC-7 - Ensure Bidirectional Forwarding Detection is enabled and configured
-
-**Impact: High**
-
-**Guidance**
-
-When you enable Bidirectional Forwarding Detection (BFD) over ExpressRoute, you can speed up the link failure detection between Microsoft Enterprise edge (MSEE) devices and the routers that your ExpressRoute circuit gets configured (CE/PE). You can configure ExpressRoute over your edge routing devices or your Partner Edge routing devices (if you went with managed Layer 3 connection service).
-
-**Resources**
-
-- [https://learn.microsoft.com/azure/expressroute/expressroute-bfd](https://learn.microsoft.com/azure/expressroute/expressroute-bfd)
-
-**Resource Graph Query/Scripts**
-
-{{< collapse title="Show/Hide Query/Script" >}}
-
-{{< code lang="sql" file="code/erc-7/erc-7.kql" >}} {{< /code >}}
-
-{{< /collapse >}}
-
-
-
-### ERC-8 - Implement multiple geo-redundant ExpressRoute circuits
+### ERC-7 - Use a site-to-site VPN as an interim backup solution for a single ExpressRoute circuit
-**Impact: Medium**
-
-**Guidance**
-
-Implement multiple geo-redundant ExpressRoute circuits in your Virtual Network for cross premises resiliency
-
-**Resources**
-
-- [Designing for disaster recovery with ExpressRoute private peering](https://learn.microsoft.com/azure/expressroute/designing-for-disaster-recovery-with-expressroute-privatepeering)
-
-**Resource Graph Query/Scripts**
-
-{{< collapse title="Show/Hide Query/Script" >}}
-
-{{< code lang="sql" file="code/erc-8/erc-8.kql" >}} {{< /code >}}
-
-{{< /collapse >}}
-
-
-
-### ERC-9 - Configure site-to-site VPN as a backup to ExpressRoute private peering
+**Category: Disaster Recovery**
**Impact: Medium**
**Guidance**
-Consider using site-to-site VPN as a failover when an ExpressRoute circuit becomes unavailable.
+If you have not yet added a second ExpressRoute circuit for an ExpressRoute Gateway, use a site-to-site VPN as an interim solution until the second ExpressRoute circuit is available.
**Resources**
- [Using S2S VPN as a backup for ExpressRoute private peering](https://learn.microsoft.com/azure/expressroute/use-s2s-vpn-as-backup-for-expressroute-privatepeering)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
-{{< code lang="sql" file="code/erc-9/erc-9.kql" >}} {{< /code >}}
+{{< code lang="sql" file="code/erc-7/erc-7.kql" >}} {{< /code >}}
{{< /collapse >}}
diff --git a/docs/content/services/networking/expressroute-circuits/code/erc-1/erc-1.kql b/docs/content/services/networking/expressroute-circuits/code/erc-1/erc-1.kql
index da9f40aa9..e1797f9d6 100644
--- a/docs/content/services/networking/expressroute-circuits/code/erc-1/erc-1.kql
+++ b/docs/content/services/networking/expressroute-circuits/code/erc-1/erc-1.kql
@@ -1,8 +1 @@
-// Azure Resource Graph Query
-// Goal: Show any ExpressRoute circuit where one of the connections is not configured (i.e. no IP)
-Resources
-| where type =~ 'Microsoft.Network/expressRouteCircuits'
-| where properties.value[0].provisioningState != 'Succeeded' or properties.value[1].provisioningState != 'Succeeded'
-| where not(properties.peerings[0].properties.primaryPeerAddressPrefix != "null" and properties.peerings[0].properties.secondaryPeerAddressPrefix != "null")
-| project recommendationId = "erc-1", name, id, tags, param1 = strcat("Peer1_IP: ",properties.peerings[0].properties.primaryPeerAddressPrefix), param2=strcat("Peer2_IP: ", properties.peerings[0].properties.secondaryPeerAddressPrefix)
-| order by id asc
+// under-development
diff --git a/docs/content/services/networking/expressroute-circuits/code/erc-3/erc-3.kql b/docs/content/services/networking/expressroute-circuits/code/erc-3/erc-3.kql
index 0b5bdc8a0..a4d33e544 100644
--- a/docs/content/services/networking/expressroute-circuits/code/erc-3/erc-3.kql
+++ b/docs/content/services/networking/expressroute-circuits/code/erc-3/erc-3.kql
@@ -1 +1,8 @@
-// cannot-be-validated-with-arg
+// Azure Resource Graph Query
+// Goal: Show any ExpressRoute circuit where one of the connections is not configured (i.e. no IP)
+Resources
+| where type =~ 'Microsoft.Network/expressRouteCircuits'
+| where properties.value[0].provisioningState != 'Succeeded' or properties.value[1].provisioningState != 'Succeeded'
+| where not(properties.peerings[0].properties.primaryPeerAddressPrefix != "null" and properties.peerings[0].properties.secondaryPeerAddressPrefix != "null")
+| project recommendationId = "erc-3", name, id, tags, param1 = strcat("Peer1_IP: ",properties.peerings[0].properties.primaryPeerAddressPrefix), param2=strcat("Peer2_IP: ", properties.peerings[0].properties.secondaryPeerAddressPrefix)
+| order by id asc
diff --git a/docs/content/services/networking/expressroute-circuits/code/erc-7/erc-7.kql b/docs/content/services/networking/expressroute-circuits/code/erc-7/erc-7.kql
index 0b5bdc8a0..e1797f9d6 100644
--- a/docs/content/services/networking/expressroute-circuits/code/erc-7/erc-7.kql
+++ b/docs/content/services/networking/expressroute-circuits/code/erc-7/erc-7.kql
@@ -1 +1 @@
-// cannot-be-validated-with-arg
+// under-development
diff --git a/docs/content/services/networking/expressroute-circuits/code/erc-8/erc-8.kql b/docs/content/services/networking/expressroute-circuits/code/erc-8/erc-8.kql
deleted file mode 100644
index e1797f9d6..000000000
--- a/docs/content/services/networking/expressroute-circuits/code/erc-8/erc-8.kql
+++ /dev/null
@@ -1 +0,0 @@
-// under-development
diff --git a/docs/content/services/networking/expressroute-circuits/code/erc-9/erc-9.kql b/docs/content/services/networking/expressroute-circuits/code/erc-9/erc-9.kql
deleted file mode 100644
index e1797f9d6..000000000
--- a/docs/content/services/networking/expressroute-circuits/code/erc-9/erc-9.kql
+++ /dev/null
@@ -1 +0,0 @@
-// under-development
diff --git a/docs/content/services/networking/expressroute-connection/_index.md b/docs/content/services/networking/expressroute-connection/_index.md
new file mode 100644
index 000000000..c8fc30ec0
--- /dev/null
+++ b/docs/content/services/networking/expressroute-connection/_index.md
@@ -0,0 +1,75 @@
++++
+title = "ExpressRoute Connection"
+description = "Best practices and resiliency recommendations for ExpressRoute Connection and associated resources and settings."
+date = "1/28/24"
+author = "ehaslett"
+msAuthor = "ethaslet"
+draft = false
++++
+
+The presented resiliency recommendations in this guidance include ExpressRoute Connection and associated resources and settings.
+
+## Summary of Recommendations
+
+{{< table style="table-striped" >}}
+| Recommendation | Category | Impact | State | ARG Query Available |
+| :------------------------------------------------ | :---------------------------------------------------------------------: | :------: | :------: | :-----------------: |
+| [ERCON-1 - For Connections using ExpressRoute Direct circuits and UltraPerformance or ErGw3AZ ExpressRoute Gateways, enable FastPath to improve data path performance between your on-premises network and your virtual network](#ercon-1---for-connections-using-expressroute-direct-circuits-and-ultraperformance-or-ergw3az-expressroute-gateways-enable-fastpath-to-improve-data-path-performance-between-your-on-premises-network-and-your-virtual-network) | System Efficiency | Medium | Verified | No |
+| [ERCON-2 - Configure an Azure Resource Lock on connections to prevent accidental deletion](#ercon-2---configure-an-azure-resource-lock-on-connections-to-prevent-accidental-deletion) | Availability | High | Verified | No |
+{{< /table >}}
+
+{{< alert style="info" >}}
+
+Definitions of states can be found [here]({{< ref "../../../_index.md#definitions-of-terms-used-in-aprl">}})
+
+{{< /alert >}}
+
+## Recommendations Details
+
+### ERCON-1 - For Connections using ExpressRoute Direct circuits and UltraPerformance or ErGw3AZ ExpressRoute Gateways, enable FastPath to improve data path performance between your on-premises network and your virtual network
+
+**Category: System Efficiency**
+
+**Impact: Medium**
+
+**Recommendation/Guidance**
+
+The ExpressRoute virtual network gateway is designed to exchange network routes and route network traffic. FastPath is designed to improve the data path performance between your on-premises network and your virtual network. When enabled, FastPath sends network traffic directly to virtual machines in the virtual network, bypassing the gateway. Bypassing the gateway enhances resiliency by reducing gateway utilization.
+
+**Resources**
+
+- [About ExpressRoute FastPath](https://learn.microsoft.com/en-us/azure/expressroute/about-fastpath)
+
+**Resource Graph Query**
+
+{{< collapse title="Show/Hide Query/Script" >}}
+
+{{< code lang="sql" file="code/ercon-1/ercon-1.kql" >}} {{< /code >}}
+
+{{< /collapse >}}
+
+
+
+### ERCON-2 - Configure an Azure Resource Lock on connections to prevent accidental deletion
+
+**Category: Availability**
+
+**Impact: High**
+
+**Recommendation/Guidance**
+
+Configure an Azure Resource lock for Gateway Connection resources to prevent accidental deletion. Accidental deletion of a Gateway Connection resource may result in unexpected loss of connectivity between your on-premises network and Azure workloads. As an administrator, you can lock an Azure subscription, resource group, or resource to protect them from accidental user deletions and modifications. The lock overrides any user permission.
+
+**Resources**
+
+- [Protect your Azure resources with a lock - Azure Resource Manager | Microsoft Learn](https://learn.microsoft.com/en-us/azure/azure-resource-manager/management/lock-resources?tabs=json)
+
+**Resource Graph Query**
+
+{{< collapse title="Show/Hide Query/Script" >}}
+
+{{< code lang="sql" file="code/ercon-2/ercon-2.kql" >}} {{< /code >}}
+
+{{< /collapse >}}
+
+
diff --git a/docs/content/services/networking/expressroute-connection/code/ercon-1/ercon-1.kql b/docs/content/services/networking/expressroute-connection/code/ercon-1/ercon-1.kql
new file mode 100644
index 000000000..614a7f9ca
--- /dev/null
+++ b/docs/content/services/networking/expressroute-connection/code/ercon-1/ercon-1.kql
@@ -0,0 +1 @@
+// under-development
diff --git a/docs/content/services/networking/expressroute-connection/code/ercon-2/ercon-2.kql b/docs/content/services/networking/expressroute-connection/code/ercon-2/ercon-2.kql
new file mode 100644
index 000000000..614a7f9ca
--- /dev/null
+++ b/docs/content/services/networking/expressroute-connection/code/ercon-2/ercon-2.kql
@@ -0,0 +1 @@
+// under-development
diff --git a/docs/content/services/networking/expressroute-direct/_index.md b/docs/content/services/networking/expressroute-direct/_index.md
new file mode 100644
index 000000000..f4231f25d
--- /dev/null
+++ b/docs/content/services/networking/expressroute-direct/_index.md
@@ -0,0 +1,100 @@
++++
+title = "ExpressRoute Direct"
+description = "Best practices and resiliency recommendations for ExpressRoute Direct and associated resources and settings."
+date = "1/28/24"
+author = "ehaslett"
+msAuthor = "ethaslet"
+draft = false
++++
+
+The presented resiliency recommendations in this guidance include ExpressRoute Direct and associated resources and settings.
+
+## Summary of Recommendations
+
+{{< table style="table-striped" >}}
+| Recommendation | Category | Impact | State | ARG Query Available |
+| :------------------------------------------------ | :---------------------------------------------------------------------: | :------: | :------: | :-----------------: |
+| [ERD-1 - The Admin State of both Links of an ExpressRoute Direct should be in Enabled state](#erd-1---the-admin-state-of-both-links-of-an-expressroute-direct-should-be-in-enabled-state) | Availability | High | Verified | No |
+| [ERD-2 - Ensure you do not over-subscribe an ExpressRoute Direct](#erd-2---ensure-you-do-not-over-subscribe-an-expressroute-direct) | System Efficiency | High | Verified | No |
+| [ERD-3 - Enable rate-limiting to help optimize network performance by controlling the traffic volume across all your ExpressRoute Direct based circuits - In Preview](#erd-3---enable-rate-limiting-to-help-optimize-network-performance-by-controlling-the-traffic-volume-across-all-your-expressroute-direct-based-circuits---in-preview) | System Efficiency | Medium | Verified | No |
+{{< /table >}}
+
+{{< alert style="info" >}}
+
+Definitions of states can be found [here]({{< ref "../../../_index.md#definitions-of-terms-used-in-aprl">}})
+
+{{< /alert >}}
+
+## Recommendations Details
+
+### ERD-1 - The Admin State of both Links of an ExpressRoute Direct should be in Enabled state
+
+**Category: Availability**
+
+**Impact: High**
+
+**Recommendation/Guidance**
+
+In Azure ExpressRoute Direct, the "Admin State" refers to the administrative status of the ExpressRoute layer 1 links. It indicates whether a particular link is enabled or disabled, in other words whether the physical port is on or off. A link must be enabled to pass traffic across the ExpressRoute Direct connection. Admin State is a crucial setting because it determines the operational status of your ExpressRoute Direct, affecting connectivity between your on-premises network and Azure services.
+
+**Resources**
+
+- [How to configure ExpressRoute Direct: Change Admin State of links](https://learn.microsoft.com/en-us/azure/expressroute/expressroute-howto-erdirect#state)
+
+**Resource Graph Query**
+
+{{< collapse title="Show/Hide Query/Script" >}}
+
+{{< code lang="sql" file="code/erd-1/erd-1.kql" >}} {{< /code >}}
+
+{{< /collapse >}}
+
+
+
+### ERD-2 - Ensure you do not over-subscribe an ExpressRoute Direct
+
+**Category: System Efficiency**
+
+**Impact: High**
+
+**Recommendation/Guidance**
+
+You can provision logical ExpressRoute circuits on top of a 10-Gbps or 100-Gbps ExpressRoute Direct resource up to a subscribed bandwidth of 20 Gbps or 200 Gbps, respectively. From a resiliency perspective, this is not recommended. If one of the ExpressRoute Direct ports goes down while your ExpressRoute circuits are already consuming 100% of the 10-Gbps or 100-Gbps port capacity, the second port won't have enough bandwidth to support the additional load. A port can go down, for example, during a maintenance event, and the remaining port then has to carry all traffic, up to its 10-Gbps or 100-Gbps capacity. Unless you use rate limiting for ExpressRoute Direct circuits (Preview) to limit the bandwidth of non-production connections, do not over-subscribe ExpressRoute Direct ports that are used for production workloads.
+
+**Resources**
+
+- [About ExpressRoute Direct: Circuit Sizes](https://learn.microsoft.com/en-us/azure/expressroute/expressroute-erdirect-about?source=recommendations#circuit-sizes)
+
+**Resource Graph Query**
+
+{{< collapse title="Show/Hide Query/Script" >}}
+
+{{< code lang="sql" file="code/erd-2/erd-2.kql" >}} {{< /code >}}
+
+{{< /collapse >}}
+
+
+
+### ERD-3 - Enable rate-limiting to help optimize network performance by controlling the traffic volume across all your ExpressRoute Direct based circuits - In Preview
+
+**Category: System Efficiency**
+
+**Impact: Medium**
+
+**Recommendation/Guidance**
+
+Rate limiting is a feature that enables you to control the traffic volume between your on-premises network and Azure over an ExpressRoute Direct circuit. It applies to the traffic over either private or Microsoft peering of the ExpressRoute circuit. This feature helps distribute the port bandwidth evenly among the circuits, ensures network stability, and prevents network congestion. The resource linked below outlines the steps to enable rate limiting for your ExpressRoute Direct circuits.
+
+**Resources**
+
+- [Rate limiting for ExpressRoute Direct circuits (Preview)](https://learn.microsoft.com/en-us/azure/expressroute/rate-limit)
+
+**Resource Graph Query**
+
+{{< collapse title="Show/Hide Query/Script" >}}
+
+{{< code lang="sql" file="code/erd-3/erd-3.kql" >}} {{< /code >}}
+
+{{< /collapse >}}
+
+
diff --git a/docs/content/services/networking/expressroute-direct/code/erd-1/erd-1.kql b/docs/content/services/networking/expressroute-direct/code/erd-1/erd-1.kql
new file mode 100644
index 000000000..aa3c82dfe
--- /dev/null
+++ b/docs/content/services/networking/expressroute-direct/code/erd-1/erd-1.kql
@@ -0,0 +1,6 @@
+// Azure Resource Graph Query
+// Find all ExpressRoute Direct resources where the Admin State of either link is Disabled
+resources
+| where type == "microsoft.network/expressrouteports"
+| where properties['links'][0]['properties']['adminState'] == "Disabled" or properties['links'][1]['properties']['adminState'] == "Disabled"
+| project recommendationId = "erd-1", name, id, tags, param1 = strcat("Link1AdminState: ", properties['links'][0]['properties']['adminState']), param2 = strcat("Link2AdminState: ", properties['links'][1]['properties']['adminState'])
diff --git a/docs/content/services/networking/expressroute-direct/code/erd-2/erd-2.kql b/docs/content/services/networking/expressroute-direct/code/erd-2/erd-2.kql
new file mode 100644
index 000000000..471ef06f4
--- /dev/null
+++ b/docs/content/services/networking/expressroute-direct/code/erd-2/erd-2.kql
@@ -0,0 +1,6 @@
+// Azure Resource Graph Query
+// Find all ExpressRoute Direct resources that are over-subscribed (provisioned bandwidth exceeds port bandwidth)
+resources
+| where type == "microsoft.network/expressrouteports"
+| where toint(properties['provisionedBandwidthInGbps']) > toint(properties['bandwidthInGbps'])
+| project recommendationId = "erd-2", name, id, tags, param1 = strcat("provisionedBandwidthInGbps: ", properties['provisionedBandwidthInGbps']), param2 = strcat("bandwidthInGbps: ", properties['bandwidthInGbps'])
diff --git a/docs/content/services/networking/expressroute-direct/code/erd-3/erd-3.kql b/docs/content/services/networking/expressroute-direct/code/erd-3/erd-3.kql
new file mode 100644
index 000000000..614a7f9ca
--- /dev/null
+++ b/docs/content/services/networking/expressroute-direct/code/erd-3/erd-3.kql
@@ -0,0 +1 @@
+// under-development
diff --git a/docs/content/services/networking/expressroute-gateway/_index.md b/docs/content/services/networking/expressroute-gateway/_index.md
index 6f104001e..2e4dfcddc 100644
--- a/docs/content/services/networking/expressroute-gateway/_index.md
+++ b/docs/content/services/networking/expressroute-gateway/_index.md
@@ -1,7 +1,7 @@
+++
title = "ExpressRoute Gateway"
description = "Best practices and resiliency recommendations for ExpressRoute Gateway and associated resources."
-date = "01/11/24"
+date = "01/31/24"
author = "ehaslett"
msAuthor = "ethaslet"
draft = false
@@ -12,12 +12,14 @@ The presented resiliency recommendations in this guidance include ExpressRoute G
## Summary of Recommendations
{{< table style="table-striped" >}}
-| Recommendation | Impact | State | ARG Query Available |
-| :----------------------------------------------------------------------------------------------------------------------------------------------------- | :------: | :------: | :-----------------: |
-| [ERGW-1 - Use Zone-redundant gateway SKUs](#ergw-1---use-zone-redundant-gateway-skus) | High | Preview | Yes |
-| [ERGW-2 - Monitor gateway health](#ergw-2---monitor-gateway-health) | High | Preview | No |
-| [ERGW-3 - Use Vnet peering for Vnet to Vnet connectivity instead of ExpressRoute circuits](#ergw-3---use-vnet-peering-for-vnet-to-vnet-connectivity-instead-of-expressroute-circuits) | Medium | Preview | No |
-
+| Recommendation | Category | Impact | State | ARG Query Available |
+|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:------------:|:------:|:-------:|:-------------------:|
+| [ERGW-1 - Connect the ExpressRoute Gateway to two or more circuits from different peering locations for higher resiliency](#ergw-1---connect-the-expressroute-gateway-to-two-or-more-circuits-from-different-peering-locations-for-higher-resiliency) | Availability | High | Verified | No |
+| [ERGW-2 - Use Zone-redundant gateway SKUs](#ergw-2---use-zone-redundant-gateway-skus) | Availability | High | Verified | Yes |
+| [ERGW-3 - Configure an Azure Resource lock for ExpressRoute Gateway to prevent accidental deletion](#ergw-3---configure-an-azure-resource-lock-for-expressroute-gateway-to-prevent-accidental-deletion) | Availability | Medium | Verified | No |
+| [ERGW-4 - Monitor gateway health](#ergw-4---monitor-gateway-health) | Monitoring | High | Verified | No |
+| [ERGW-6 - Avoid using ExpressRoute circuits for VNet to VNet communication](#ergw-6---avoid-using-expressroute-circuits-for-vnet-to-vnet-communication) | Networking | Medium | Verified | No |
+| [ERGW-7 - Configure customer-controlled gateway maintenance - In Preview](#ergw-7---configure-customer-controlled-gateway-maintenance---in-preview) | Networking | High | Verified | No |
{{< /table >}}
{{< alert style="info" >}}
@@ -28,7 +30,31 @@ Definitions of states can be found [here]({{< ref "../../../_index.md#definition
## Recommendations Details
-### ERGW-1 - Use Zone-redundant gateway SKUs
+### ERGW-1 - Connect the ExpressRoute Gateway to two or more circuits from different peering locations for higher resiliency
+
+**Category: Availability**
+
+**Impact: High**
+
+**Guidance**
+
+Connect each ExpressRoute Gateway to a minimum of two circuits, with each circuit originating from a different peering location.
+
+**Resources**
+
+- [Designing for disaster recovery with ExpressRoute private peering](https://learn.microsoft.com/azure/expressroute/designing-for-disaster-recovery-with-expressroute-privatepeering)
+
+**Resource Graph Query**
+
+{{< collapse title="Show/Hide Query/Script" >}}
+
+{{< code lang="sql" file="code/ergw-1/ergw-1.kql" >}} {{< /code >}}
+
+{{< /collapse >}}
+
+
+
+### ERGW-2 - Use Zone-redundant gateway SKUs
**Category: Availability**
@@ -36,7 +62,7 @@ Definitions of states can be found [here]({{< ref "../../../_index.md#definition
**Guidance**
-Azure ExpressRoute gateway provides different SLAs when it’s deployed in a single availability zone and when it’s deployed in two or more availability zones. For information about all Azure SLAs, see SLA summary for Azure services. To automatically deploy your virtual network gateways across availability zones, you can use zone-redundant virtual network gateways. With zone-redundant gateways, you can benefit from zone-resiliency to access your mission-critical, scalable services on Azure
+Azure ExpressRoute gateway provides different SLAs when it’s deployed in a single availability zone and when it’s deployed in two or more availability zones. For information about all Azure SLAs, see [Service Level Agreements (SLA) for Online Services](https://www.microsoft.com/licensing/docs/view/Service-Level-Agreements-SLA-for-Online-Services?lang=1&year=2023). To automatically deploy your virtual network gateways across availability zones, you can use zone-redundant virtual network gateways. With zone-redundant gateways, you can benefit from zone-resiliency to access your mission-critical, scalable services on Azure.
**Resources**
@@ -44,17 +70,41 @@ Azure ExpressRoute gateway provides different SLAs when it’s deployed in a sin
- [About zone-redundant virtual network gateway in Azure availability zones](https://learn.microsoft.com/azure/vpn-gateway/about-zone-redundant-vnet-gateways)
- [Create a zone-redundant virtual network gateway in Azure Availability Zones](https://learn.microsoft.com/azure/vpn-gateway/create-zone-redundant-vnet-gateway)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
-{{< code lang="sql" file="code/ergw-1/ergw-1.kql" >}} {{< /code >}}
+{{< code lang="sql" file="code/ergw-2/ergw-2.kql" >}} {{< /code >}}
+
+{{< /collapse >}}
+
+
+
+### ERGW-3 - Configure an Azure Resource lock for ExpressRoute Gateway to prevent accidental deletion
+
+**Category: Availability**
+
+**Impact: Medium**
+
+**Guidance**
+
+Configure an Azure Resource lock for ExpressRoute Gateway to prevent accidental deletion. As an administrator, you can lock an Azure subscription, resource group, or resource to protect them from accidental user deletions and modifications. The lock overrides any user permission.
+
+**Resources**
+
+- [Protect your Azure resources with a lock - Azure Resource Manager | Microsoft Learn](https://learn.microsoft.com/en-us/azure/azure-resource-manager/management/lock-resources?tabs=json)
+
+**Resource Graph Query**
+
+{{< collapse title="Show/Hide Query/Script" >}}
+
+{{< code lang="sql" file="code/ergw-3/ergw-3.kql" >}} {{< /code >}}
{{< /collapse >}}
-### ERGW-2 - Monitor gateway health
+### ERGW-4 - Monitor gateway health
**Category: Monitoring**
@@ -62,24 +112,30 @@ Azure ExpressRoute gateway provides different SLAs when it’s deployed in a sin
**Guidance**
-Set up monitoring and alerts for Virtual Network Gateway health based on various metrics available.
+Set up monitoring using Network Insights for ExpressRoute Gateway availability, performance, and scalability.
+
+Configure alerts for availability metrics for routes advertised, routes learned and number of VMs based on the supported amounts for the ExpressRoute Gateway SKU in use. Configure alerts for frequency of routes changed based on the customer environment.
+
+Configure alerts for performance metrics for bits in, bits out and CPU utilization according to [ExpressRoute Gateways | Azure Monitor Baseline Alerts](https://azure.github.io/azure-monitor-baseline-alerts/services/Network/expressRouteGateways/). Configure alerts for packets per second based on the supported amount for the ExpressRoute Gateway SKU in use and based on the customer environment.
+
+Configure alerts for scalability metrics for active flows based on the supported amounts for the ExpressRoute Gateway SKU in use and the expected number of flows for the customer environment, and for max flows per second for when this value exceeds a historical baseline for the customer environment.
**Resources**
-- [Alerts for ExpressRoute gateway connections](https://learn.microsoft.com/azure/expressroute/monitor-expressroute#alerts-for-expressroute-gateway-connections)
-- [Gateway Metrics](https://learn.microsoft.com/azure/expressroute/expressroute-network-insights#gateway-metrics)
+- [ExpressRoute monitoring, metrics, and alerts | ExpressRoute gateways](https://learn.microsoft.com/azure/expressroute/expressroute-monitoring-metrics-alerts#expressroute-gateways)
+- [Azure ExpressRoute Insights using Network Insights](https://learn.microsoft.com/en-us/azure/expressroute/expressroute-network-insights)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
-{{< code lang="sql" file="code/ergw-2/ergw-2.kql" >}} {{< /code >}}
+{{< code lang="sql" file="code/ergw-4/ergw-4.kql" >}} {{< /code >}}
{{< /collapse >}}
-### ERGW-3 - Use Vnet peering for Vnet to Vnet connectivity instead of ExpressRoute circuits
+### ERGW-6 - Avoid using ExpressRoute circuits for VNet to VNet communication
**Category: Networking**
@@ -87,17 +143,43 @@ Set up monitoring and alerts for Virtual Network Gateway health based on various
**Guidance**
-By default, connectivity between virtual networks are enabled when you link multiple virtual networks to the same ExpressRoute circuit. However, Microsoft advises against using your ExpressRoute circuit for communication between virtual networks and instead uses VNet peering. For more information about why VNet-to-VNet connectivity isn't recommended over ExpressRoute, see connectivity between virtual networks over ExpressRoute.
+By default, connectivity between virtual networks is enabled when you link multiple virtual networks, each with an ExpressRoute Gateway, to the same ExpressRoute circuit. However, Microsoft advises against using your ExpressRoute circuit for communication between virtual networks and instead recommends other techniques such as VNet peering, routing in a VNet hub via Azure Firewall, NVA and/or Azure Route Server, site-to-site VPN within Azure, Virtual WAN, or SD-WAN.
+
+For more information about why VNet-to-VNet connectivity isn’t recommended over ExpressRoute, see: [Connectivity between virtual networks over ExpressRoute | Microsoft Learn](https://learn.microsoft.com/azure/expressroute/virtual-network-connectivity-guidance)
**Resources**
- [About ExpressRoute virtual network gateways - VNet-to-VNet connectivity](https://learn.microsoft.com/azure/expressroute/expressroute-about-virtual-network-gateways#vnet-to-vnet-connectivity)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
-{{< code lang="sql" file="code/ergw-3/ergw-3.kql" >}} {{< /code >}}
+{{< code lang="sql" file="code/ergw-6/ergw-6.kql" >}} {{< /code >}}
+
+{{< /collapse >}}
+
+
+
+### ERGW-7 - Configure customer-controlled gateway maintenance - In Preview
+
+**Category: Availability**
+
+**Impact: High**
+
+**Guidance**
+
+ExpressRoute virtual network gateways undergo regular updates to enhance functionality, reliability, performance, and security. Configuring and scheduling customer-controlled maintenance will minimize the impact of these updates and align the update schedule to best fit your maintenance windows.
+
+**Resources**
+
+- [Configure customer-controlled maintenance for your virtual network gateway - ExpressRoute | Microsoft Learn](https://learn.microsoft.com/en-us/azure/expressroute/customer-controlled-gateway-maintenance#azure-portal-steps)
+
+**Resource Graph Query**
+
+{{< collapse title="Show/Hide Query/Script" >}}
+
+{{< code lang="sql" file="code/ergw-7/ergw-7.kql" >}} {{< /code >}}
{{< /collapse >}}
diff --git a/docs/content/services/networking/expressroute-gateway/code/ergw-1/ergw-1.kql b/docs/content/services/networking/expressroute-gateway/code/ergw-1/ergw-1.kql
index 7a687aa5d..16f5d2c3b 100644
--- a/docs/content/services/networking/expressroute-gateway/code/ergw-1/ergw-1.kql
+++ b/docs/content/services/networking/expressroute-gateway/code/ergw-1/ergw-1.kql
@@ -1,8 +1,56 @@
-// Azure Resource Graph Query
-// For all VNGs of type ExpressRoute, show any that do not have AZ in the SKU tier
+// Azure Resource Graph Query
+// Provides a list of ExpressRoute Gateways that are not connected to two or more ExpressRoute Circuits. Baremetal circuits are excluded from consideration
+//This query assumes that the running entity has visibility to the gateway, connection, and circuit scopes.
+//Start with a full list of gateways
+(resources
+| where type == "microsoft.network/virtualnetworkgateways"
+| where properties.gatewayType == "ExpressRoute"
+| extend exrGatewayId = tolower(tostring(id))
+| join kind=inner(
resources
-| where type =~ "Microsoft.Network/virtualNetworkGateways"
+| where type == "microsoft.network/virtualnetworkgateways"
| where properties.gatewayType == "ExpressRoute"
-| where properties.sku.tier !contains 'AZ'
-| project recommendationId = "ergw-1", name, id, tags, param1= strcat("sku-tier: " , properties.sku.tier), param2=location
-| order by id asc
+| extend exrGatewayId = tolower(tostring(id))
+| join kind=leftouter(
+//connections joined with circuit peer info
+resources
+| where type == "microsoft.network/connections"
+| extend connectionType = properties.connectionType
+| extend exrGatewayId = tolower(tostring(properties.virtualNetworkGateway1.id))
+| extend peerId = tolower(tostring(properties.peer.id))
+| extend connectionId = tolower(tostring(id))
+| where connectionType == "ExpressRoute"
+| join kind=leftouter(
+ resources
+ | where type == "microsoft.network/expressroutecircuits"
+ //should this be location instead of peeringLocation
+ | extend circuitId = tolower(tostring(id))
+ | extend peeringLocation = tostring(properties.serviceProviderProperties.peeringLocation)
+ | extend peerId = tolower(id)
+) on peerId ) on exrGatewayId
+//remove bare metal services connections/circuits
+| where not(isnotnull(connectionId) and isnull(sku1))
+//group by gateway ID's and peering locations
+| summarize by exrGatewayId, peeringLocation
+//summarize to connections with fewer than two unique connections
+| summarize connCount = count() by exrGatewayId
+| where connCount < 2) on exrGatewayId
+| project recommendationId = "ergw-1", name, id, tags, param1 = "twoOrMoreCircuitsConnectedFromDifferentPeeringLocations: false")
+| union
+(
+resources
+| where type == "microsoft.network/virtualnetworkgateways"
+| where properties.gatewayType == "ExpressRoute"
+| extend exrGatewayId = tolower(tostring(id))
+| join kind=leftouter(
+//connections joined with circuit peer info
+resources
+| where type == "microsoft.network/connections"
+| extend connectionType = properties.connectionType
+| extend exrGatewayId = tolower(tostring(properties.virtualNetworkGateway1.id))
+| extend peerId = tolower(tostring(properties.peer.id))
+| extend connectionId = tolower(tostring(id))
+| where connectionType == "ExpressRoute") on exrGatewayId
+| where isnull(connectionType)
+| project recommendationId = "ergw-1", name, id, tags, param1 = "twoOrMoreCircuitsConnectedFromDifferentPeeringLocations: false", param2 = "noConnectionsOnGateway: true"
+)
diff --git a/docs/content/services/networking/expressroute-gateway/code/ergw-2/ergw-2.kql b/docs/content/services/networking/expressroute-gateway/code/ergw-2/ergw-2.kql
index 614a7f9ca..89e2a329d 100644
--- a/docs/content/services/networking/expressroute-gateway/code/ergw-2/ergw-2.kql
+++ b/docs/content/services/networking/expressroute-gateway/code/ergw-2/ergw-2.kql
@@ -1 +1,8 @@
-// under-development
+// Azure Resource Graph Query
+// For all VNGs of type ExpressRoute, show any that do not have AZ in the SKU tier
+resources
+| where type =~ "Microsoft.Network/virtualNetworkGateways"
+| where properties.gatewayType == "ExpressRoute"
+| where properties.sku.tier !contains 'AZ'
+| project recommendationId = "ergw-2", name, id, tags, param1= strcat("sku-tier: " , properties.sku.tier), param2=location
+| order by id asc
diff --git a/docs/content/services/networking/expressroute-gateway/code/ergw-4/ergw-4.1.kql b/docs/content/services/networking/expressroute-gateway/code/ergw-4/ergw-4.1.kql
new file mode 100644
index 000000000..b1a49506e
--- /dev/null
+++ b/docs/content/services/networking/expressroute-gateway/code/ergw-4/ergw-4.1.kql
@@ -0,0 +1,31 @@
+// Azure Resource Graph Query
+// Provides a list of ExpressRoute Gateways that are not currently monitoring the Maximum number of routes advertised to peer limit on the gateway.
+//To remediate this finding, create an alert with the following configuration on each failing gateway: Operator: GreaterThanOrEqual, metric: ExpressRouteGatewayCountOfRoutesAdvertisedToPeer, timeAggregation: Maximum, Threshold 500
+resources
+| where type == "microsoft.network/virtualnetworkgateways"
+| where properties.gatewayType == "ExpressRoute"
+| extend gatewayId = tolower(tostring(id))
+| extend skuName = properties.sku.name
+| project gatewayId, name, id, tags, skuName
+| join kind=leftouter(
+resources
+| where type == 'microsoft.insights/metricalerts'
+| extend alertProperties = todynamic(properties)
+| mv-expand alertProperties.scopes
+| mv-expand alertProperties.criteria.allOf
+| where alertProperties.enabled == true
+| extend gatewayId = tolower(tostring(alertProperties_scopes))
+| extend criterionType = alertProperties_criteria_allOf.criterionType
+| extend metric = alertProperties_criteria_allOf.metricName
+| extend metricNamespace = alertProperties_criteria_allOf.metricNamespace
+| extend operator = alertProperties_criteria_allOf.operator
+| extend threshold = alertProperties_criteria_allOf.threshold
+| extend timeAggregation = alertProperties_criteria_allOf.timeAggregation
+| where metricNamespace == 'Microsoft.Network/virtualNetworkGateways'
+| where timeAggregation == 'Maximum'
+| where metric == 'ExpressRouteGatewayCountOfRoutesAdvertisedToPeer'
+| where operator == 'GreaterThanOrEqual'
+| where threshold == 500
+) on gatewayId
+| where isnull(threshold)
+| project recommendationId = 'ergw-4', name, id, tags, param1 = 'monitorExpressRouteGatewayHealth', param2 = 'ExpressRouteGatewayHealthCategory:Availability', param3 = 'ExpressRouteGatewayCountOfRoutesAdvertisedToPeer at or exceeding sku limit'
diff --git a/docs/content/services/networking/expressroute-gateway/code/ergw-4/ergw-4.2.kql b/docs/content/services/networking/expressroute-gateway/code/ergw-4/ergw-4.2.kql
new file mode 100644
index 000000000..d1c4b8513
--- /dev/null
+++ b/docs/content/services/networking/expressroute-gateway/code/ergw-4/ergw-4.2.kql
@@ -0,0 +1,39 @@
+// Azure Resource Graph Query
+// Provides a list of ExpressRoute Gateways that are not currently monitoring the Maximum number of routes learned from peer limit on the gateway based on the gateway sku.
+//To remediate this finding, create an alert with the following configuration on each failing gateway: metric: ExpressRouteGatewayCountOfRoutesLearnedFromPeer, timeAggregation: Maximum, Operator: GreaterThanOrEqual, Threshold 4000 for Standard, ERGW1AZ, and ERGWScale sku's, 9500 for all others
+resources
+| where type == "microsoft.network/virtualnetworkgateways"
+| where properties.gatewayType == "ExpressRoute"
+| extend gatewayId = tolower(tostring(id))
+| extend skuName = properties.sku.name
+| extend routesLearnedPerSku = case(skuName == 'Standard', 4000,
+ skuName == 'ErGw1AZ', 4000,
+ skuName == 'HighPerformance', 9500,
+ skuName == 'ErGw2AZ', 9500,
+ skuName == 'UltraPerformance', 9500,
+ skuName == 'ErGw3AZ', 9500,
+ skuName == 'ErGwScale', 4000,
+ 4000)
+| project gatewayId, name, id, tags, skuName, routesLearnedPerSku
+| join kind=leftouter(
+resources
+| where type == 'microsoft.insights/metricalerts'
+| extend alertProperties = todynamic(properties)
+| mv-expand alertProperties.scopes
+| mv-expand alertProperties.criteria.allOf
+| where alertProperties.enabled == true
+| extend gatewayId = tolower(tostring(alertProperties_scopes))
+| extend criterionType = alertProperties_criteria_allOf.criterionType
+| extend metric = alertProperties_criteria_allOf.metricName
+| extend metricNamespace = alertProperties_criteria_allOf.metricNamespace
+| extend operator = alertProperties_criteria_allOf.operator
+| extend threshold = alertProperties_criteria_allOf.threshold
+| extend timeAggregation = alertProperties_criteria_allOf.timeAggregation
+| where metricNamespace == 'Microsoft.Network/virtualNetworkGateways'
+| where timeAggregation == 'Maximum'
+| where metric == 'ExpressRouteGatewayCountOfRoutesLearnedFromPeer'
+| where operator == 'GreaterThanOrEqual'
+) on gatewayId
+| where threshold != routesLearnedPerSku
+| where isnull(threshold)
+| project recommendationId = 'ergw-4', name, id, tags, param1 = 'monitorExpressRouteGatewayHealth', param2 = 'ExpressRouteGatewayHealthCategory:Availability', param3 = 'ExpressRouteGatewayCountOfRoutesLearnedFromPeer at or exceeding sku limit'
diff --git a/docs/content/services/networking/expressroute-gateway/code/ergw-4/ergw-4.3.kql b/docs/content/services/networking/expressroute-gateway/code/ergw-4/ergw-4.3.kql
new file mode 100644
index 000000000..339261baf
--- /dev/null
+++ b/docs/content/services/networking/expressroute-gateway/code/ergw-4/ergw-4.3.kql
@@ -0,0 +1,35 @@
+// Azure Resource Graph Query
+// Provides a list of ExpressRoute Gateways that are not currently monitoring the Average CPU usage to ensure it stays below a critical threshold per AMBA standard.
+//To remediate this finding, create an alert with the following configuration on each failing gateway: Operator: GreaterThan, metric: ExpressRouteGatewayCpuUtilization, timeAggregation: Average, Threshold: 80, Severity: 1
+resources
+| where type == "microsoft.network/virtualnetworkgateways"
+| where properties.gatewayType == "ExpressRoute"
+| extend gatewayId = tolower(tostring(id))
+| extend skuName = properties.sku.name
+| project gatewayId, name, id, tags, skuName
+| join kind=leftouter(
+resources
+| where type == 'microsoft.insights/metricalerts'
+| extend alertProperties = todynamic(properties)
+| mv-expand alertProperties.scopes
+| mv-expand alertProperties.criteria.allOf
+| where alertProperties.enabled == true
+| extend gatewayId = tolower(tostring(alertProperties_scopes))
+| extend criterionType = alertProperties_criteria_allOf.criterionType
+| extend metric = alertProperties_criteria_allOf.metricName
+| extend metricNamespace = alertProperties_criteria_allOf.metricNamespace
+| extend operator = alertProperties_criteria_allOf.operator
+| extend threshold = alertProperties_criteria_allOf.threshold
+| extend timeAggregation = alertProperties_criteria_allOf.timeAggregation
+| extend severity = alertProperties.severity
+| where alertProperties.evaluationFrequency == 'PT1M'
+| where alertProperties.windowSize == 'PT5M'
+| where metricNamespace == 'Microsoft.Network/virtualNetworkGateways'
+| where timeAggregation == 'Average'
+| where metric == 'ExpressRouteGatewayCpuUtilization'
+| where operator == 'GreaterThan'
+| where threshold == 80
+| where severity == 1
+) on gatewayId
+| where isnull(threshold)
+| project recommendationId = 'ergw-4', name, id, tags, param1 = 'monitorExpressRouteGatewayHealth', param2 = 'ExpressRouteGatewayHealthCategory:Performance', param3 = 'ExpressRouteGatewayCpuUtilization exceeding 80% average CPU'
diff --git a/docs/content/services/networking/expressroute-gateway/code/ergw-4/ergw-4.4.kql b/docs/content/services/networking/expressroute-gateway/code/ergw-4/ergw-4.4.kql
new file mode 100644
index 000000000..c57683292
--- /dev/null
+++ b/docs/content/services/networking/expressroute-gateway/code/ergw-4/ergw-4.4.kql
@@ -0,0 +1,35 @@
+// Azure Resource Graph Query
+// Provides a list of ExpressRoute Gateways that are not currently monitoring the Average Gateway Bits received per second metric to ensure that traffic is still being received by the gateway.
+//To remediate this finding, create an alert with the following configuration on each failing gateway: Metric: ExpressRouteGatewayBitsPerSecond, timeAggregation: Average, Operator: LessThan, Threshold: 1, Severity: 0
+resources
+| where type == "microsoft.network/virtualnetworkgateways"
+| where properties.gatewayType == "ExpressRoute"
+| extend gatewayId = tolower(tostring(id))
+| extend skuName = properties.sku.name
+| project gatewayId, name, id, tags, skuName
+| join kind=leftouter(
+resources
+| where type == 'microsoft.insights/metricalerts'
+| extend alertProperties = todynamic(properties)
+| mv-expand alertProperties.scopes
+| mv-expand alertProperties.criteria.allOf
+| where alertProperties.enabled == true
+| extend gatewayId = tolower(tostring(alertProperties_scopes))
+| extend criterionType = alertProperties_criteria_allOf.criterionType
+| extend metric = alertProperties_criteria_allOf.metricName
+| extend metricNamespace = alertProperties_criteria_allOf.metricNamespace
+| extend operator = alertProperties_criteria_allOf.operator
+| extend threshold = alertProperties_criteria_allOf.threshold
+| extend timeAggregation = alertProperties_criteria_allOf.timeAggregation
+| extend severity = alertProperties.severity
+| where alertProperties.evaluationFrequency == 'PT1M'
+| where alertProperties.windowSize == 'PT5M'
+| where metricNamespace == 'Microsoft.Network/virtualNetworkGateways'
+| where timeAggregation == 'Average'
+| where metric == 'ExpressRouteGatewayBitsPerSecond'
+| where operator == 'LessThan'
+| where threshold == 1
+| where severity == 0
+) on gatewayId
+| where isnull(threshold)
+| project recommendationId = 'ergw-4', name, id, tags, param1 = 'monitorExpressRouteGatewayHealth', param2 = 'ExpressRouteGatewayHealthCategory:Performance', param3 = 'ExpressRouteGatewayBitsPerSecond less than 1 bit per second'
diff --git a/docs/content/services/networking/expressroute-gateway/code/ergw-4/ergw-4.5.kql b/docs/content/services/networking/expressroute-gateway/code/ergw-4/ergw-4.5.kql
new file mode 100644
index 000000000..ed88358db
--- /dev/null
+++ b/docs/content/services/networking/expressroute-gateway/code/ergw-4/ergw-4.5.kql
@@ -0,0 +1,44 @@
+// Azure Resource Graph Query
+// Provides a list of ExpressRoute Gateways that are not currently monitoring the Average Packets per Second to ensure it stays below a critical threshold per AMBA standard.
+//To remediate this finding, create an alert with the following configuration on each failing gateway: Operator: GreaterThan, metric: ExpressRouteGatewayPacketsPerSecond, timeAggregation: Average, Threshold: , Severity: 0
+resources
+| where type == "microsoft.network/virtualnetworkgateways"
+| where properties.gatewayType == "ExpressRoute"
+| extend gatewayId = tolower(tostring(id))
+| extend skuName = properties.sku.name
+| extend packetsPerSecondSku = case(skuName == 'Standard', 100000,
+ skuName == 'ErGw1AZ', 100000,
+ skuName == 'HighPerformance', 200000,
+ skuName == 'ErGw2AZ', 200000,
+ skuName == 'UltraPerformance', 1000000,
+ skuName == 'ErGw3AZ', 1000000,
+ skuName == 'ErGwScale', 100000,
+ 100000)
+| project gatewayId, name, id, tags, skuName, packetsPerSecondSku
+| join kind=leftouter(
+resources
+| where type == 'microsoft.insights/metricalerts'
+| extend alertProperties = todynamic(properties)
+| mv-expand alertProperties.scopes
+| mv-expand alertProperties.criteria.allOf
+| where alertProperties.enabled == true
+| extend gatewayId = tolower(tostring(alertProperties_scopes))
+| extend criterionType = alertProperties_criteria_allOf.criterionType
+| extend metric = alertProperties_criteria_allOf.metricName
+| extend metricNamespace = alertProperties_criteria_allOf.metricNamespace
+| extend operator = alertProperties_criteria_allOf.operator
+| extend threshold = alertProperties_criteria_allOf.threshold
+| extend timeAggregation = alertProperties_criteria_allOf.timeAggregation
+| extend severity = alertProperties.severity
+// temporarily commenting out more prescriptive configuration until we update AMBA recommendations
+//| where alertProperties.evaluationFrequency == 'PT1M'
+//| where alertProperties.windowSize == 'PT5M'
+| where metricNamespace == 'Microsoft.Network/virtualNetworkGateways'
+//| where timeAggregation == 'Average'
+| where metric == 'ExpressRouteGatewayPacketsPerSecond'
+//| where operator == 'GreaterThan'
+//| where severity == 0
+) on gatewayId
+//| where threshold != packetsPerSecondSku
+| where isnull(threshold)
+| project recommendationId = 'ergw-4', name, id, tags, param1 = 'monitorExpressRouteGatewayHealth', param2 = 'ExpressRouteGatewayHealthCategory:Performance', param3 = 'ExpressRouteGatewayPacketsPerSecond greater than sku maximum'
diff --git a/docs/content/services/networking/expressroute-gateway/code/ergw-4/ergw-4.6.kql b/docs/content/services/networking/expressroute-gateway/code/ergw-4/ergw-4.6.kql
new file mode 100644
index 000000000..b5fee4851
--- /dev/null
+++ b/docs/content/services/networking/expressroute-gateway/code/ergw-4/ergw-4.6.kql
@@ -0,0 +1,39 @@
+// Azure Resource Graph Query
+// Provides a list of ExpressRoute Gateways that are not currently monitoring the Maximum number of flow limit on the gateway based on the gateway sku.
+//To remediate this finding, create an alert with the following configuration on each failing gateway: metric: ExpressRouteGatewayActiveFlows, timeAggregation: Maximum, Operator: GreaterThanOrEqual, Threshold 100000 for Standard, ERGW1AZ, 200000 for High Performance and ERGW2AZ, and 1000000 for Ultra and ERGW3AZ.
+resources
+| where type == "microsoft.network/virtualnetworkgateways"
+| where properties.gatewayType == "ExpressRoute"
+| extend gatewayId = tolower(tostring(id))
+| extend skuName = properties.sku.name
+| extend flowsPerSku = case(skuName == 'Standard', 100000,
+ skuName == 'ErGw1AZ', 100000,
+ skuName == 'HighPerformance', 200000,
+ skuName == 'ErGw2AZ', 200000,
+ skuName == 'UltraPerformance', 1000000,
+ skuName == 'ErGw3AZ', 1000000,
+ skuName == 'ErGwScale', 100000,
+ 100000)
+| project gatewayId, name, id, tags, skuName, flowsPerSku
+| join kind=leftouter(
+resources
+| where type == 'microsoft.insights/metricalerts'
+| extend alertProperties = todynamic(properties)
+| mv-expand alertProperties.scopes
+| mv-expand alertProperties.criteria.allOf
+| where alertProperties.enabled == true
+| extend gatewayId = tolower(tostring(alertProperties_scopes))
+| extend criterionType = alertProperties_criteria_allOf.criterionType
+| extend metric = alertProperties_criteria_allOf.metricName
+| extend metricNamespace = alertProperties_criteria_allOf.metricNamespace
+| extend operator = alertProperties_criteria_allOf.operator
+| extend threshold = alertProperties_criteria_allOf.threshold
+| extend timeAggregation = alertProperties_criteria_allOf.timeAggregation
+| where metricNamespace == 'Microsoft.Network/virtualNetworkGateways'
+| where timeAggregation == 'Maximum'
+| where metric == 'ExpressRouteGatewayActiveFlows'
+| where operator == 'GreaterThanOrEqual'
+) on gatewayId
+| where threshold != flowsPerSku
+| where isnull(threshold)
+| project recommendationId = 'ergw-4', name, id, tags, param1 = 'monitorExpressRouteGatewayHealth', param2 = 'ExpressRouteGatewayHealthCategory:Scalability', param3 = 'ExpressRouteGatewayActiveFlows at or exceeding sku limit'
diff --git a/docs/content/services/networking/expressroute-gateway/code/ergw-4/ergw-4.7.kql b/docs/content/services/networking/expressroute-gateway/code/ergw-4/ergw-4.7.kql
new file mode 100644
index 000000000..546507b4b
--- /dev/null
+++ b/docs/content/services/networking/expressroute-gateway/code/ergw-4/ergw-4.7.kql
@@ -0,0 +1,39 @@
+// Azure Resource Graph Query
+// Provides a list of ExpressRoute Gateways that are not currently monitoring the Maximum number of virtual machines in the vnet and spokes where the gateway resides based on the gateway sku.
+//To remediate this finding, create an alert with the following configuration on each failing gateway: metric: ExpressRouteGatewayNumberOfVmInVnet, timeAggregation: Maximum, Operator: GreaterThanOrEqual, Threshold 2000 for Standard, ERGW1AZ, and ERGWScale sku's, 4500 for High Performance and ERGW2AZ sku's, and 11000 for all others
+resources
+| where type == "microsoft.network/virtualnetworkgateways"
+| where properties.gatewayType == "ExpressRoute"
+| extend gatewayId = tolower(tostring(id))
+| extend skuName = properties.sku.name
+| extend maxVMsPerSku = case(skuName == 'Standard', 2000,
+ skuName == 'ErGw1AZ', 2000,
+ skuName == 'HighPerformance', 4500,
+ skuName == 'ErGw2AZ', 4500,
+ skuName == 'UltraPerformance', 11000,
+ skuName == 'ErGw3AZ', 11000,
+ skuName == 'ErGwScale', 2000,
+ 2000)
+| project gatewayId, name, id, tags, skuName, maxVMsPerSku
+| join kind=leftouter(
+resources
+| where type == 'microsoft.insights/metricalerts'
+| extend alertProperties = todynamic(properties)
+| mv-expand alertProperties.scopes
+| mv-expand alertProperties.criteria.allOf
+| where alertProperties.enabled == true
+| extend gatewayId = tolower(tostring(alertProperties_scopes))
+| extend criterionType = alertProperties_criteria_allOf.criterionType
+| extend metric = alertProperties_criteria_allOf.metricName
+| extend metricNamespace = alertProperties_criteria_allOf.metricNamespace
+| extend operator = alertProperties_criteria_allOf.operator
+| extend threshold = alertProperties_criteria_allOf.threshold
+| extend timeAggregation = alertProperties_criteria_allOf.timeAggregation
+| where metricNamespace == 'Microsoft.Network/virtualNetworkGateways'
+| where timeAggregation == 'Maximum'
+| where metric == 'ExpressRouteGatewayNumberOfVmInVnet'
+| where operator == 'GreaterThanOrEqual'
+) on gatewayId
+| where threshold != maxVMsPerSku
+| where isnull(threshold)
+| project recommendationId = 'ergw-4', name, id, tags, param1 = 'monitorExpressRouteGatewayHealth', param2 = 'ExpressRouteGatewayHealthCategory:Availability', param3 = 'ExpressRouteGatewayNumberOfVmInVnet at or exceeding sku limit'
diff --git a/docs/content/services/networking/expressroute-gateway/code/ergw-4/ergw-4.kql b/docs/content/services/networking/expressroute-gateway/code/ergw-4/ergw-4.kql
new file mode 100644
index 000000000..63d1115eb
--- /dev/null
+++ b/docs/content/services/networking/expressroute-gateway/code/ergw-4/ergw-4.kql
@@ -0,0 +1,3 @@
+//The KQL files for this test are distributed into 7 different files.
+//Make sure to run all 7 for complete coverage
+.
diff --git a/docs/content/services/networking/expressroute-gateway/code/ergw-6/ergw-6.kql b/docs/content/services/networking/expressroute-gateway/code/ergw-6/ergw-6.kql
new file mode 100644
index 000000000..614a7f9ca
--- /dev/null
+++ b/docs/content/services/networking/expressroute-gateway/code/ergw-6/ergw-6.kql
@@ -0,0 +1 @@
+// under-development
diff --git a/docs/content/services/networking/expressroute-gateway/code/ergw-7/ergw-7.kql b/docs/content/services/networking/expressroute-gateway/code/ergw-7/ergw-7.kql
new file mode 100644
index 000000000..4984e37dc
--- /dev/null
+++ b/docs/content/services/networking/expressroute-gateway/code/ergw-7/ergw-7.kql
@@ -0,0 +1,20 @@
+// Azure Resource Graph Query
+// Find all Virtual Network Gateways without Maintenance Configurations
+
+resources
+| where type =~ "Microsoft.Network/virtualNetworkGateways"
+| extend resourceId = tolower(id)
+| join kind=leftouter (
+ maintenanceresources
+ | where type =~ "Microsoft.Maintenance/configurationAssignments"
+ | project JsonData = parse_json(properties)
+ | extend maintenanceConfigurationId = tolower(tostring(JsonData.maintenanceConfigurationId))
+ | join kind=inner (
+ resources
+ | where type =~ "Microsoft.Maintenance/maintenanceConfigurations"
+ | project maintenanceConfigurationId=tolower(id)
+ ) on maintenanceConfigurationId
+ | project maintenanceConfigurationId, resourceId=tolower(tostring(JsonData.resourceId))
+) on resourceId
+| where isempty(maintenanceConfigurationId)
+| project recommendationId = "ergw-7", name, id, tags, param1= strcat("sku-tier: " , properties.sku.tier), param2=location
diff --git a/docs/content/services/networking/expressroute-traffic-collector/_index.md b/docs/content/services/networking/expressroute-traffic-collector/_index.md
new file mode 100644
index 000000000..492ddd734
--- /dev/null
+++ b/docs/content/services/networking/expressroute-traffic-collector/_index.md
@@ -0,0 +1,52 @@
++++
+title = "ExpressRoute Traffic Collector"
+description = "Best practices and resiliency recommendations for ExpressRoute Traffic Collector and associated resources and settings."
+date = "1/28/24"
+author = "ehaslett"
+msAuthor = "ethaslet"
+draft = false
++++
+
+The presented resiliency recommendations in this guidance include ExpressRoute Traffic Collector and associated resources and settings.
+
+## Summary of Recommendations
+
+{{< table style="table-striped" >}}
+| Recommendation | Category | Impact | State | ARG Query Available |
+| :------------------------------------------------ | :---------------------------------------------------------------------: | :------: | :------: | :-----------------: |
+| [ERTC-1 - Ensure ExpressRoute Traffic Collector is enabled and configured for ExpressRoute Direct circuits](#ertc-1---ensure-expressroute-traffic-collector-is-enabled-and-configured-for-expressroute-direct-circuits) | Monitoring | Medium | Verified | No |
+{{< /table >}}
+
+{{< alert style="info" >}}
+
+Definitions of states can be found [here]({{< ref "../../../_index.md#definitions-of-terms-used-in-aprl">}})
+
+{{< /alert >}}
+
+## Recommendations Details
+
+### ERTC-1 - Ensure ExpressRoute Traffic Collector is enabled and configured for ExpressRoute Direct circuits
+
+**Category: Monitoring**
+
+**Impact: Medium**
+
+**Recommendation/Guidance**
+
+ExpressRoute Traffic Collector enables sampling of network flows sent over your ExpressRoute Direct circuits. Flow logs get sent to a Log Analytics workspace where you can create your own log queries for further analysis. You can also export the data to any visualization tool or SIEM (Security Information and Event Management) of your choice. Flow logs can be enabled for both private peering and Microsoft peering with ExpressRoute Traffic Collector.
+
+You can associate a single ExpressRoute Direct circuit with multiple ExpressRoute Traffic Collectors deployed in different Azure regions within a given geo-political region. It's recommended that you associate your ExpressRoute Direct circuit with multiple ExpressRoute Traffic Collectors as part of your disaster recovery and high availability plan.
+
+**Resources**
+
+- [Azure ExpressRoute Traffic Collector](https://learn.microsoft.com/en-us/azure/expressroute/traffic-collector)
+
+**Resource Graph Query**
+
+{{< collapse title="Show/Hide Query/Script" >}}
+
+{{< code lang="sql" file="code/ertc-1/ertc-1.kql" >}} {{< /code >}}
+
+{{< /collapse >}}
+
+
diff --git a/docs/content/services/networking/expressroute-traffic-collector/code/ertc-1/ertc-1.kql b/docs/content/services/networking/expressroute-traffic-collector/code/ertc-1/ertc-1.kql
new file mode 100644
index 000000000..614a7f9ca
--- /dev/null
+++ b/docs/content/services/networking/expressroute-traffic-collector/code/ertc-1/ertc-1.kql
@@ -0,0 +1 @@
+// under-development
diff --git a/docs/content/services/networking/firewall/_index.md b/docs/content/services/networking/firewall/_index.md
index 02dae74a4..a16019601 100644
--- a/docs/content/services/networking/firewall/_index.md
+++ b/docs/content/services/networking/firewall/_index.md
@@ -12,15 +12,14 @@ The presented resiliency recommendations in this guidance include Firewall and a
## Summary of Recommendations
{{< table style="table-striped" >}}
-| Recommendation | Impact | State | ARG Query Available |
-| :---------------------------------------------------------------------------------------------------------------------------------------------------------------- | :------: | :------: | :-----------------: |
-| [AFW-1 - Deploy Azure Firewall across multiple availability zones](#afw-1---deploy-azure-firewall-across-multiple-availability-zones) | High | Preview | No |
-| [AFW-2 - Test Azure Firewall performance](#afw-2---test-azure-firewall-performance) | High | Preview | No |
-| [AFW-3 - Monitor Azure Firewall metrics](#afw-3---monitor-azure-firewall-metrics) | High | Preview | No |
-| [AFW-4 - Deploy an instance of Azure Firewall per region](#afw-4---deploy-an-instance-of-azure-firewall-per-region) | High | Preview | No |
-| [AFW-5 - Configure DDoS Protection on the Azure Firewall VNet](#afw-5---configure-ddos-protection-on-the-azure-firewall-vnet) | High | Preview | No |
-| [AFW-6 - Leverage Azure Policy inheritance model](#afw-6---leverage-azure-policy-inheritance-model) | Medium | Preview | No |
-| [AFW-7 - Understand impact of management operations on long running TCP sessions](#afw-7---understand-impact-of-management-operations-on-long-running-tcp-sessions) | Medium | Preview | No |
+| Recommendation | Category | Impact | State | ARG Query Available |
+|:--------------------------------------------------------------------------------------------------------------------------------------|:-----------------:|:------:|:-------:|:-------------------:|
+| [AFW-1 - Deploy Azure Firewall across multiple availability zones](#afw-1---deploy-azure-firewall-across-multiple-availability-zones) | Availability | High | Verified | Yes |
+| [AFW-2 - Monitor Azure Firewall metrics](#afw-2---monitor-azure-firewall-metrics) | Monitoring | Medium | Verified | Yes |
+| [AFW-3 - Configure DDoS Protection on the Azure Firewall VNet](#afw-3---configure-ddos-protection-on-the-azure-firewall-vnet) | Access & Security | High | Verified | Yes |
+| [AFW-4 - Leverage Azure Policy inheritance model](#afw-4---leverage-azure-policy-inheritance-model) | Governance | Medium | Verified | No |
+| [AFW-5 - Configure 2-4 PIPs for SNAT Port utilization](#afw-5---configure-2-4-pips-for-snat-port-utilization) | Availability | Medium | Preview | No |
+| [AFW-6 - Monitor AZFW Latency Probes metric](#afw-6---monitor-azfw-latency-probes-metric) | Monitoring | Medium | Preview | No |
{{< /table >}}
@@ -47,7 +46,7 @@ Azure Firewall provides different SLAs when it's deployed in a single availabili
- [Azure Well Architected Framework - Azure Firewall](https://learn.microsoft.com/azure/architecture/framework/services/networking/azure-firewall)
- [Deploy Azure Firewall across multiple availability zones](https://learn.microsoft.com/azure/firewall/deploy-availability-zone-powershell)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -57,157 +56,126 @@ Azure Firewall provides different SLAs when it's deployed in a single availabili
-### AFW-2 - Test Azure Firewall performance
-
-**Category: System Efficiency**
-
-**Impact: High**
-
-**Guidance**
-
-Reliable firewall performance is essential to operate and protect your virtual networks in Azure. More advanced features (like those found in Azure Firewall Premium) require more processing capacity. This will affect firewall performance and impact the overall network performance. Before you deploy Azure Firewall, the performance needs to be tested and evaluated to ensure it meets your expectations. Not only should Azure Firewall handle the current traffic on a network, but it should also be ready for potential traffic growth. It's recommended to evaluate on a test network and not in a production environment. The testing should attempt to replicate the production environment as close as possible. This includes the network topology, and emulating the actual characteristics of the expected traffic through the firewall.
-
-**Resources**
-
-- [Azure Firewall performance](https://learn.microsoft.com/azure/firewall/firewall-performance)
-- [Azure Firewall performance data](https://learn.microsoft.com/azure/firewall/firewall-performance#performance-data)
-
-**Resource Graph Query/Scripts**
-
-{{< collapse title="Show/Hide Query/Script" >}}
-
-{{< code lang="sql" file="code/afw-2/afw-2.kql" >}} {{< /code >}}
-
-{{< /collapse >}}
-
-
-
-### AFW-3 - Monitor Azure Firewall metrics
+### AFW-2 - Monitor Azure Firewall metrics
**Category: Monitoring**
-**Impact: High**
+**Impact: Medium**
**Guidance**
Monitor metrics related to availability and performance issues. More specifically:
-- *FirewallHealth*: Indicates the overall health of the firewall.
-- *Throughput*: Throughput processed by the firewall. An alert should be triggered if throughput gets close to the documented limits.
-- *SNATPortUtilization*: Percentage of outbound SNAT ports currently in use. An alert should be triggered if this metric gets close to 100% (at which point Source-NATted connections, such as outbound internet connections will start to fail). If you'll need more than 512,000 SNAT ports, deploying a NAT gateway with Azure Firewall can be considered. However, deploying NAT gateway with a zone redundant firewall is not recommended deployment option, as the NAT gateway does not support zonal deployment at this time. In order to use NAT gateway with Azure Firewall, a zonal Firewall deployment is required. In addition, Azure Virtual Network NAT integration is not currently supported in secured virtual hub network architectures.
+- _FirewallHealth_: Indicates the overall health of the firewall.
+- _Throughput_: Throughput processed by the firewall. An alert should be triggered if throughput gets close to the documented limits.
+- _SNATPortUtilization_: Percentage of outbound SNAT ports currently in use. An alert should be triggered if this metric gets close to 100% (at which point Source-NATted connections, such as outbound internet connections, will start to fail). If you need more than 512,000 SNAT ports, consider deploying a NAT gateway with Azure Firewall. However, deploying a NAT gateway with a zone-redundant firewall is not a recommended deployment option, as the NAT gateway does not support zonal deployment at this time. To use a NAT gateway with Azure Firewall, a zonal Firewall deployment is required. In addition, Azure Virtual Network NAT integration is not currently supported in secured virtual hub network architectures.
**Resources**
- [Azure Firewall metrics supported in Azure Monitor](https://learn.microsoft.com/azure/azure-monitor/essentials/metrics-supported#microsoftnetworkazurefirewalls)
- [Azure Firewall performance](https://learn.microsoft.com/azure/firewall/firewall-performance)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
-{{< code lang="sql" file="code/afw-3/afw-3.kql" >}} {{< /code >}}
+{{< code lang="sql" file="code/afw-2/afw-2.kql" >}} {{< /code >}}
{{< /collapse >}}
-### AFW-4 - Deploy an instance of Azure Firewall per region
+### AFW-3 - Configure DDoS Protection on the Azure Firewall VNet
-**Category: Availability**
+**Category: Access & Security**
**Impact: High**
**Guidance**
-In multi-region environments, deploy an instance of Azure Firewall per region. For workloads designed to be resistant to failures and fault tolerant, remember to consider that instances of Azure Firewall and Azure Virtual Network are regional resources.
+Associate a DDoS protection plan with the virtual network hosting Azure Firewall. A DDoS protection plan provides enhanced mitigation features to defend your firewall from DDoS attacks. Azure Firewall Manager is an integrated tool to create your firewall infrastructure and DDoS protection plans.
**Resources**
-- [Azure Well Architected Framework - Azure Firewall](https://learn.microsoft.com/azure/architecture/framework/services/networking/azure-firewall)
+- [Azure DDoS Protection overview](https://learn.microsoft.com/azure/ddos-protection/ddos-protection-overview)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
-{{< code lang="sql" file="code/afw-4/afw-4.kql" >}} {{< /code >}}
+{{< code lang="sql" file="code/afw-3/afw-3.kql" >}} {{< /code >}}
{{< /collapse >}}
-### AFW-5 - Configure DDoS Protection on the Azure Firewall VNet
+### AFW-4 - Leverage Azure Policy inheritance model
-**Category: Access & Security**
+**Category: Governance**
-**Impact: High**
+**Impact: Medium**
**Guidance**
-Associate a DDoS protection plan with the virtual network hosting Azure Firewall. A DDoS protection plan provides enhanced mitigation features to defend your firewall from DDoS attacks. Azure Firewall Manager is an integrated tool to create your firewall infrastructure and DDoS protection plans.
+Azure Firewall policy allows you to define a rule hierarchy and enforce compliance. It provides a hierarchical structure to overlay a central base policy on top of a child application team policy. The base policy has a higher priority and runs before the child policy. Use an Azure custom role definition to prevent inadvertent base policy removal and provide selective access to rule collection groups within a subscription or resource group.
**Resources**
-- [Azure DDoS Protection overview](https://learn.microsoft.com/azure/ddos-protection/ddos-protection-overview)
+- [Azure Firewall Policy hierarchy](https://learn.microsoft.com/azure/firewall-manager/rule-hierarchy)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
-{{< code lang="sql" file="code/afw-5/afw-5.kql" >}} {{< /code >}}
+{{< code lang="sql" file="code/afw-4/afw-4.kql" >}} {{< /code >}}
{{< /collapse >}}
-### AFW-6 - Leverage Azure Policy inheritance model
+### AFW-5 - Configure 2-4 PIPs for SNAT Port utilization
-**Category: Governance**
+**Category: Availability**
**Impact: Medium**
**Guidance**
-Azure Firewall policy allows you to define a rule hierarchy and enforce compliance. It provides a hierarchical structure to overlay a central base policy on top of a child application team policy. The base policy has a higher priority and runs before the child policy. Use an Azure custom role definition to prevent inadvertent base policy removal and provide selective access to rule collection groups within a subscription or resource group.
+Configure a minimum of two to four public IP addresses per Azure Firewall to avoid SNAT exhaustion. Azure Firewall provides SNAT capability for all outbound traffic to public IP addresses. Azure Firewall provides 2,496 SNAT ports for each additional PIP.
**Resources**
-- [Azure Firewall Policy hierarchy](https://learn.microsoft.com/azure/firewall-manager/rule-hierarchy)
+- [Azure Well-Architected Framework review - Azure Firewall](https://learn.microsoft.com/en-us/azure/well-architected/service-guides/azure-firewall#recommendations)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
-{{< code lang="sql" file="code/afw-6/afw-6.kql" >}} {{< /code >}}
+{{< code lang="sql" file="code/afw-5/afw-5.kql" >}} {{< /code >}}
{{< /collapse >}}
-### AFW-7 - Understand impact of management operations on long running TCP sessions
+### AFW-6 - Monitor AZFW Latency Probes metric
-**Category: System Efficiency**
+**Category: Monitoring**
**Impact: Medium**
**Guidance**
-Azure Firewall is designed to be available and redundant. Every effort is made to avoid service disruptions. However, there are few scenarios where Azure Firewall can potentially drop long running TCP sessions. The following scenarios can potentially drop long running TCP sessions:
-
-- Scale in
-- Firewall maintenance
-- Idle timeout
-- Auto-recovery
+Create a metric alert that monitors whether the latency probe stays above 20 ms over a long period of time (more than 30 minutes). When probe latency remains high for that long, it means the firewall instance CPUs are stressed and could possibly be causing issues.
**Resources**
-- [Long running TCP sessions](https://learn.microsoft.com/azure/firewall/long-running-sessions)
+- [Azure Well-Architected Framework review - Azure Firewall](https://learn.microsoft.com/azure/well-architected/service-guides/azure-firewall#recommendations)
+- [Azure Firewall metrics overview](https://learn.microsoft.com/azure/firewall/metrics)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
-{{< code lang="sql" file="code/afw-7/afw-7.kql" >}} {{< /code >}}
+{{< code lang="sql" file="code/afw-6/afw-6.kql" >}} {{< /code >}}
{{< /collapse >}}
-
-
diff --git a/docs/content/services/networking/firewall/code/afw-1/afw-1.kql b/docs/content/services/networking/firewall/code/afw-1/afw-1.kql
index 614a7f9ca..0a76e0cf7 100644
--- a/docs/content/services/networking/firewall/code/afw-1/afw-1.kql
+++ b/docs/content/services/networking/firewall/code/afw-1/afw-1.kql
@@ -1 +1,6 @@
-// under-development
+// Azure Resource Graph Query
+// List all Azure Firewalls that are not configured with multiple availability zones or deployed without a zone
+resources
+| where type == 'microsoft.network/azurefirewalls'
+| where array_length(zones) <= 1 or isnull(zones)
+| project recommendationId = "afw-1", name, id, tags, param1="multipleZones:false"
diff --git a/docs/content/services/networking/firewall/code/afw-1/afw-1.kql.fix b/docs/content/services/networking/firewall/code/afw-1/afw-1.kql.fix
deleted file mode 100644
index bb6998b6e..000000000
--- a/docs/content/services/networking/firewall/code/afw-1/afw-1.kql.fix
+++ /dev/null
@@ -1,4 +0,0 @@
-// Find Azure Firewalls that have been deployed as non-zonal/noo-zone-redundant resources
-resources
-| where type == 'microsoft.network/azurefirewalls' and zones != ""
-| project recommendationid="afw-1",name, id
diff --git a/docs/content/services/networking/firewall/code/afw-2/afw-2.kql b/docs/content/services/networking/firewall/code/afw-2/afw-2.kql
index 614a7f9ca..b92dd748c 100644
--- a/docs/content/services/networking/firewall/code/afw-2/afw-2.kql
+++ b/docs/content/services/networking/firewall/code/afw-2/afw-2.kql
@@ -1 +1,21 @@
-// under-development
+// Azure Resource Graph Query
+// List all in-scope Azure Firewall resources whose Azure Monitor metric alert rules do not cover FirewallHealth, Throughput, and SNATPortUtilization.
+resources
+| where type == "microsoft.network/azurefirewalls"
+| project firewallId = tolower(id), name, tags
+| join kind = leftouter (
+ resources
+ | where type == "microsoft.insights/metricalerts"
+ | mv-expand properties.scopes
+ | mv-expand properties.criteria.allOf
+ | where properties_scopes contains "azureFirewalls"
+ | project metricId = tolower(properties_scopes), monitoredMetric = properties_criteria_allOf.metricName, tags
+ | summarize monitoredMetrics = make_list(monitoredMetric) by tostring(metricId)
+ | project
+ metricId,
+ monitoredMetrics,
+ allAlertsConfigured = monitoredMetrics contains("FirewallHealth") and monitoredMetrics contains ("Throughput") and monitoredMetrics contains ("SNATPortUtilization")
+) on $left.firewallId == $right.metricId
+| extend alertsNotFullyConfigured = isnull(allAlertsConfigured) or not(allAlertsConfigured)
+| where alertsNotFullyConfigured
+| project recommendationId = "afw-2", name, id = firewallId, tags, param1 = strcat("MetricsAlerts:", monitoredMetrics)
diff --git a/docs/content/services/networking/firewall/code/afw-3/afw-3.kql b/docs/content/services/networking/firewall/code/afw-3/afw-3.kql
index 614a7f9ca..29b54ce71 100644
--- a/docs/content/services/networking/firewall/code/afw-3/afw-3.kql
+++ b/docs/content/services/networking/firewall/code/afw-3/afw-3.kql
@@ -1 +1,16 @@
-// under-development
+// Azure Resource Graph Query
+// List all in-scope Azure Firewall resources, where the VNet is not associated to a DDoS Protection Plan
+resources
+| where type == "microsoft.network/azurefirewalls"
+| mv-expand properties.ipConfigurations
+| project name, firewallId = id, vNet = substring(properties_ipConfigurations.properties.subnet.id, 0, indexof(properties_ipConfigurations.properties.subnet, "/subnet") - 7), tags
+| join kind=fullouter (
+ resources
+ | where type == "microsoft.network/ddosprotectionplans"
+ | mv-expand properties.virtualNetworks
+ | extend vNet = tostring(properties_virtualNetworks.id)
+ | project ddosProtectionPlan = id, vNet
+ )
+ on $left.vNet == $right.vNet
+| where ddosProtectionPlan == ''
+| project recommendationId = "afw-3", name, id = firewallId, tags, param1 = "ddosProtectionPlan:false"
diff --git a/docs/content/services/networking/firewall/code/afw-3/afw-3.kql.fix b/docs/content/services/networking/firewall/code/afw-3/afw-3.kql.fix
deleted file mode 100644
index 9947b7bbd..000000000
--- a/docs/content/services/networking/firewall/code/afw-3/afw-3.kql.fix
+++ /dev/null
@@ -1,9 +0,0 @@
-// List all Azure Firewalls resources in-scope, along with any metrics associated to Azure Monitor alert rules
-resources
-| where type == "microsoft.insights/metricalerts"
-| mv-expand properties.scopes
-| mv-expand properties.criteria.allOf
-| project firewallId = properties_scopes, monitoredMetric = properties_criteria_allOf.metricName
-| summarize monitoredMetrics=make_list(monitoredMetric) by tostring(firewallId)
-| join kind=fullouter (resources | where type == "microsoft.network/azurefirewalls" | project rightFirewallId = id) on $left.firewallId == $right.rightFirewallId
-| project recommendationid="afw-3",name, id, param1= rightFirewallId, param2= monitoredMetrics
diff --git a/docs/content/services/networking/firewall/code/afw-5/afw-5.kql b/docs/content/services/networking/firewall/code/afw-5/afw-5.kql
index 614a7f9ca..7b5bb5473 100644
--- a/docs/content/services/networking/firewall/code/afw-5/afw-5.kql
+++ b/docs/content/services/networking/firewall/code/afw-5/afw-5.kql
@@ -1 +1 @@
-// under-development
+// under development
diff --git a/docs/content/services/networking/firewall/code/afw-5/afw-5.kql.fix b/docs/content/services/networking/firewall/code/afw-5/afw-5.kql.fix
deleted file mode 100644
index eb9db4705..000000000
--- a/docs/content/services/networking/firewall/code/afw-5/afw-5.kql.fix
+++ /dev/null
@@ -1,11 +0,0 @@
-// List all in-scope Azure Firewall resources, along with their associated DDoS Protection Plan (if any)
-resources
-| where type == "microsoft.network/azurefirewalls"
-| mv-expand properties.ipConfigurations
-| project name, firewallId = id, vNet= substring(properties_ipConfigurations.properties.subnet.id, 0, indexof(properties_ipConfigurations.properties.subnet,"/subnet") - 7)
-| join kind=fullouter (resources
- | where type == "microsoft.network/ddosprotectionplans"
- | mv-expand properties.virtualNetworks
- | extend vNet = tostring(properties_virtualNetworks.id)
- | project ddosProtectionPlan = id, vNet) on $left.vNet == $right.vNet
-| project recommendationId="afw-5", name, id=firewallId, param1=strcat("ddosProtectionPlan: ", ddosProtectionPlan)
diff --git a/docs/content/services/networking/firewall/code/afw-6/afw-6.kql b/docs/content/services/networking/firewall/code/afw-6/afw-6.kql
index 614a7f9ca..7b5bb5473 100644
--- a/docs/content/services/networking/firewall/code/afw-6/afw-6.kql
+++ b/docs/content/services/networking/firewall/code/afw-6/afw-6.kql
@@ -1 +1 @@
-// under-development
+// under development
diff --git a/docs/content/services/networking/front-door/_index.md b/docs/content/services/networking/front-door/_index.md
index 7f793221c..896347078 100644
--- a/docs/content/services/networking/front-door/_index.md
+++ b/docs/content/services/networking/front-door/_index.md
@@ -14,25 +14,23 @@ The presented resiliency recommendations in this guidance include Front Door and
The below table shows the list of resiliency recommendations for Front Door and associated resources.
{{< table style="table-striped" >}}
-| Recommendation | Impact | State | ARG Query Available |
-| :------------------------------------------------ | :------: |:------: | :-----------------: |
-| [AFD-1 - Avoid combining Traffic Manager and Front Door](#afd-1---avoid-combining-traffic-manager-and-front-door) | High |Preview | No |
-| [AFD-2 - Restrict traffic to your origins](#afd-2---restrict-traffic-to-your-origins) | High | Preview | No |
-| [AFD-3 - Use the latest API version and SDK version](#afd-3---use-the-latest-api-version-and-sdk-version) | High | Preview | No |
-| [AFD-4 - Configure logs](#afd-4---configure-logs) | Medium | Preview | No |
-| [AFD-5 - Use end-to-end TLS](#afd-5---use-end-to-end-tls) | High | Preview | No |
-| [AFD-6 - Use HTTP to HTTPS redirection](#afd-6---use-http-to-https-redirection) | High | Preview | No |
-| [AFD-7 - Use managed TLS certificates](#afd-7---use-managed-tls-certificates) | Medium | Preview | No |
-| [AFD-8 - Use latest version for customer-managed certificates](#afd-8---use-latest-version-for-customer-managed-certificates) | Medium | Preview | No |
-| [AFD-9 - Use the same domain name on Front Door and your origin](#afd-9---use-the-same-domain-name-on-front-door-and-your-origin) | Medium | Preview | No |
-| [AFD-10 - Enable the WAF](#afd-10---enable-the-waf) | Medium | Preview | No |
-| [AFD-11 - Follow WAF best practices](#afd-11---follow-waf-best-practices) | High | Preview | No |
-| [AFD-12 - Disable health probes when there is only one origin in an origin group](#afd-12---disable-health-probes-when-there-is-only-one-origin-in-an-origin-group) | Low | Preview | No |
-| [AFD-13 - Select good health probe endpoints](#afd-13---select-good-health-probe-endpoints) | Medium | Preview | No |
-| [AFD-14 - Use HEAD health probes](#afd-14---use-head-health-probes) | Medium | Preview | No |
-| [AFD-15 - Lock down Application Gateway to receive traffic only from Azure Front Door](#afd-15---lock-down-application-gateway-to-receive-traffic-only-from-azure-front-door) | Medium | Preview | No |
-| [AFD-16 - Use geo-filtering in Azure Front Door](#afd-16---use-geo-filtering-in-azure-front-door) | Medium | Preview | No |
-| [AFD-17 - Secure your Origin with Private Link in Azure Front Door](#afd-17---secure-your-origin-with-private-link-in-azure-front-door) | Medium | Preview | No |
+| Recommendation | Category | Impact | State | ARG Query Available |
+|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------:|:------:|:-------:|:-------------------:|
+| [AFD-1 - Avoid combining Traffic Manager and Front Door](#afd-1---avoid-combining-traffic-manager-and-front-door) | Networking | High | Verified | Yes |
+| [AFD-2 - Restrict traffic to your origins](#afd-2---restrict-traffic-to-your-origins) | Access & Security | High | Verified | No |
+| [AFD-3 - Use the latest API version and SDK version](#afd-3---use-the-latest-api-version-and-sdk-version) | Networking | Medium | Verified | No |
+| [AFD-4 - Configure logs](#afd-4---configure-logs) | Monitoring | Medium | Verified | No |
+| [AFD-5 - Use end-to-end TLS](#afd-5---use-end-to-end-tls) | Security | High | Verified | No |
+| [AFD-6 - Use HTTP to HTTPS redirection](#afd-6---use-http-to-https-redirection) | Access & Security | High | Verified | No |
+| [AFD-7 - Use managed TLS certificates](#afd-7---use-managed-tls-certificates) | Access & Security | High | Verified | No |
+| [AFD-8 - Use latest version for customer-managed certificates](#afd-8---use-latest-version-for-customer-managed-certificates) | Access & Security | Medium | Verified | No |
+| [AFD-9 - Use the same domain name on Front Door and your origin](#afd-9---use-the-same-domain-name-on-front-door-and-your-origin) | Networking | Medium | Verified | No |
+| [AFD-10 - Enable the WAF](#afd-10---enable-the-waf) | Access & Security | Medium | Verified | No |
+| [AFD-11 - Disable health probes when there is only one origin in an origin group](#afd-11---disable-health-probes-when-there-is-only-one-origin-in-an-origin-group) | Availability | Low | Verified | Yes |
+| [AFD-12 - Select good health probe endpoints](#afd-12---select-good-health-probe-endpoints) | Availability | Medium | Verified | Yes |
+| [AFD-13 - Use HEAD health probes](#afd-13---use-head-health-probes) | System Efficiency | Medium | Verified | No |
+| [AFD-14 - Use geo-filtering in Azure Front Door](#afd-14---use-geo-filtering-in-azure-front-door) | Access & Security | Medium | Verified | No |
+| [AFD-15 - Secure your Origin with Private Link in Azure Front Door](#afd-15---secure-your-origin-with-private-link-in-azure-front-door) | Access & Security | Medium | Verified | No |
{{< /table >}}
{{< alert style="info" >}}
@@ -45,23 +43,28 @@ Definitions of states can be found [here]({{< ref "../../../_index.md#definition
### AFD-1 - Avoid combining Traffic Manager and Front Door
+**Category: Networking**
+
**Impact: High**
**Guidance**
-For most solutions, you should use either Front Door or Azure Traffic Manager, but not both. Traffic Manager is a DNS-based load balancer. It sends traffic directly to your origin's endpoints. In contrast, Front Door terminates connections at points of presence (PoPs) near to the client and establishes separate long-lived connections to the origins. The products work differently and are intended for different use cases.
+For most solutions, we recommend using *either* Front Door *or* Traffic Manager, but not both. Azure Traffic Manager is a DNS-based load balancer. It sends traffic directly to your origin's endpoints. In contrast, Azure Front Door terminates connections at points of presence (PoPs) near the client and establishes separate long-lived connections to the origins. The products work differently and are intended for different use cases.
If you need content caching and delivery (CDN), TLS termination, advanced routing capabilities, or a web application firewall (WAF), consider using Front Door. For simple global load balancing with direct connections from your client to your endpoints, consider using Traffic Manager.
-However, as part of a complex architecture, you might choose to use Traffic Manager in front of Front Door. In the unlikely event that Front Door is unavailable, Traffic Manager can route traffic to an alternative destination, such as Azure Application Gateway or a partner content delivery network (CDN). These architectures are difficult to implement and most customers don't need them.
+However, as part of a complex architecture that requires high availability, you can put an Azure Traffic Manager in front of an Azure Front Door. In the unlikely event that Azure Front Door is unavailable, Azure Traffic Manager can then route traffic to an alternative destination, such as Azure Application Gateway or a partner content delivery network (CDN).
+
+Don't put Azure Traffic Manager behind Azure Front Door. Azure Traffic Manager should always be in front of Azure Front Door.
**Resources**
- [Azure Load Balancing Options](https://learn.microsoft.com/azure/architecture/guide/technology-choices/load-balancing-overview)
- [Azure Traffic Manager](https://learn.microsoft.com/azure/traffic-manager/traffic-manager-overview)
- [Azure Front Door](https://learn.microsoft.com/azure/frontdoor/front-door-overview)
+- [Mission-critical global content delivery](https://learn.microsoft.com/en-us/azure/architecture/guide/networking/global-web-applications/mission-critical-content-delivery)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -73,6 +76,8 @@ However, as part of a complex architecture, you might choose to use Traffic Mana
### AFD-2 - Restrict traffic to your origins
+**Category: Access & Security**
+
**Impact: High**
**Guidance**
@@ -83,7 +88,7 @@ Front Door's features work best when traffic only flows through Front Door. You
- [Secure traffic to Azure Front Door origins](https://learn.microsoft.com/azure/frontdoor/origin-security?tabs=app-service-functions&pivots=front-door-standard-premium)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -95,7 +100,9 @@ Front Door's features work best when traffic only flows through Front Door. You
### AFD-3 - Use the latest API version and SDK version
-**Impact: High**
+**Category: Networking**
+
+**Impact: Medium**
**Guidance**
@@ -107,7 +114,7 @@ When you work with Front Door by using APIs, ARM templates, Bicep, or Azure SDKs
- [Client library for Java](https://learn.microsoft.com/java/api/overview/azure/resourcemanager-frontdoor-readme?view=azure-java-preview)
- [SDK for Python](https://learn.microsoft.com/python/api/overview/azure/front-door?view=azure-python)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -119,6 +126,8 @@ When you work with Front Door by using APIs, ARM templates, Bicep, or Azure SDKs
### AFD-4 - Configure logs
+**Category: Monitoring**
+
**Impact: Medium**
**Guidance**
@@ -131,7 +140,7 @@ Front Door tracks extensive telemetry about every request. When you enable cachi
- [WAF logs](https://learn.microsoft.com/azure/web-application-firewall/afds/waf-front-door-monitor?pivots=front-door-standard-premium#waf-logs)
- [Configure Azure Front Door logs](https://learn.microsoft.com/azure/frontdoor/standard-premium/how-to-logs)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -143,6 +152,8 @@ Front Door tracks extensive telemetry about every request. When you enable cachi
### AFD-5 - Use end-to-end TLS
+**Category: Security**
+
**Impact: High**
**Guidance**
@@ -153,7 +164,7 @@ Front Door terminates TCP and TLS connections from clients. It then establishes
- [End-to-end TLS with Azure Front Door](https://learn.microsoft.com/azure/frontdoor/end-to-end-tls?pivots=front-door-standard-premium)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -165,6 +176,8 @@ Front Door terminates TCP and TLS connections from clients. It then establishes
### AFD-6 - Use HTTP to HTTPS redirection
+**Category: Access & Security**
+
**Impact: High**
**Guidance**
@@ -177,7 +190,7 @@ You can configure Front Door to automatically redirect HTTP requests to use the
- [Create HTTP to HTTPS redirect rule](https://learn.microsoft.com/azure/frontdoor/front-door-how-to-redirect-https#create-http-to-https-redirect-rule)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -189,7 +202,9 @@ You can configure Front Door to automatically redirect HTTP requests to use the
### AFD-7 - Use managed TLS certificates
-**Impact: Medium**
+**Category: Access & Security**
+
+**Impact: High**
**Guidance**
@@ -199,7 +214,7 @@ When Front Door manages your TLS certificates, it reduces your operational costs
- [Configure HTTPS on an Azure Front Door custom domain using the Azure portal](https://learn.microsoft.com/azure/frontdoor/standard-premium/how-to-configure-https-custom-domain?tabs=powershell)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -211,6 +226,8 @@ When Front Door manages your TLS certificates, it reduces your operational costs
### AFD-8 - Use latest version for customer-managed certificates
+**Category: Access & Security**
+
**Impact: Medium**
**Guidance**
@@ -221,7 +238,7 @@ If you decide to use your own TLS certificates, then consider setting the Key Va
- [Select the certificate for Azure Front Door to deploy](https://learn.microsoft.com/azure/frontdoor/standard-premium/how-to-configure-https-custom-domain?tabs=powershell#select-the-certificate-for-azure-front-door-to-deploy)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -233,6 +250,8 @@ If you decide to use your own TLS certificates, then consider setting the Key Va
### AFD-9 - Use the same domain name on Front Door and your origin
+**Category: Networking**
+
**Impact: Medium**
**Guidance**
@@ -245,7 +264,7 @@ Before you rewrite the Host header of your requests, carefully consider whether
- [Preserve the original HTTP host name between a reverse proxy and its back-end web application](https://learn.microsoft.com/azure/architecture/best-practices/host-name-preservation)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -257,6 +276,8 @@ Before you rewrite the Host header of your requests, carefully consider whether
### AFD-10 - Enable the WAF
+**Category: Access & Security**
+
**Impact: Medium**
**Guidance**
@@ -267,7 +288,7 @@ For internet-facing applications, we recommend you enable the Front Door web app
- [https://learn.microsoft.com/azure/frontdoor/web-application-firewall](https://learn.microsoft.com/azure/frontdoor/web-application-firewall)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -277,29 +298,9 @@ For internet-facing applications, we recommend you enable the Front Door web app
-### AFD-11 - Follow WAF best practices
-
-**Impact: High**
-
-**Guidance**
-
-The WAF for Front Door has its own set of best practices for its configuration and use.
-
-**Resources**
-
-- [Best practices for Web Application Firewall (WAF) on Azure Front Door](https://learn.microsoft.com/azure/web-application-firewall/afds/waf-front-door-best-practices)
-
-**Resource Graph Query/Scripts**
-
-{{< collapse title="Show/Hide Query/Script" >}}
-
-{{< code lang="sql" file="code/afd-11/afd-11.kql" >}} {{< /code >}}
-
-{{< /collapse >}}
-
-
+### AFD-11 - Disable health probes when there is only one origin in an origin group
-### AFD-12 - Disable health probes when there is only one origin in an origin group
+**Category: Availability**
**Impact: Low**
@@ -313,17 +314,19 @@ If you only have a single origin, Front Door always routes traffic to that origi
- [Health probes](https://learn.microsoft.com/azure/frontdoor/health-probes)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
-{{< code lang="sql" file="code/afd-12/afd-12.kql" >}} {{< /code >}}
+{{< code lang="sql" file="code/afd-11/afd-11.kql" >}} {{< /code >}}
{{< /collapse >}}
-### AFD-13 - Select good health probe endpoints
+### AFD-12 - Select good health probe endpoints
+
+**Category: Availability**
**Impact: Medium**
@@ -335,17 +338,19 @@ Consider the location where you tell Front Door's health probe to monitor. It's
- [Health Endpoint Monitoring pattern](https://learn.microsoft.com/azure/architecture/patterns/health-endpoint-monitoring)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
-{{< code lang="sql" file="code/afd-13/afd-13.kql" >}} {{< /code >}}
+{{< code lang="sql" file="code/afd-12/afd-12.kql" >}} {{< /code >}}
{{< /collapse >}}
-### AFD-14 - Use HEAD health probes
+### AFD-13 - Use HEAD health probes
+
+**Category: System Efficiency**
**Impact: Medium**
@@ -357,40 +362,19 @@ Health probes can use either the GET or HEAD HTTP method. It's a good practice t
- [Supported HTTP methods for health probes](https://learn.microsoft.com/azure/frontdoor/health-probes#supported-http-methods-for-health-probes)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
-{{< code lang="sql" file="code/afd-14/afd-14.kql" >}} {{< /code >}}
+{{< code lang="sql" file="code/afd-13/afd-13.kql" >}} {{< /code >}}
{{< /collapse >}}
-### AFD-15 - Lock down Application Gateway to receive traffic only from Azure Front Door
-
-**Impact: Medium**
-
-**Guidance**
-
-Lock down Application Gateway to receive traffic only from Azure Front Door when using Azure Front Door and Application Gateway to protect HTTP/S applications.
-Certain scenarios can force a customer to implement rules specifically on AppGateway: For example, if ModSec Core Rule Set (CRS) 2.2.9, CRS 3.0, or CRS 3.1 rules are required, rules can be only implemented on AppGatway. Rate-limiting and geo-filtering are available only on Azure Front Door, not on AppGateway.
+### AFD-14 - Use geo-filtering in Azure Front Door
-**Resources**
-
-- [Application Gateway behind Front Door](https://learn.microsoft.com/azure/frontdoor/front-door-faq#how-do-i-lock-down-the-access-to-my-backend-to-only-azure-front-door)
-
-**Resource Graph Query/Scripts**
-
-{{< collapse title="Show/Hide Query/Script" >}}
-
-{{< code lang="sql" file="code/afd-15/afd-15.kql" >}} {{< /code >}}
-
-{{< /collapse >}}
-
-
-
-### AFD-16 - Use geo-filtering in Azure Front Door
+**Category: Access & Security**
**Impact: Medium**
@@ -405,17 +389,19 @@ For a geo filtering rule, a match variable is either RemoteAddr or SocketAddr. R
- [Geo filter WAF policy - GeoMatch](https://learn.microsoft.com/azure/web-application-firewall/afds/waf-front-door-geo-filtering)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
-{{< code lang="sql" file="code/afd-16/afd-16.kql" >}} {{< /code >}}
+{{< code lang="sql" file="code/afd-14/afd-14.kql" >}} {{< /code >}}
{{< /collapse >}}
-### AFD-17 - Secure your Origin with Private Link in Azure Front Door
+### AFD-15 - Secure your Origin with Private Link in Azure Front Door
+
+**Category: Access & Security**
**Impact: Medium**
@@ -429,11 +415,11 @@ Azure Front Door Premium can connect to your origin using Private Link. Your ori
- [Private link for Azure Front Door](https://learn.microsoft.com/azure/frontdoor/private-link)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
-{{< code lang="sql" file="code/afd-17/afd-17.kql" >}} {{< /code >}}
+{{< code lang="sql" file="code/afd-15/afd-15.kql" >}} {{< /code >}}
{{< /collapse >}}
diff --git a/docs/content/services/networking/front-door/code/afd-1/afd-1.kql b/docs/content/services/networking/front-door/code/afd-1/afd-1.kql
index fa5cad258..a2bcfcce5 100644
--- a/docs/content/services/networking/front-door/code/afd-1/afd-1.kql
+++ b/docs/content/services/networking/front-door/code/afd-1/afd-1.kql
@@ -1 +1 @@
-// cannot-be-validated-with-arg
+// cannot-be-validated-with-arg
\ No newline at end of file
diff --git a/docs/content/services/networking/front-door/code/afd-10/afd-10.kql b/docs/content/services/networking/front-door/code/afd-10/afd-10.kql
index 28338515a..c4de704d3 100644
--- a/docs/content/services/networking/front-door/code/afd-10/afd-10.kql
+++ b/docs/content/services/networking/front-door/code/afd-10/afd-10.kql
@@ -1,6 +1,6 @@
// Azure Resource Graph Query
-// Goal: Show any WAF (Web Application Firewall) policies enabled on the AFD.
+// Goal: Show any WAF (Web Application Firewall) policies that are not enabled on the AFD.
resources
| where type == "microsoft.cdn/frontdoorwebapplicationfirewallpolicies"
-| where properties['policySettings']['enabledState'] == "Enabled"
-| project recommendationId = "afd-10", name, id, tags
+| where properties['policySettings']['enabledState'] != "Enabled"
+| project recommendationId = "afd-10", name, id, tags
\ No newline at end of file
diff --git a/docs/content/services/networking/front-door/code/afd-11/afd-11.kql b/docs/content/services/networking/front-door/code/afd-11/afd-11.kql
index 614a7f9ca..a34f88973 100644
--- a/docs/content/services/networking/front-door/code/afd-11/afd-11.kql
+++ b/docs/content/services/networking/front-door/code/afd-11/afd-11.kql
@@ -1 +1,21 @@
-// under-development
+// Azure Resource Graph Query
+// AFD-11 - Disable health probes when there is only one origin in an origin group
+cdnresources
+| where type =~ "microsoft.cdn/profiles/origingroups"
+| extend healthprobe=tostring(properties.healthProbeSettings)
+| project origingroupname=name, id, tags, resourceGroup, subscriptionId, healthprobe
+| join (
+ cdnresources
+ | where type =~ "microsoft.cdn/profiles/origingroups/Origins"
+ | extend origingroupname = tostring(properties.originGroupName)
+ )
+ on origingroupname
+| summarize origincount=count(), enabledhealthprobecount=countif(healthprobe != "") by origingroupname, id, tostring(tags), resourceGroup, subscriptionId
+| where origincount == 1 and enabledhealthprobecount != 0
+| project
+ recommendationId = "afd-11",
+ name=origingroupname,
+ id,
+ todynamic(tags),
+ param1 = strcat("origincount:", origincount),
+ param2 = strcat("enabledhealthprobecount:", enabledhealthprobecount)
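Review note on the afd-11 query: the join/summarize logic can be modelled in a few lines of Python to check the intended behaviour. This is only a sketch under stated assumptions. The dictionary shapes and the function name `single_origin_groups_with_probes` are hypothetical and are not part of Azure Resource Graph; an empty or missing `healthProbeSettings` is assumed to mean that probes are disabled, which mirrors the `healthprobe != ""` test in the query.

```python
from collections import Counter


def single_origin_groups_with_probes(origin_groups, origins):
    """Return names of origin groups with exactly one origin and probes configured.

    origin_groups: list of dicts with hypothetical keys "name" and "healthProbeSettings".
    origins: list of dicts with the hypothetical key "originGroupName".
    """
    # Equivalent of the join plus `summarize origincount=count() by origingroupname`.
    origin_counts = Counter(o["originGroupName"] for o in origins)
    return [
        g["name"]
        for g in origin_groups
        # Equivalent of `where origincount == 1 and enabledhealthprobecount != 0`.
        if origin_counts.get(g["name"], 0) == 1 and g.get("healthProbeSettings")
    ]
```

Like the query, this model matches origins to origin groups by name only, so two profiles that each contain an origin group with the same name would have their origins counted together. If that matters, the join key may need to include the profile as well.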
diff --git a/docs/content/services/networking/front-door/code/afd-11/afd-11.kql.fix b/docs/content/services/networking/front-door/code/afd-11/afd-11.kql.fix
deleted file mode 100644
index 51060d474..000000000
--- a/docs/content/services/networking/front-door/code/afd-11/afd-11.kql.fix
+++ /dev/null
@@ -1,4 +0,0 @@
-resources
-| where type == "microsoft.network/frontdoorwebapplicationfirewallpolicies"
-| where properties['managedRules']['managedRuleSets'][0]['ruleSetType'] == "Microsoft_DefaultRuleSet"
-| project recommendationId = "afd-11", name, id
diff --git a/docs/content/services/networking/front-door/code/afd-5/afd-5.kql b/docs/content/services/networking/front-door/code/afd-5/afd-5.kql
index 614a7f9ca..7e3813b75 100644
--- a/docs/content/services/networking/front-door/code/afd-5/afd-5.kql
+++ b/docs/content/services/networking/front-door/code/afd-5/afd-5.kql
@@ -1 +1,8 @@
-// under-development
+// Azure Resource Graph Query
+// Use end-to-end TLS
+cdnresources
+| where type == "microsoft.cdn/profiles/afdendpoints/routes"
+| extend forwardingProtocol=tostring(properties.forwardingProtocol),supportedProtocols=properties.supportedProtocols
+| project id,name,forwardingProtocol,supportedProtocols,tags
+| where forwardingProtocol !~ "httpsonly" or supportedProtocols has "http"
+| project recommendationId= "afd-5", name,id,tags,param1=strcat("forwardingProtocol:",forwardingProtocol),param2=strcat("supportedProtocols:",supportedProtocols)
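Review note on the afd-5 query: the filter flags a route when either the forwarding protocol is not HTTPS-only or the route still accepts plain HTTP from clients. A minimal Python model of that predicate is below. The function name `violates_end_to_end_tls` and the dictionary shape are hypothetical, and the exact-term comparison is only an approximation of how the KQL `has` operator treats the `supportedProtocols` array.

```python
def violates_end_to_end_tls(route):
    """Return True when a route would be reported by the afd-5 filter.

    route: dict with hypothetical keys "forwardingProtocol" (str) and
    "supportedProtocols" (list of str), mirroring the projected columns.
    """
    forwarding = str(route.get("forwardingProtocol") or "").lower()
    supported = [str(p).lower() for p in (route.get("supportedProtocols") or [])]
    # Mirrors: forwardingProtocol !~ "httpsonly" or supportedProtocols has "http"
    return forwarding != "httpsonly" or "http" in supported
```

A route with `forwardingProtocol` set to `HttpsOnly` and only `Https` in `supportedProtocols` passes; any other combination is reported.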
diff --git a/docs/content/services/networking/front-door/code/afd-6/afd-6.kql b/docs/content/services/networking/front-door/code/afd-6/afd-6.kql
index 614a7f9ca..b3eace390 100644
--- a/docs/content/services/networking/front-door/code/afd-6/afd-6.kql
+++ b/docs/content/services/networking/front-door/code/afd-6/afd-6.kql
@@ -1 +1,8 @@
-// under-development
+// Azure Resource Graph Query
+// Use HTTP to HTTPS redirection
+cdnresources
+| where type == "microsoft.cdn/profiles/afdendpoints/routes"
+| extend httpsRedirect=tostring(properties.httpsRedirect)
+| project id,name,httpsRedirect,tags
+| where httpsRedirect !~ "enabled"
+| project recommendationId= "afd-6", name,id,tags,param1=strcat("httpsRedirect:",httpsRedirect)
diff --git a/docs/content/services/networking/general-networking/_index.md b/docs/content/services/networking/general-networking/_index.md
deleted file mode 100644
index 7413fdb55..000000000
--- a/docs/content/services/networking/general-networking/_index.md
+++ /dev/null
@@ -1,152 +0,0 @@
-+++
-title = "General Networking"
-description = "Best practices and resiliency recommendations for General Networking and associated resources and settings."
-date = "6/29/23"
-author = "maheshbenke"
-msAuthor = "maheshbenke"
-draft = false
-+++
-
-The presented resiliency recommendations in this guidance include General Networking and associated resources and settings.
-
-## Summary of Recommendations
-
-{{< table style="table-striped" >}}
-| Recommendation | Category | Impact | State | ARG Query Available |
-| :------------------------------------------------ | :---------------------------------------------------------------------: | :------: | :------: | :-----------------: |
-| [GNW-1 - Use ExpressRoute as the primary connectivity channel for connecting an on-premises network to Azure](#gnw-1---use-expressroute-as-the-primary-connectivity-channel-for-connecting-an-on-premises-network-to-azure) | Networking | High | Preview | No |
-| [GNW-2 - Simulate a failure path to ensure that connectivity is available over alternative paths](#gnw-2---simulate-a-failure-path-to-ensure-that-connectivity-is-available-over-alternative-paths) | Networking | High | Preview | No |
-| [GNW-3 - Use a global load balancer to distribute traffic and failover across regions](#gnw-3---use-a-global-load-balancer-to-distribute-traffic-and-failover-across-regions) | Networking | Medium | Preview | No |
-| [GNW-4 - Eliminate all single points of failure from the data path both on-premises and hosted on Azure](#gnw-4---eliminate-all-single-points-of-failure-from-the-data-path-both-on-premises-and-hosted-on-azure) | Networking | Medium | Preview | No |
-| [GNW-5 - Assess critical application dependencies with health probes](#gnw-5---assess-critical-application-dependencies-with-health-probes) | Networking | Medium | Preview | No |
-{{< /table >}}
-
-{{< alert style="info" >}}
-
-Definitions of states can be found [here]({{< ref "../../../_index.md#definitions-of-terms-used-in-aprl">}})
-
-{{< /alert >}}
-
-## Recommendations Details
-
-### GNW-1 - Use ExpressRoute as the primary connectivity channel for connecting an on-premises network to Azure
-
-**Category: Networking**
-
-**Impact: High**
-
-**Guidance**
-
-Use ExpressRoute as the primary connectivity channel for connecting an on-premises network to Azure. You can use VPNs as a source of backup connectivity to enhance connectivity resiliency.
-For cross-premises connectivity, by using Azure ExpressRoute or VPN, ensure that there are redundant connections from different locations.
-At least two redundant connections should be established across two or more Azure regions and peering locations to ensure there are no single points of failure. An active/active load-shared configuration provides path diversity and promotes availability of network connection paths.
-
-**Resources**
-
-- [Connectivity to Azure - Cloud Adoption Framework](https://learn.microsoft.com/en-us/azure/cloud-adoption-framework/ready/azure-best-practices/connectivity-to-azure)
-
-**Resource Graph Query/Scripts**
-
-{{< collapse title="Show/Hide Query/Script" >}}
-
-{{< code lang="sql" file="code/gnw-1/gnw-1.kql" >}} {{< /code >}}
-
-{{< /collapse >}}
-
-
-
-### GNW-2 - Simulate a failure path to ensure that connectivity is available over alternative paths
-
-**Category: Networking**
-
-**Impact: High**
-
-**Guidance**
-
-The failure of a connection path onto other connection paths should be tested to validate connectivity and operational effectiveness. Using site-to-site VPN connectivity as a backup path for ExpressRoute provides an extra layer of network resiliency for cross-premises connectivity.
-
-**Resources**
-
-- [Design requirements connectivity](https://learn.microsoft.com/en-us/azure/well-architected/resiliency/design-requirements#connectivity)
-
-**Resource Graph Query/Scripts**
-
-{{< collapse title="Show/Hide Query/Script" >}}
-
-{{< code lang="sql" file="code/gnw-2/gnw-2.kql" >}} {{< /code >}}
-
-{{< /collapse >}}
-
-
-
-### GNW-3 - Use a global load balancer to distribute traffic and failover across regions
-
-**Category: Networking**
-
-**Impact: Medium**
-
-**Guidance**
-
-Azure Front Door, Azure Traffic Manager, or third-party content delivery network services can be used to direct inbound requests to external-facing application endpoints deployed across multiple regions. Traffic Manager is a DNS-based load balancer, so failover must wait for DNS propagation to occur. A sufficiently low time-to-live (TTL) value should be used for DNS records, though not all ISPs honor this setting. For application scenarios that require transparent failover, Azure Front Door should be used.
-
-**Resources**
-
-- [Design requirements connectivity](https://learn.microsoft.com/en-us/azure/well-architected/resiliency/design-requirements#connectivity)
-
-**Resource Graph Query/Scripts**
-
-{{< collapse title="Show/Hide Query/Script" >}}
-
-{{< code lang="sql" file="code/gnw-3/gnw-3.kql" >}} {{< /code >}}
-
-{{< /collapse >}}
-
-
-
-### GNW-4 - Eliminate all single points of failure from the data path both on-premises and hosted on Azure
-
-**Category: Networking**
-
-**Impact: High**
-
-**Guidance**
-
-Single-instance Network Virtual Appliances (NVAs) introduce significant connectivity risk, whether deployed in Azure or within an on-premises datacenter.
-
-**Resources**
-
-- [Design requirements connectivity](https://learn.microsoft.com/en-us/azure/well-architected/resiliency/design-requirements#connectivity)
-
-**Resource Graph Query/Scripts**
-
-{{< collapse title="Show/Hide Query/Script" >}}
-
-{{< code lang="sql" file="code/gnw-4/gnw-4.kql" >}} {{< /code >}}
-
-{{< /collapse >}}
-
-
-
-### GNW-5 - Assess critical application dependencies with health probes
-
-**Category: Networking**
-
-**Impact: Medium**
-
-**Guidance**
-
-Custom health probes should be used to assess overall application health including downstream components and dependent services, such as APIs and datastores. In this approach, traffic isn't sent to backend instances that can't successfully process requests due to dependency failures.
-
-**Resources**
-
-- [Design requirements connectivity](https://learn.microsoft.com/en-us/azure/well-architected/resiliency/design-requirements#connectivity)
-
-**Resource Graph Query/Scripts**
-
-{{< collapse title="Show/Hide Query/Script" >}}
-
-{{< code lang="sql" file="code/gnw-5/gnw-5.kql" >}} {{< /code >}}
-
-{{< /collapse >}}
-
-
diff --git a/docs/content/services/networking/load-balancer/_index.md b/docs/content/services/networking/load-balancer/_index.md
index ea83eef82..a3164bb2c 100644
--- a/docs/content/services/networking/load-balancer/_index.md
+++ b/docs/content/services/networking/load-balancer/_index.md
@@ -14,12 +14,12 @@ The presented resiliency recommendations in this guidance include Load Balancer
The below table shows the list of resiliency recommendations for Load Balancer and associated resources.
{{< table style="table-striped" >}}
-| Recommendation | Impact | State | ARG Query Available |
-| :------------------------------------------------ | :------: | :------: | :-----------------: |
-| [LB-1 - Use Standard Load Balancer SKU](#lb-1---use-standard-load-balancer-sku) | High | Preview | Yes |
-| [LB-2 - Ensure the Backend Pool contains at least two instances](#lb-2---ensure-the-backend-pool-contains-at-least-two-instances) | High | Preview | Yes |
-| [LB-3 - Use NAT Gateway instead of Outbound Rules for Production Workloads](#lb-3---use-nat-gateway-instead-of-outbound-rules-for-production-workloads) | Medium | Preview | Yes |
-| [LB-4 - Ensure Standard Load Balancer is zone-redundant](#lb-4---ensure-standard-load-balancer-is-zone-redundant) | High | Preview | Yes |
+| Recommendation | Category | Impact | State | ARG Query Available |
+|:--------------------------------------------------------------------------------------------------------------------------------------------------------|:------------:|:------:|:-------:|:-------------------:|
+| [LB-1 - Use Standard Load Balancer SKU](#lb-1---use-standard-load-balancer-sku) | Availability | High | Verified | Yes |
+| [LB-2 - Ensure the Backend Pool contains at least two instances](#lb-2---ensure-the-backend-pool-contains-at-least-two-instances) | Availability | High | Verified | Yes |
+| [LB-3 - Use NAT Gateway instead of Outbound Rules for Production Workloads](#lb-3---use-nat-gateway-instead-of-outbound-rules-for-production-workloads) | Availability | Medium | Verified | Yes |
+| [LB-4 - Ensure Standard Load Balancer is zone-redundant](#lb-4---ensure-standard-load-balancer-is-zone-redundant) | Availability | High | Verified | Yes |
{{< /table >}}
{{< alert style="info" >}}
@@ -32,6 +32,8 @@ Definitions of states can be found [here]({{< ref "../../../_index.md#definition
### LB-1 - Use Standard Load Balancer SKU
+**Category: Availability**
+
**Impact: High**
**Guidance**
@@ -43,7 +45,7 @@ Select Standard SKU Standard Load Balancer provides a dimension of reliability t
- [Reliability and Azure Load Balancer](https://learn.microsoft.com/azure/architecture/framework/services/networking/azure-load-balancer/reliability)
- [Resiliency checklist for specific Azure services- Azure Load Balancer](https://learn.microsoft.com/azure/architecture/checklist/resiliency-per-service#azure-load-balancer)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -55,6 +57,8 @@ Select Standard SKU Standard Load Balancer provides a dimension of reliability t
### LB-2 - Ensure the Backend Pool contains at least two instances
+**Category: Availability**
+
**Impact: High**
**Guidance**
@@ -65,7 +69,7 @@ Select Standard SKU Standard Load Balancer provides a dimension of reliability t
- [Resiliency checklist for specific Azure services- Azure Load Balancer](https://learn.microsoft.com/azure/architecture/checklist/resiliency-per-service#azure-load-balancer)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -77,17 +81,19 @@ Select Standard SKU Standard Load Balancer provides a dimension of reliability t
### LB-3 - Use NAT Gateway instead of Outbound Rules for Production Workloads
+**Category: Availability**
+
**Impact: Medium**
**Guidance**
-Outbound rules ensure that you are not faced with connection failures as a result of SNAT port exhaustion. While outbound rules will help improve the solution for small to mid size deployments, for production workloads, we recommend coupling Standard Load Balancer or any subnet deployment with VNet NAT.
+Outbound rules for Standard Public Load Balancer require you to manually allocate a fixed number of ports to each of your backend pool instances. Because the SNAT port allocation is fixed, outbound rules do not provide the most scalable method for outbound connectivity. For production workloads, we recommend using NAT Gateway instead to reduce the risk of connection failures due to SNAT port exhaustion. NAT Gateway scales dynamically and provides secure connectivity to the internet.
**Resources**
- [Resiliency checklist for specific Azure services- Azure Load Balancer](https://learn.microsoft.com/azure/architecture/checklist/resiliency-per-service#azure-load-balancer)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -99,17 +105,19 @@ Outbound rules ensure that you are not faced with connection failures as a resul
### LB-4 - Ensure Standard Load Balancer is zone-redundant
+**Category: Availability**
+
**Impact: High**
**Guidance**
- In a region with Availability Zones, a Standard Load Balancer can be zone-redundant with traffic served by a single IP address. A single frontend IP address survives zone failure. The frontend IP may be used to reach all (non-impacted) backend pool members no matter the zone. Up to one availability zone can fail and the data path survives as long as the remaining zones in the region remain healthy.
+In a region with Availability Zones, a Standard Load Balancer can be made zone-redundant by assigning it a zone-redundant frontend IP address. With a zone-redundant frontend IP, the load balancer continues to distribute traffic even when one availability zone fails, as long as other zones remain healthy and contain healthy backend instances that can receive traffic.
**Resources**
- [Load Balancer and Availability Zones](https://learn.microsoft.com/en-us/azure/load-balancer/load-balancer-standard-availability-zones#zone-redundant)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
diff --git a/docs/content/services/networking/load-balancer/code/lb-1/lb-1.kql b/docs/content/services/networking/load-balancer/code/lb-1/lb-1.kql
index 14bdf2639..6f40bb8e8 100644
--- a/docs/content/services/networking/load-balancer/code/lb-1/lb-1.kql
+++ b/docs/content/services/networking/load-balancer/code/lb-1/lb-1.kql
@@ -3,4 +3,4 @@
resources
| where type =~ 'Microsoft.Network/loadBalancers'
| where sku.name == 'Basic'
-| project recommendationId = "lb-1", name, id, Param1=strcat("sku-tier: basic")
+| project recommendationId = "lb-1", name, id, tags, Param1=strcat("sku-tier: basic")
diff --git a/docs/content/services/networking/load-balancer/code/lb-2/lb-2.kql b/docs/content/services/networking/load-balancer/code/lb-2/lb-2.kql
index e36a3bcc6..2e215a9cb 100644
--- a/docs/content/services/networking/load-balancer/code/lb-2/lb-2.kql
+++ b/docs/content/services/networking/load-balancer/code/lb-2/lb-2.kql
@@ -5,7 +5,7 @@ resources
| extend bep = properties.backendAddressPools
| extend BackEndPools = array_length(bep)
| where BackEndPools == 0
-| project recommendationId = "lb-2", name, id, Param1=BackEndPools, Param2=0
+| project recommendationId = "lb-2", name, id, Param1=BackEndPools, Param2=0, tags
| union (resources
| where type =~ 'Microsoft.Network/loadBalancers'
| extend bep = properties.backendAddressPools
@@ -13,4 +13,4 @@ resources
| mv-expand bip = properties.backendAddressPools
| extend BackendAddresses = array_length(bip.properties.loadBalancerBackendAddresses)
| where BackendAddresses <= 1
- | project recommendationId = "lb-2", name, id, Param1=BackEndPools, Param2=BackendAddresses)
+ | project recommendationId = "lb-2", name, id, tags, Param1=BackEndPools, Param2=BackendAddresses)
diff --git a/docs/content/services/networking/load-balancer/code/lb-3/lb-3.kql b/docs/content/services/networking/load-balancer/code/lb-3/lb-3.kql
index 3c96df6f2..ae8a4c1aa 100644
--- a/docs/content/services/networking/load-balancer/code/lb-3/lb-3.kql
+++ b/docs/content/services/networking/load-balancer/code/lb-3/lb-3.kql
@@ -4,4 +4,4 @@ resources
| where type =~ 'Microsoft.Network/loadBalancers'
| extend outboundRules = array_length(properties.outboundRules)
| where outboundRules > 0
-| project recommendationId = "lb-3", name, id, Param1 = "outboundRules: >=1"
+| project recommendationId = "lb-3", name, id, tags, Param1 = "outboundRules: >=1"
diff --git a/docs/content/services/networking/load-balancer/code/lb-4/lb-4.kql b/docs/content/services/networking/load-balancer/code/lb-4/lb-4.kql
index fc4443676..e34d31268 100644
--- a/docs/content/services/networking/load-balancer/code/lb-4/lb-4.kql
+++ b/docs/content/services/networking/load-balancer/code/lb-4/lb-4.kql
@@ -30,4 +30,4 @@ resources
LBid = toupper(substring(properties.ipConfiguration.id, 0, indexof(properties.ipConfiguration.id, '/frontendIPConfigurations'))),
InnerID = toupper(id)
) on $left.PIPid == $right.InnerID)
-| project recommendationId = "lb-4", name, id, param1="Zones: No Zone or Zonal", param2=strcat("Frontend IP Configuration:", " ", feConfigName)
+| project recommendationId = "lb-4", name, id, tags, param1="Zones: No Zone or Zonal", param2=strcat("Frontend IP Configuration:", " ", feConfigName)
diff --git a/docs/content/services/networking/network-security-group/_index.md b/docs/content/services/networking/network-security-group/_index.md
index f0daedc5f..685458f93 100644
--- a/docs/content/services/networking/network-security-group/_index.md
+++ b/docs/content/services/networking/network-security-group/_index.md
@@ -43,7 +43,7 @@ Resource Logs are not collected and stored until you create a diagnostic setting
- [Diagnostic settings in Azure Monitor](https://learn.microsoft.com/azure/azure-monitor/essentials/diagnostic-settings)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -51,12 +51,6 @@ Resource Logs are not collected and stored until you create a diagnostic setting
{{< /collapse >}}
-{{< collapse title="Show/Hide Query/Script" >}}
-
-{{< code lang="powershell" file="code/nsg-1/nsg-1.ps1" >}} {{< /code >}}
-
-{{< /collapse >}}
-
### NSG-2 - Monitor changes in Network Security Groups with Azure Monitor
@@ -73,7 +67,7 @@ Create Alerts for administrative operations such as Create or Update Network Sec
- [Azure Monitor activity log](https://learn.microsoft.com/azure/azure-monitor/essentials/activity-log?tabs=powershell)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -87,7 +81,7 @@ Create Alerts for administrative operations such as Create or Update Network Sec
**Category: Governance**
-**Impact: Medium**
+**Impact: Low**
**Guidance**
@@ -98,7 +92,7 @@ You can set locks that prevent either deletions or modifications. In the portal,
- [Lock your resources to protect your infrastructure](https://learn.microsoft.com/azure/azure-resource-manager/management/lock-resources?toc=%2Fazure%2Fvirtual-network%2Ftoc.json&tabs=json)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -106,12 +100,6 @@ You can set locks that prevent either deletions or modifications. In the portal,
{{< /collapse >}}
-{{< collapse title="Show/Hide Query/Script" >}}
-
-{{< code lang="powershell" file="code/nsg-3/nsg-3.ps1" >}} {{< /code >}}
-
-{{< /collapse >}}
-
### NSG-4 - Configure NSG Flow Logs
@@ -130,7 +118,7 @@ Flow logs are the source of truth for all network activity in your cloud environ
- [Flow logging for network security groups](https://learn.microsoft.com/azure/network-watcher/network-watcher-nsg-flow-logging-overview)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -154,7 +142,7 @@ You can use an Azure network security group to filter network traffic between Az
- [Security rules](https://learn.microsoft.com/azure/virtual-network/network-security-groups-overview#security-rules)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
diff --git a/docs/content/services/networking/network-watcher/_index.md b/docs/content/services/networking/network-watcher/_index.md
index 28afdb5f9..108a2a145 100644
--- a/docs/content/services/networking/network-watcher/_index.md
+++ b/docs/content/services/networking/network-watcher/_index.md
@@ -40,7 +40,7 @@ Azure Network Watcher provides a suite of tools to monitor, diagnose, view metri
- [What is Azure Network Watcher?](https://learn.microsoft.com/azure/network-watcher/network-watcher-overview)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -64,7 +64,7 @@ Network security group flow logging is a feature of Azure Network Watcher that a
- [Manage NSG flow logs using the Azure portal](https://learn.microsoft.com/azure/network-watcher/nsg-flow-logging)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
diff --git a/docs/content/services/networking/network-watcher/code/nw-1/nw-1.kql b/docs/content/services/networking/network-watcher/code/nw-1/nw-1.kql
index 8e9f12d87..c2e95fab2 100644
--- a/docs/content/services/networking/network-watcher/code/nw-1/nw-1.kql
+++ b/docs/content/services/networking/network-watcher/code/nw-1/nw-1.kql
@@ -6,4 +6,4 @@ resources
| where type =~ "microsoft.network/networkwatchers")
| summarize NetworkWatcherCount = countif(type =~ 'Microsoft.Network/networkWatchers') by location
| where NetworkWatcherCount == 0
-| project recommendationId = "nw-1", name=location, id="n/a", param1 = strcat("LocationMisingNetworkWatcher:", location)
+| project recommendationId = "nw-1", name=location, id="n/a", tags, param1 = strcat("LocationMisingNetworkWatcher:", location)
diff --git a/docs/content/services/networking/network-watcher/code/nw-2/nw-2.kql b/docs/content/services/networking/network-watcher/code/nw-2/nw-2.kql
index 0cbbf0556..45db8367c 100644
--- a/docs/content/services/networking/network-watcher/code/nw-2/nw-2.kql
+++ b/docs/content/services/networking/network-watcher/code/nw-2/nw-2.kql
@@ -7,4 +7,4 @@ resources
| extend provisioningState = tostring(properties.provisioningState)
| extend flowLogType = iff(properties.targetResourceId contains "Microsoft.Network/virtualNetworks", 'Virtual network', 'Network security group')
| where provisioningState != "Succeeded" or status != "Enabled"
-| project recommendationId = "nw-2", name, id, param1 = strcat("provisioningState:", provisioningState), param2=strcat("Status:", status), param3=strcat("targetResourceId:",targetResourceId), param4=strcat("flowLogType:",flowLogType)
+| project recommendationId = "nw-2", name, id, tags, param1 = strcat("provisioningState:", provisioningState), param2=strcat("Status:", status), param3=strcat("targetResourceId:",targetResourceId), param4=strcat("flowLogType:",flowLogType)
diff --git a/docs/content/services/networking/private-dns-zones/_index.md b/docs/content/services/networking/private-dns-zones/_index.md
new file mode 100644
index 000000000..673e67443
--- /dev/null
+++ b/docs/content/services/networking/private-dns-zones/_index.md
@@ -0,0 +1,101 @@
++++
+title = "Private DNS Zones"
+description = "Best practices and resiliency recommendations for Private DNS Zones and associated resources and settings."
+date = "3/7/24"
+author = "rodrigosantosms"
+msAuthor = "rodrigosantosms"
+draft = false
++++
+
+The presented resiliency recommendations in this guidance include Private DNS Zones and associated resources and settings.
+
+## Summary of Recommendations
+
+{{< table style="table-striped" >}}
+| Recommendation | Category | Impact | State | ARG Query Available |
+|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------:|:------:|:-------:|:-------------------:|
+| [PVDNSZ-1 - Protect private DNS zones and records](#pvdnsz-1---protect-private-dns-zones-and-records) | Access & Security | Medium | Preview | No |
+| [PVDNSZ-2 - Monitor Private DNS Zones health and set up alerts](#pvdnsz-2---monitor-private-dns-zones-health-and-set-up-alerts) | Monitoring | Low | Preview | No |
+| [PVDNSZ-3 - Make sure Production and DR zones have equivalent entries for workloads and resources that will be failed over](#pvdnsz-3---make-sure-production-and-dr-zones-have-equivalent-entries-for-workloads-and-resources-that-will-be-failed-over) | Governance | Medium | Preview | No |
+
+{{< /table >}}
+
+{{< alert style="info" >}}
+
+Definitions of states can be found [here]({{< ref "../../../_index.md#definitions-of-terms-used-in-aprl">}})
+
+{{< /alert >}}
+
+## Recommendations Details
+
+### PVDNSZ-1 - Protect private DNS zones and records
+
+**Category: Access & Security**
+
+**Impact: Medium**
+
+**Guidance**
+
+Private DNS zones and records are critical resources. Deleting a DNS zone or a single DNS record can result in a service outage. It's important that DNS zones and records are protected against unauthorized or accidental changes. The Private DNS Zone Contributor role is a built-in role for managing private DNS resources. This role applied to a user or group enables them to manage private DNS resources.
+
+**Resources**
+
+- [Protecting private DNS Zones and Records - Azure DNS](https://learn.microsoft.com/en-us/azure/dns/dns-protect-private-zones-recordsets)
+
+**Resource Graph Query**
+
+{{< collapse title="Show/Hide Query/Script" >}}
+
+{{< code lang="sql" file="code/pvdnsz-1/pvdnsz-1.kql" >}} {{< /code >}}
+
+{{< /collapse >}}
+
+
+
+### PVDNSZ-2 - Monitor Private DNS Zones health and set up alerts
+
+**Category: Monitoring**
+
+**Impact: Low**
+
+**Guidance**
+
+The records contained in a private DNS zone aren't resolvable from the Internet. DNS resolution against a private DNS zone works only from virtual networks that are linked to it. You can link a private DNS zone to one or more virtual networks by creating virtual network links. You can also enable the autoregistration feature to automatically manage the life cycle of the DNS records for the virtual machines that get deployed in a virtual network.
+
+**Resources**
+
+- [Scenarios for Azure Private DNS zones](https://learn.microsoft.com/en-us/azure/dns/private-dns-scenarios)
+
+**Resource Graph Query**
+
+{{< collapse title="Show/Hide Query/Script" >}}
+
+{{< code lang="sql" file="code/pvdnsz-2/pvdnsz-2.kql" >}} {{< /code >}}
+
+{{< /collapse >}}
+
+
+
+### PVDNSZ-3 - Make sure Production and DR zones have equivalent entries for workloads and resources that will be failed over
+
+**Category: Governance**
+
+**Impact: Medium**
+
+**Guidance**
+
+Azure Private DNS provides a reliable, secure DNS service to manage and resolve domain names in a virtual network without the need to add a custom DNS solution. By using private DNS zones, you can use your own custom domain names rather than the Azure-provided names available today. The records contained in a private DNS zone aren't resolvable from the Internet. DNS resolution against a private DNS zone works only from virtual networks that are linked to it. You can link a private DNS zone to one or more virtual networks by creating virtual network links. You can also enable the autoregistration feature to automatically manage the life cycle of the DNS records for the virtual machines that get deployed in a virtual network.
+
+**Resources**
+
+- [Scenarios for Azure Private DNS zones](https://learn.microsoft.com/en-us/azure/dns/private-dns-scenarios)
+
+**Resource Graph Query**
+
+{{< collapse title="Show/Hide Query/Script" >}}
+
+{{< code lang="sql" file="code/pvdnsz-3/pvdnsz-3.kql" >}} {{< /code >}}
+
+{{< /collapse >}}
+
+
diff --git a/docs/content/services/networking/private-dns-zones/code/pvdnsz-1/pvdnsz-1.kql b/docs/content/services/networking/private-dns-zones/code/pvdnsz-1/pvdnsz-1.kql
new file mode 100644
index 000000000..614a7f9ca
--- /dev/null
+++ b/docs/content/services/networking/private-dns-zones/code/pvdnsz-1/pvdnsz-1.kql
@@ -0,0 +1 @@
+// under-development
diff --git a/docs/content/services/networking/private-dns-zones/code/pvdnsz-2/pvdnsz-2.kql b/docs/content/services/networking/private-dns-zones/code/pvdnsz-2/pvdnsz-2.kql
new file mode 100644
index 000000000..614a7f9ca
--- /dev/null
+++ b/docs/content/services/networking/private-dns-zones/code/pvdnsz-2/pvdnsz-2.kql
@@ -0,0 +1 @@
+// under-development
diff --git a/docs/content/services/networking/private-dns-zones/code/pvdnsz-3/pvdnsz-3.kql b/docs/content/services/networking/private-dns-zones/code/pvdnsz-3/pvdnsz-3.kql
new file mode 100644
index 000000000..614a7f9ca
--- /dev/null
+++ b/docs/content/services/networking/private-dns-zones/code/pvdnsz-3/pvdnsz-3.kql
@@ -0,0 +1 @@
+// under-development
diff --git a/docs/content/services/networking/private-endpoints/_index.md b/docs/content/services/networking/private-endpoints/_index.md
index 8ba22df7d..65ff08679 100644
--- a/docs/content/services/networking/private-endpoints/_index.md
+++ b/docs/content/services/networking/private-endpoints/_index.md
@@ -39,7 +39,7 @@ A private endpoint has two custom properties, static IP address and the network
- [Private endpoint connections](https://learn.microsoft.com/azure/private-link/manage-private-endpoint?tabs=manage-private-link-powershell#private-endpoint-connections)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
diff --git a/docs/content/services/networking/private-endpoints/code/pep-1/pep-1.kql b/docs/content/services/networking/private-endpoints/code/pep-1/pep-1.kql
index b42b79e7a..1bae6862e 100644
--- a/docs/content/services/networking/private-endpoints/code/pep-1/pep-1.kql
+++ b/docs/content/services/networking/private-endpoints/code/pep-1/pep-1.kql
@@ -3,4 +3,4 @@
resources
| where type =~ "microsoft.network/privateendpoints"
| where properties.provisioningState != "Succeeded" or properties.privateLinkServiceConnections[0].properties.provisioningState != "Succeeded"
-| project recommendationId = "pep-1", name, id, param1 = strcat("provisioningState: ", tostring(properties.provisioningState)), param2 = strcat("provisioningState: ", tostring(properties.privateLinkServiceConnections[0].properties.provisioningState))
+| project recommendationId = "pep-1", name, id, tags, param1 = strcat("provisioningState: ", tostring(properties.provisioningState)), param2 = strcat("provisioningState: ", tostring(properties.privateLinkServiceConnections[0].properties.provisioningState))
diff --git a/docs/content/services/networking/public-ip/_index.md b/docs/content/services/networking/public-ip/_index.md
index e90a0aa0e..ade008867 100644
--- a/docs/content/services/networking/public-ip/_index.md
+++ b/docs/content/services/networking/public-ip/_index.md
@@ -14,10 +14,11 @@ The presented resiliency recommendations in this guidance include Public Ip and
The below table shows the list of resiliency recommendations for Public Ip and associated resources.
{{< table style="table-striped" >}}
-| Recommendation | State | ARG Query Available |
-| :------------------------------------------------ | :------: | :-----------------: |
-| [PIP-1 - Use Standard SKU](#pip-1---use-standard-sku) | Preview | No |
-| [PIP-2 - Use NAT gateway for outbound connectivity to avoid SNAT Exhaustion](#pip-2---use-nat-gateway-for-outbound-connectivity-to-avoid-snat-exhaustion) | Preview | Yes |
+| Recommendation | Category | Impact | State | ARG Query Available |
+|:----------------------------------------------------------------------------------------------------------------------------------------------------------|:------------:|:------:|:-------:|:-------------------:|
+| [PIP-1 - Use Standard SKU and Zone-Redundant IPs when applicable](#pip-1---use-standard-sku-and-zone-redundant-ips-when-applicable) | Availability | High | Preview | Yes |
+| [PIP-2 - Use NAT gateway for outbound connectivity to avoid SNAT Exhaustion](#pip-2---use-nat-gateway-for-outbound-connectivity-to-avoid-snat-exhaustion) | Availability | Medium | Preview | Yes |
+| [PIP-3 - Upgrade Basic SKU public IP addresses to Standard SKU](#pip-3---upgrade-basic-sku-public-ip-addresses-to-standard-sku) | Availability | Medium | Preview | Yes |
{{< /table >}}
{{< alert style="info" >}}
@@ -28,22 +29,23 @@ Definitions of states can be found [here]({{< ref "../../../_index.md#definition
## Recommendations Details
-### PIP-1 - Use Standard SKU
+### PIP-1 - Use Standard SKU and Zone-Redundant IPs when applicable
+
+**Category: Availability**
**Impact: High**
**Guidance**
Public IP addresses with a standard SKU can be created as non-zonal, zonal, or zone-redundant in regions that support availability zones.
-A zone-redundant IP is created in all zones for a region and can survive any single zone failure. A zonal IP is tied to a specific availability zone, and shares fate with the health of the zone. A "non-zonal" public IP addresses are placed into a zone for you by Azure and doesn't give a guarantee of redundancy.
-In regions without availability zones, all public IP addresses are created as non-zonal. Public IP addresses created in a region that is later upgraded to have availability zones remain non-zonal. A public IP's availability zone can't be changed after the public IP's creation.
-Note - All basic SKU public IP addresses are created as non-zonal. Any IP that is upgraded from a basic SKU to standard SKU remains non-zonal.
+A zone-redundant IP is created in all zones for a region and can survive any single zone failure. A zonal IP is tied to a specific availability zone, and shares fate with the health of the zone. A "non-zonal" public IP address is placed into a zone for you by Azure and doesn't give a guarantee of redundancy. When utilizing a Public IP with resources that support zone resiliency (such as an Azure Load Balancer or Azure Firewall), it is recommended to use zone-redundant IPs in most cases.
**Resources**
- [Public IP addresses - Availability Zones](https://learn.microsoft.com/azure/virtual-network/ip-services/public-ip-addresses#availability-zone)
+- [Upgrading a basic public IP address to Standard SKU](https://learn.microsoft.com/en-us/azure/virtual-network/ip-services/public-ip-basic-upgrade-guidance#steps-to-complete-the-upgrade)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -55,18 +57,20 @@ Note - All basic SKU public IP addresses are created as non-zonal. Any IP that i
### PIP-2 - Use NAT gateway for outbound connectivity to avoid SNAT Exhaustion
+**Category: Availability**
+
**Impact: Medium**
**Guidance**
-Prevent risk of connectivity failures due to SNAT port exhaustion by using NAT gateway for outbound traffic from your virtual networks. NAT gateway scales dynamically and provides secure connections for traffic headed to the internet. We don't recommend exceeding 100 simultaneous outbound connections to a public IP address per worker. Avoid communicating with downstream services through public IP addresses when a private address (Private Endpoint) or Service Endpoint through vNet Integration could be used.
+Prevent risk of connectivity failures due to SNAT port exhaustion by using NAT gateway for outbound traffic from your virtual networks. NAT gateway scales dynamically and provides secure connections for traffic headed to the internet.
**Resources**
- [Use NAT GW for outbound connectivity](https://learn.microsoft.com/azure/advisor/advisor-reference-reliability-recommendations#use-nat-gateway-for-outbound-connectivity)
- [TCP and SNAT Ports](https://learn.microsoft.com/azure/architecture/framework/services/compute/azure-app-service/reliability#tcp-and-snat-ports)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -75,3 +79,28 @@ Prevent risk of connectivity failures due to SNAT port exhaustion by using NAT g
{{< /collapse >}}
+
+### PIP-3 - Upgrade Basic SKU public IP addresses to Standard SKU
+
+**Category: Availability**
+
+**Impact: Medium**
+
+**Guidance**
+
+On September 30, 2025, Basic SKU public IP addresses will be retired. If you are currently using Basic SKU public IP addresses, make sure to upgrade to Standard SKU public IP addresses prior to the retirement date.
+
+**Resources**
+
+- [Upgrading a basic public IP address to Standard SKU - Guidance](https://learn.microsoft.com/en-us/azure/virtual-network/ip-services/public-ip-basic-upgrade-guidance)
+- [Upgrade to Standard SKU public IP addresses in Azure by 30 September 2025—Basic SKU will be retired](https://azure.microsoft.com/en-us/updates/upgrade-to-standard-sku-public-ip-addresses-in-azure-by-30-september-2025-basic-sku-will-be-retired/)
+
+**Resource Graph Query**
+
+{{< collapse title="Show/Hide Query/Script" >}}
+
+{{< code lang="sql" file="code/pip-3/pip-3.kql" >}} {{< /code >}}
+
+{{< /collapse >}}
+
+
diff --git a/docs/content/services/networking/public-ip/code/pip-1/pip-1.kql b/docs/content/services/networking/public-ip/code/pip-1/pip-1.kql
index 614a7f9ca..b7a882720 100644
--- a/docs/content/services/networking/public-ip/code/pip-1/pip-1.kql
+++ b/docs/content/services/networking/public-ip/code/pip-1/pip-1.kql
@@ -1 +1,7 @@
-// under-development
+// Azure Resource Graph query
+// List public IP addresses that are not Zone-Redundant
+Resources
+| where type =~ "Microsoft.Network/publicIPAddresses" and sku.tier =~ "Regional"
+| where isempty(zones) or array_length(zones) <= 1
+| extend az = case(isempty(zones), "Non-zonal", array_length(zones) <= 1, strcat("Zonal (", strcat_array(zones, ","), ")"), zones)
+| project recommendationId = "pip-1", name, id, tags, param1 = strcat("sku: ", sku.name), param2 = strcat("availabilityZone: ", az)
diff --git a/docs/content/services/networking/public-ip/code/pip-1/pip-1.kql.fix b/docs/content/services/networking/public-ip/code/pip-1/pip-1.kql.fix
deleted file mode 100644
index f20ff3fef..000000000
--- a/docs/content/services/networking/public-ip/code/pip-1/pip-1.kql.fix
+++ /dev/null
@@ -1,7 +0,0 @@
-// Azure Resource Graph query
-// Lists PIPs that are not Standard SKU
-resources
-| where type =~ 'Microsoft.Network/publicIPAddresses'
-| extend sku = tostring(sku.name)
-| where sku != 'Standard'
-| project recommendationid="pip-1", name, id, param1=strcat("sku=",sku)
diff --git a/docs/content/services/networking/public-ip/code/pip-2/pip-2.kql b/docs/content/services/networking/public-ip/code/pip-2/pip-2.kql
index 775de5a19..eb31ce7ba 100644
--- a/docs/content/services/networking/public-ip/code/pip-2/pip-2.kql
+++ b/docs/content/services/networking/public-ip/code/pip-2/pip-2.kql
@@ -3,4 +3,4 @@
resources
| where type =~ 'Microsoft.Network/publicIPAddresses'
| where tostring(properties.ipConfiguration.id) contains "microsoft.network/networkinterfaces"
-| project recommendationid="pip-2", name, id, param1=strcat("Migrate from instance IP to NAT Gateway")
+| project recommendationid="pip-2", name, id, tags, param1=strcat("Migrate from instance IP to NAT Gateway")
diff --git a/docs/content/services/networking/public-ip/code/pip-3/pip-3.kql b/docs/content/services/networking/public-ip/code/pip-3/pip-3.kql
new file mode 100644
index 000000000..7cc73054b
--- /dev/null
+++ b/docs/content/services/networking/public-ip/code/pip-3/pip-3.kql
@@ -0,0 +1,6 @@
+// Azure Resource Graph query
+// List Basic SKU public IP addresses
+Resources
+| where type =~ "Microsoft.Network/publicIPAddresses"
+| where sku.name =~ "Basic"
+| project recommendationId = "pip-3", name, id, tags, param1 = strcat("sku: ", sku.name)
diff --git a/docs/content/services/networking/route-table/_index.md b/docs/content/services/networking/route-table/_index.md
index aa6f898a4..2227284c1 100644
--- a/docs/content/services/networking/route-table/_index.md
+++ b/docs/content/services/networking/route-table/_index.md
@@ -40,7 +40,7 @@ Create Alerts for administrative operations such as Create or Update Route Table
- [Azure activity log - Azure Monitor | Microsoft Learn](https://learn.microsoft.com/en-us/azure/azure-monitor/essentials/activity-log?tabs=powershell)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -48,12 +48,6 @@ Create Alerts for administrative operations such as Create or Update Route Table
{{< /collapse >}}
-{{< collapse title="Show/Hide Query/Script" >}}
-
-{{< code lang="powershell" file="code/rt-1/rt-1.ps1" >}} {{< /code >}}
-
-{{< /collapse >}}
-
### RT-2 - Configure locks for Route Tables to avoid accidental changes or deletion
@@ -71,11 +65,11 @@ You can set locks that prevent either deletions or modifications. In the portal,
- [Protect your Azure resources with a lock - Azure Resource Manager | Microsoft Learn](https://learn.microsoft.com/en-us/azure/azure-resource-manager/management/lock-resources?toc=%2Fazure%2Fvirtual-network%2Ftoc.json&tabs=json)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
-{{< code lang="powershell" file="code/rt-2/rt-2.ps1" >}} {{< /code >}}
+{{< code lang="sql" file="code/rt-2/rt-2.kql" >}} {{< /code >}}
{{< /collapse >}}
diff --git a/docs/content/services/networking/route-table/code/rt-1/rt-1.ps1 b/docs/content/services/networking/route-table/code/rt-1/rt-1.ps1
deleted file mode 100644
index 704b18fc7..000000000
--- a/docs/content/services/networking/route-table/code/rt-1/rt-1.ps1
+++ /dev/null
@@ -1,20 +0,0 @@
-#Pulls a list of all Route Tables without an alert configured for modifications.
-$NeedsActivityAlerts = @()
-$subscriptions = Get-azsubscription
-
-foreach ($subscription in $subscriptions){
- set-azcontext $subscription | Out-Null
- $RouteTables= Get-AzRouteTable
- $ActivityLogAlerts = Get-AzActivityLogAlert | where {$_.scope-match "routeTables"}
- $AlertsEnabled = @()
- foreach ($resource in $RouteTables){
- foreach($Alert in $ActivityLogAlerts){
- if($Alert.scope -match $resource.name){$AlertsEnabled+=$resource}
- }
- }
- foreach ($RT in $RouteTables){
- if($AlertsEnabled.name -notcontains $rt.name){$NeedsActivityAlerts+=$RT}
- }
-}
-
-$NeedsActivityAlerts | select Name, ResourceType, ResourceGroupName, Location, ID | format-table
diff --git a/docs/content/services/networking/route-table/code/rt-2/rt-2.kql b/docs/content/services/networking/route-table/code/rt-2/rt-2.kql
new file mode 100644
index 000000000..614a7f9ca
--- /dev/null
+++ b/docs/content/services/networking/route-table/code/rt-2/rt-2.kql
@@ -0,0 +1 @@
+// under-development
diff --git a/docs/content/services/networking/traffic-manager/_index.md b/docs/content/services/networking/traffic-manager/_index.md
index 0872c750a..a446a59c8 100644
--- a/docs/content/services/networking/traffic-manager/_index.md
+++ b/docs/content/services/networking/traffic-manager/_index.md
@@ -12,13 +12,12 @@ The presented resiliency recommendations in this guidance include Azure Traffic
## Summary of Recommendations
{{< table style="table-striped" >}}
-| Recommendation | Impact | State | ARG Query Available |
-| :------------------------------------------------------------------------------------------------------------------------------------------ | :------: | :------: | :-----------------: |
-| [TRAF-1 - Traffic Manager Monitor Status Should be Online](#traf-1---traffic-manager-monitor-status-should-be-online) | High | Preview | No |
-| [TRAF-2 - Traffic manager profiles should have more than one endpoint](#traf-2---traffic-manager-profiles-should-have-more-than-one-endpoint) | High | Preview | No |
-| [TRAF-3 - Configure at least one endpoint within a another region](#traf-3---configure-at-least-one-endpoint-within-a-another-region) | Medium | Preview | No |
-| [TRAF-4 - TTL value of user profiles should be in 60 Seconds](#traf-4---ttl-value-of-user-profiles-should-be-in-60-seconds) | Medium | Preview | No |
-| [TRAF-5 - Ensure endpoint configured to "(All World)" for geographic profiles](#traf-5---ensure-endpoint-configured-to-all-world-for-geographic-profiles) | Medium | Preview | Yes |
+| Recommendation | Category | Impact | State | ARG Query Available |
+|:----------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------:|:------:|:-------:|:-------------------:|
+| [TRAF-1 - Traffic Manager Monitor Status Should be Online](#traf-1---traffic-manager-monitor-status-should-be-online) | Availability | High | Preview | Yes |
+| [TRAF-2 - Traffic manager profiles should have more than one endpoint](#traf-2---traffic-manager-profiles-should-have-more-than-one-endpoint) | Availability | High | Preview | Yes |
+| [TRAF-3 - Configure at least one endpoint within a another region](#traf-3---configure-at-least-one-endpoint-within-a-another-region) | Disaster Recovery | Medium | Preview | No |
+| [TRAF-5 - Ensure endpoint configured to All (World) for geographic profiles](#traf-5---ensure-endpoint-configured-to-all-world-for-geographic-profiles) | Disaster Recovery | Medium | Preview | No |
{{< /table >}}
@@ -32,6 +31,8 @@ Definitions of states can be found [here]({{< ref "../../../_index.md#definition
### TRAF-1 - Traffic Manager Monitor Status Should be Online
+**Category: Availability**
+
**Impact: High**
**Guidance**
@@ -44,7 +45,7 @@ Definitions of states can be found [here]({{< ref "../../../_index.md#definition
- [Enable or disable health checks](https://learn.microsoft.com/azure/traffic-manager/traffic-manager-monitoring#enable-or-disable-health-checks-preview)
- [Troubleshooting degraded state on Azure Traffic Manager](https://learn.microsoft.com/azure/traffic-manager/traffic-manager-troubleshooting-degraded)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
{{< code lang="sql" file="code/traf-1/traf-1.kql" >}} {{< /code >}}
@@ -53,6 +54,8 @@ Definitions of states can be found [here]({{< ref "../../../_index.md#definition
### TRAF-2 - Traffic manager profiles should have more than one endpoint
+**Category: Availability**
+
**Impact: High**
**Guidance**
@@ -63,7 +66,7 @@ When configuring the Azure traffic manager, you should provision minimum of two
- [Traffic Manager Endpoint Types](https://learn.microsoft.com/azure/traffic-manager/traffic-manager-endpoint-types)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -75,6 +78,8 @@ When configuring the Azure traffic manager, you should provision minimum of two
### TRAF-3 - Configure at least one endpoint within a another region
+**Category: Disaster Recovery**
+
**Impact: Medium**
**Guidance**
@@ -86,7 +91,7 @@ Profiles should have more than one endpoint to ensure availability if one of the
- [Reliability recommendations
](https://learn.microsoft.com/azure/advisor/advisor-reference-reliability-recommendations#add-at-least-one-more-endpoint-to-the-profile-preferably-in-another-azure-region)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -96,30 +101,9 @@ Profiles should have more than one endpoint to ensure availability if one of the
-### TRAF-4 - TTL value of user profiles should be in 60 Seconds
-
-**Impact: Medium**
-
-**Guidance**
-
-Time to Live (TTL) affects how recent of a response a client will get when it makes a request to Azure Traffic Manager. Reducing the TTL value means that the client will be routed to a functioning endpoint faster in the case of a failover. Configure your TTL to 60 seconds to route traffic to a health endpoint as quickly as possible.
-
-**Resources**
-
-- [Configure DNS Time to Live to 60 seconds).](https://learn.microsoft.com/azure/advisor/advisor-reference-performance-recommendations#configure-dns-time-to-live-to-60-seconds)
-- [Traffic Manager profile - ProfileTTL (Configure DNS Time to Live to 60 seconds).](https://aka.ms/Um3xr5)
-
-**Resource Graph Query/Scripts**
-
-{{< collapse title="Show/Hide Query/Script" >}}
-
-{{< code lang="sql" file="code/traf-4/traf-4.kql" >}} {{< /code >}}
-
-{{< /collapse >}}
-
-
+### TRAF-5 - Ensure endpoint configured to All (World) for geographic profiles
-### TRAF-5 - Ensure endpoint configured to "(All World)" for geographic profiles
+**Category: Disaster Recovery**
**Impact: Medium**
@@ -132,7 +116,7 @@ For geographic routing, traffic is routed to endpoints based on defined regions.
- [Add an endpoint configured to "All (World)"](https://learn.microsoft.com/azure/advisor/advisor-reference-reliability-recommendations#add-an-endpoint-configured-to-all-world)
- [Traffic Manager profile - GeographicProfile (Add an endpoint configured to ""All (World)"").](https://aka.ms/Rf7vc5)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
diff --git a/docs/content/services/networking/traffic-manager/code/traf-1/traf-1.kql b/docs/content/services/networking/traffic-manager/code/traf-1/traf-1.kql
index 614a7f9ca..4597108cb 100644
--- a/docs/content/services/networking/traffic-manager/code/traf-1/traf-1.kql
+++ b/docs/content/services/networking/traffic-manager/code/traf-1/traf-1.kql
@@ -1 +1,7 @@
-// under-development
+// Azure Resource Graph Query
+// Find traffic manager profiles that have an endpoint with a monitor status other than 'Online'
+resources
+| where type == "microsoft.network/trafficmanagerprofiles"
+| mv-expand properties.endpoints
+| where properties_endpoints.properties.endpointMonitorStatus != "Online"
+| project recommendationId = "traf-1", name, id, tags, param1 = strcat('Endpoint name: ',properties_endpoints.name), param2 = strcat('endpointMonitorStatus: ', properties_endpoints.properties.endpointMonitorStatus)
diff --git a/docs/content/services/networking/traffic-manager/code/traf-2/traf-2.kql b/docs/content/services/networking/traffic-manager/code/traf-2/traf-2.kql
index 614a7f9ca..5bea8ef85 100644
--- a/docs/content/services/networking/traffic-manager/code/traf-2/traf-2.kql
+++ b/docs/content/services/networking/traffic-manager/code/traf-2/traf-2.kql
@@ -1 +1,6 @@
-// under-development
+// Azure Resource Graph Query
+// Find traffic manager profiles that have less than 2 endpoints
+resources
+| where type == "microsoft.network/trafficmanagerprofiles"
+| where array_length(properties.endpoints) < 2
+| project recommendationId = "traf-2", name, id, tags, param1 = strcat('EndpointCount: ', array_length(properties.endpoints))
diff --git a/docs/content/services/networking/traffic-manager/code/traf-3/traf-3.kql b/docs/content/services/networking/traffic-manager/code/traf-3/traf-3.kql
index 614a7f9ca..fa5cad258 100644
--- a/docs/content/services/networking/traffic-manager/code/traf-3/traf-3.kql
+++ b/docs/content/services/networking/traffic-manager/code/traf-3/traf-3.kql
@@ -1 +1 @@
-// under-development
+// cannot-be-validated-with-arg
diff --git a/docs/content/services/networking/traffic-manager/code/traf-4/traf-4.kql b/docs/content/services/networking/traffic-manager/code/traf-4/traf-4.kql
index 614a7f9ca..f225d7b22 100644
--- a/docs/content/services/networking/traffic-manager/code/traf-4/traf-4.kql
+++ b/docs/content/services/networking/traffic-manager/code/traf-4/traf-4.kql
@@ -1 +1,6 @@
-// under-development
+// Azure Resource Graph Query
+// Find traffic manager profiles that do not have the DNS TTL set to 60 seconds
+resources
+| where type == "microsoft.network/trafficmanagerprofiles"
+| where properties.dnsConfig.ttl != 60
+| project recommendationId = "traf-4", name, id, tags, param1 = strcat('TTL: ', properties.dnsConfig.ttl)
diff --git a/docs/content/services/networking/traffic-manager/code/traf-5/traf-5.kql b/docs/content/services/networking/traffic-manager/code/traf-5/traf-5.kql
index 3accf26d2..614a7f9ca 100644
--- a/docs/content/services/networking/traffic-manager/code/traf-5/traf-5.kql
+++ b/docs/content/services/networking/traffic-manager/code/traf-5/traf-5.kql
@@ -1,8 +1 @@
-Resources
-| where type == 'microsoft.network/trafficmanagerprofiles'
-| extend endpoints = properties.endpoints
-| mv-expand endpoint = endpoints
-| extend endpointName = endpoint.name
-| extend endpointLocation = endpoint.properties.endpointLocation
-| extend ttl = toint(properties.dnsConfig.ttl)
-| project recommendationId="traf-5", name, id, endpointName, properties.trafficRoutingMethod, endpointLocation,ttl,GeoMapping = tostring(endpoint.properties.geoMapping)
+// under-development
diff --git a/docs/content/services/networking/traffic-manager/code/traf-5/traf-5.kql.fix b/docs/content/services/networking/traffic-manager/code/traf-5/traf-5.kql.fix
index 614a7f9ca..3accf26d2 100644
--- a/docs/content/services/networking/traffic-manager/code/traf-5/traf-5.kql.fix
+++ b/docs/content/services/networking/traffic-manager/code/traf-5/traf-5.kql.fix
@@ -1 +1,8 @@
-// under-development
+Resources
+| where type == 'microsoft.network/trafficmanagerprofiles'
+| extend endpoints = properties.endpoints
+| mv-expand endpoint = endpoints
+| extend endpointName = endpoint.name
+| extend endpointLocation = endpoint.properties.endpointLocation
+| extend ttl = toint(properties.dnsConfig.ttl)
+| project recommendationId="traf-5", name, id, endpointName, properties.trafficRoutingMethod, endpointLocation,ttl,GeoMapping = tostring(endpoint.properties.geoMapping)
diff --git a/docs/content/services/networking/virtual-networks/_index.md b/docs/content/services/networking/virtual-networks/_index.md
index 1f0193c64..5ad92025f 100644
--- a/docs/content/services/networking/virtual-networks/_index.md
+++ b/docs/content/services/networking/virtual-networks/_index.md
@@ -14,11 +14,11 @@ The presented resiliency recommendations in this guidance include Virtual Networ
The below table shows the list of resiliency recommendations for Virtual Networks and associated resources.
{{< table style="table-striped" >}}
-| Recommendation | State | ARG Query Available |
-| :------------------------------------------------ | :------: | :-----------------: |
-| [VNET-1 - All Subnets should have a Network Security Group associated](#vnet-1---all-subnets-should-have-a-network-security-group-associated) | Preview | Yes |
-| [VNET-2 - Use Azure DDoS Standard Protection Plans to protect all public endpoints hosted within customer Virtual Networks](#vnet-2---use-azure-ddos-standard-protection-plans-to-protect-all-public-endpoints-hosted-within-customer-virtual-networks) | Preview | Yes |
-| [VNET-3 - Use Private Link, when available, for shared Azure PaaS services](#vnet-3---when-available-use-private-endpoints-instead-of-service-endpoints-for-paas-services) | Preview | No |
+| Recommendation | Category | Impact | State | ARG Query Available |
+|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------:|:------:|:-------:|:-------------------:|
+| [VNET-1 - All Subnets should have a Network Security Group associated](#vnet-1---all-subnets-should-have-a-network-security-group-associated) | Access & Security | High | Preview | Yes |
+| [VNET-2 - Use Azure DDoS Standard Protection Plans to protect all public endpoints hosted within customer Virtual Networks](#vnet-2---use-azure-ddos-standard-protection-plans-to-protect-all-public-endpoints-hosted-within-customer-virtual-networks) | Access & Security | High | Preview | Yes |
+| [VNET-3 - Use Private Link, when available, for shared Azure PaaS services](#vnet-3---when-available-use-private-endpoints-instead-of-service-endpoints-for-paas-services) | Access & Security | Medium | Preview | No |
{{< /table >}}
{{< alert style="info" >}}
@@ -31,17 +31,22 @@ Definitions of states can be found [here]({{< ref "../../../_index.md#definition
### VNET-1 - All Subnets should have a Network Security Group associated
+**Category: Access & Security**
+
**Impact: High**
**Guidance**
-Network security groups: Network security groups and application security groups can contain multiple inbound and outbound security rules that enable you to filter traffic to and from resources by source and destination IP address, port, and protocol. NSG's provide a security layer on Subnet level.
+Network security groups: Network security groups and application security groups can contain multiple inbound and outbound security rules that enable you to filter traffic to and from resources by source and destination IP address, port, and protocol. NSGs provide a security layer at the subnet level. Note that the following subnets are excluded (ignored) because applying an NSG to them is not supported: GatewaySubnet, AzureFirewallSubnet, AzureFirewallManagementSubnet, RouteServerSubnet.
**Resources**
- [Azure Virtual Network - Concepts and best practices | Microsoft Learn](https://learn.microsoft.com/azure/virtual-network/concepts-and-best-practices)
+- [GatewaySubnet](https://learn.microsoft.com/en-us/azure/vpn-gateway/vpn-gateway-about-vpn-gateway-settings#gwsub)
+- [Can I associate a network security group (NSG) to the RouteServerSubnet?](https://learn.microsoft.com/en-us/azure/route-server/route-server-faq#can-i-associate-a-network-security-group-nsg-to-the-routeserversubnet)
+- [Are Network Security Groups (NSGs) supported on the AzureFirewallSubnet?](https://learn.microsoft.com/en-us/azure/firewall/firewall-faq#are-network-security-groups--nsgs--supported-on-the-azurefirewallsubnet)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -53,7 +58,9 @@ Network security groups: Network security groups and application security groups
### VNET-2 - Use Azure DDoS Standard Protection Plans to protect all public endpoints hosted within customer Virtual Networks
-**Impact: Medium**
+**Category: Access & Security**
+
+**Impact: High**
**Guidance**
@@ -63,7 +70,7 @@ Azure DDoS Protection, combined with application design best practices, provides
- [Reliability and Azure Virtual Network - Microsoft Azure Well-Architected Framework | Microsoft Learn](https://learn.microsoft.com/azure/architecture/framework/services/networking/azure-virtual-network/reliability)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -75,6 +82,8 @@ Azure DDoS Protection, combined with application design best practices, provides
### VNET-3 - When available, use Private Endpoints instead of Service Endpoints for PaaS Services
+**Category: Access & Security**
+
**Impact: Medium**
**Guidance**
@@ -85,8 +94,9 @@ Use virtual network service endpoints only when Private Link isn't available and
- [Azure Virtual Network FAQ | Microsoft Learn](https://learn.microsoft.com/azure/virtual-network/virtual-networks-faq)
- [Reliability and Network connectivity - Microsoft Azure Well-Architected Framework | Microsoft LearnNetworking Reliability](https://learn.microsoft.com/azure/architecture/framework/services/networking/network-connectivity/reliability)
+- [Azure Private Link availability](https://learn.microsoft.com/en-us/azure/private-link/availability)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
diff --git a/docs/content/services/networking/virtual-networks/code/vnet-1/vnet-1.kql b/docs/content/services/networking/virtual-networks/code/vnet-1/vnet-1.kql
index 8a68e045b..2f45d84bd 100644
--- a/docs/content/services/networking/virtual-networks/code/vnet-1/vnet-1.kql
+++ b/docs/content/services/networking/virtual-networks/code/vnet-1/vnet-1.kql
@@ -4,5 +4,5 @@ resources
| where type =~ 'Microsoft.Network/virtualnetworks'
| mv-expand subnets = properties.subnets
| extend sn = string_size(subnets.properties.networkSecurityGroup)
-| where sn == 0
+| where sn == 0 and subnets.name !in ("GatewaySubnet", "AzureFirewallSubnet", "AzureFirewallManagementSubnet", "RouteServerSubnet")
| project recommendationId = "vnet-1", name, id, tags, param1 = strcat("SubnetName: ", subnets.name), param2 = "NSG: False"
diff --git a/docs/content/services/networking/virtual-networks/code/vnet-3/vnet-3.kql b/docs/content/services/networking/virtual-networks/code/vnet-3/vnet-3.kql
index 614a7f9ca..4b80fa3df 100644
--- a/docs/content/services/networking/virtual-networks/code/vnet-3/vnet-3.kql
+++ b/docs/content/services/networking/virtual-networks/code/vnet-3/vnet-3.kql
@@ -1 +1,12 @@
-// under-development
+// Azure Resource Graph Query
+// Find Subnets with Service Endpoint enabled for services that offer Private Link
+resources
+| where type =~ 'Microsoft.Network/virtualnetworks'
+| mv-expand subnets = properties.subnets
+| extend se = array_length(subnets.properties.serviceEndpoints)
+| where se >= 1
+| project name, id, tags, subnets, serviceEndpoints=todynamic(subnets.properties.serviceEndpoints)
+| mv-expand serviceEndpoints
+| project name, id, tags, subnetName=subnets.name, serviceName=tostring(serviceEndpoints.service)
+| where serviceName in (parse_json('["Microsoft.CognitiveServices","Microsoft.AzureCosmosDB","Microsoft.DBforMariaDB","Microsoft.DBforMySQL","Microsoft.DBforPostgreSQL","Microsoft.EventHub","Microsoft.KeyVault","Microsoft.ServiceBus","Microsoft.Sql", "Microsoft.Storage","Microsoft.StorageSync","Microsoft.Synapse","Microsoft.Web"]'))
+| project recommendationId = "vnet-3", name, id, tags, param1 = strcat("subnet=", subnetName), param2=strcat("serviceName=",serviceName), param3="ServiceEndpoints=true"
diff --git a/docs/content/services/networking/virtual-networks/code/vnet-3/vnet-3.kql.fix b/docs/content/services/networking/virtual-networks/code/vnet-3/vnet-3.kql.fix
deleted file mode 100644
index a992216c9..000000000
--- a/docs/content/services/networking/virtual-networks/code/vnet-3/vnet-3.kql.fix
+++ /dev/null
@@ -1,8 +0,0 @@
-// Azure Resource Graph Query
-// Find Subnets with Service Endpoint enabled
-resources
-| where type =~ 'Microsoft.Network/virtualnetworks'
-| mv-expand subnets = properties.subnets
-| extend se = string_size(subnets.properties.serviceEndpoints)
-| where se >= 1
-| project recommendationId = "vnet-3", name, id, tags, subnets.name, Param1="ServiceEndpoints: true"
diff --git a/docs/content/services/networking/vpn-gateway/_index.md b/docs/content/services/networking/vpn-gateway/_index.md
index 8ab73708f..7ad016ccc 100644
--- a/docs/content/services/networking/vpn-gateway/_index.md
+++ b/docs/content/services/networking/vpn-gateway/_index.md
@@ -14,14 +14,15 @@ The presented resiliency recommendations in this guidance include VPN Gateway an
The below table shows the list of resiliency recommendations for VPN Gateway and associated resources.
{{< table style="table-striped" >}}
-| Recommendation | State | ARG Query Available |
-| :-------------------------------------------------------------------------------------------------------------------------------------------------------------- | :------: | :-----------------: |
-| [VPNG-1 - Choose a Zone-redundant gateway](#vpng-1---choose-a-zone-redundant-gateway) | Preview | Yes |
-| [VPNG-2 - Plan for Active-Active mode](#vpng-2---plan-for-active-active-mode) | Preview | Yes |
-| [VPNG-3 - Plan for Site-to-Site VPN and Azure ExpressRoute coexisting connection](#vpng-3---plan-for-site-to-site-vpn-and-azure-expressroute-coexisting-connection) | Preview | No |
-| [VPNG-4 - Plan for geo-redundant VPN Connections](#vpng-4---plan-for-geo-redundant-vpn-connections) | Preview | No |
-| [VPNG-5 - Monitor connections and gateway health](#vpng-5---monitor-connections-and-gateway-health) | Preview | No |
-| [VPNG-6 - Enable service health](#vpng-6---enable-service-health) | Preview | No |
+| Recommendation | Category | Impact | State | ARG Query Available |
+|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------:|:------:|:-------:|:-------------------:|
+| [VPNG-1 - Choose a Zone-redundant gateway](#vpng-1---choose-a-zone-redundant-gateway) | Availability | High | Preview | Yes |
+| [VPNG-2 - Plan for Active-Active mode](#vpng-2---plan-for-active-active-mode) | Availability | High | Preview | Yes |
+| [VPNG-4 - Deploy active-active VPN concentrators on your premises for maximum resiliency](#vpng-4---deploy-active-active-vpn-concentrators-on-your-premises-for-maximum-resiliency) | Availability | High | Preview | No |
+| [VPNG-5 - Monitor connections and gateway health](#vpng-5---monitor-connections-and-gateway-health) | Monitoring | Medium | Preview | No |
+| [VPNG-6 - Enable service health](#vpng-6---enable-service-health) | Monitoring | Medium | Preview | No |
+| [VPNG-7 - Deploy zone-redundant VPN Gateways with zone-redundant Public IP(s)](#vpng-7---deploy-zone-redundant-vpn-gateways-with-zone-redundant-public-ips) | Availability | High | Preview | Yes |
+
{{< /table >}}
{{< alert style="info" >}}
@@ -34,19 +35,21 @@ Definitions of states can be found [here]({{< ref "../../../_index.md#definition
### VPNG-1 - Choose a Zone-redundant gateway
+**Category: Availability**
+
**Impact: High**
**Guidance**
-Azure VPN gateway provides different SLAs when it's deployed in a single availability zone and when it's deployed in two or more availability zones. For information about all Azure SLAs, see [SLA summary for Azure services](https://www.microsoft.com/licensing/docs/view/Service-Level-Agreements-SLA-for-Online-Services?lang=1).
-To automatically deploy your virtual network gateways across availability zones, you can use zone-redundant virtual network gateways. With zone-redundant gateways, you can benefit from zone-resiliency to access your mission-critical, scalable services on Azure.
+Azure VPN gateway provides different SLAs when it's deployed in a single availability zone and when it's deployed in two availability zones. To automatically deploy your virtual network gateways across availability zones, you can use zone-redundant virtual network gateways. With zone-redundant gateways, you can benefit from zone-resiliency to access your mission-critical, scalable services on Azure.
**Resources**
- [Zone redundant Virtual network gateway in availability zone](https://learn.microsoft.com/azure/vpn-gateway/about-zone-redundant-vnet-gateways)
- [Gateway SKU](https://learn.microsoft.com/azure/vpn-gateway/about-zone-redundant-vnet-gateways#gwskus)
+- [SLA summary for Azure services](https://www.microsoft.com/licensing/docs/view/Service-Level-Agreements-SLA-for-Online-Services?lang=1)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -58,11 +61,13 @@ To automatically deploy your virtual network gateways across availability zones,
### VPNG-2 - Plan for Active-Active mode
+**Category: Availability**
+
**Impact: High**
**Guidance**
-The active-active mode is available for all SKUs except Basic or Standard.
+The active-active mode is available for all SKUs except Basic.
Active-active gateways have two Gateway IP configurations and two public IP addresses.
**Resources**
@@ -70,7 +75,7 @@ Active-active gateways have two Gateway IP configurations and two public IP addr
- [Active-active VPN gateway](https://learn.microsoft.com/azure/vpn-gateway/active-active-portal#gateway)
- [Gateway SKU](https://learn.microsoft.com/azure/vpn-gateway/vpn-gateway-about-vpn-gateway-settings#gwsku)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -80,94 +85,101 @@ Active-active gateways have two Gateway IP configurations and two public IP addr
-### VPNG-3 - Plan for Site-to-Site VPN and Azure ExpressRoute coexisting connection
+### VPNG-4 - Deploy active-active VPN concentrators on your premises for maximum resiliency
+
+**Category: Availability**
**Impact: High**
**Guidance**
-During the initial planning phase, you want to decide whether you want to configure an ExpressRoute connection.
-An Azure ExpressRoute circuit provide a private dedicated connection into Azure.You also need to identify the bandwidth and the SKU type requirement for your business needs. Configure a Site-to-Site VPN as a failover path for ExpressRoute
+By deploying active-active VPN concentrators on your premises, along with active-active Azure VPN Gateways, you can maximize resilience and availability by using a fully-meshed topology based on four IPSec tunnels.
**Resources**
-- [Configure a Site-to-Site VPN as a failover path for ExpressRoute](https://learn.microsoft.com/azure/expressroute/expressroute-howto-coexist-resource-manager#configuration-designs)
-- [Limit and limitations](https://learn.microsoft.com/azure/expressroute/expressroute-howto-coexist-resource-manager#limits-and-limitations)
+- [Dual-redundancy: active-active VPN gateways for both Azure and on-premises networks](https://learn.microsoft.com/azure/vpn-gateway/vpn-gateway-highlyavailable#dual-redundancy-active-active-vpn-gateways-for-both-azure-and-on-premises-networks)
+
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
-{{< code lang="sql" file="code/vpng-3/vpng-3.kql" >}} {{< /code >}}
+{{< code lang="sql" file="code/vpng-4/vpng-4.kql" >}} {{< /code >}}
{{< /collapse >}}
-### VPNG-4 - Plan for geo-redundant VPN connections
+### VPNG-5 - Monitor connections and gateway health
-**Impact: High**
+**Category: Monitoring**
+
+**Impact: Medium**
**Guidance**
-To plan for disaster recovery, set up Site-to-Site VPN in more than one location. You can create IP Sec connectivity in the same metro or different metro and choose to work with different service providers for diverse paths
+Set up monitoring and alerts for Virtual Network Gateway health based on various metrics available.
**Resources**
-- [Highly available cross-premises](https://learn.microsoft.com/azure/vpn-gateway/vpn-gateway-highlyavailable)
-- [About VPN gateway redundancy](https://learn.microsoft.com/azure/vpn-gateway/vpn-gateway-highlyavailable#about-vpn-gateway-redundancy)
+- [VPN gateway data reference](https://learn.microsoft.com/azure/vpn-gateway/monitor-vpn-gateway-reference)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
-{{< code lang="sql" file="code/vpng-4/vpng-4.kql" >}} {{< /code >}}
+{{< code lang="sql" file="code/vpng-5/vpng-5.kql" >}} {{< /code >}}
{{< /collapse >}}
-### VPNG-5 - Monitor connections and gateway health
+### VPNG-6 - Enable service health
+
+**Category: Monitoring**
**Impact: Medium**
**Guidance**
-Set up monitoring and alerts for Virtual Network Gateway health based on various metrics available.
+VPN Gateway uses service health to notify about planned and unplanned maintenance. Configuring service health will notify you about changes made to your VPN connectivity.
**Resources**
-- [VPN gateway data reference](https://learn.microsoft.com/azure/vpn-gateway/monitor-vpn-gateway-reference)
+- [Getting started with Azure Metrics Explorer](https://learn.microsoft.com/azure/azure-monitor/essentials/metrics-getting-started)
+- [Monitor VPN gateway](https://learn.microsoft.com/azure/vpn-gateway/monitor-vpn-gateway-reference#metrics)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
-{{< code lang="sql" file="code/vpng-5/vpng-5.kql" >}} {{< /code >}}
+{{< code lang="sql" file="code/vpng-6/vpng-6.kql" >}} {{< /code >}}
{{< /collapse >}}
-### VPNG-6 - Enable service health
+### VPNG-7 - Deploy zone-redundant VPN Gateways with zone-redundant Public IP(s)
-**Impact: Medium**
+**Category: Availability**
+
+**Impact: High**
**Guidance**
-VPN Gateway uses service health to notify about planned and unplanned maintenance. Configuring service health will notify you about changes made to your VPN connectivity.
+When using zone-redundant SKUs for VPN Gateways (VpnGw*AZ), make sure that you associate your gateway with zone-redundant Standard SKU public IP addresses. If a VPN gateway is associated with zonal Standard SKU public IP addresses, all the gateway instances are deployed in the same zone as the IP address(es). This recommendation applies to both active-passive gateways (which use a single public IP address) and active-active VPN gateways (which use two public IP addresses).
**Resources**
-- [Getting started with Azure Metrics Explorer](hhttps://learn.microsoft.com/azure/azure-monitor/essentials/metrics-getting-started)
-- [Monitor VPN gateway](hhttps://learn.microsoft.com/azure/vpn-gateway/monitor-vpn-gateway-reference#metrics)
+- [About zone-redundant virtual network gateway in Azure availability zones](https://learn.microsoft.com/azure/vpn-gateway/about-zone-redundant-vnet-gateways)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
-{{< code lang="sql" file="code/vpng-6/vpng-6.kql" >}} {{< /code >}}
+{{< code lang="sql" file="code/vpng-7/vpng-7.kql" >}} {{< /code >}}
{{< /collapse >}}
+
diff --git a/docs/content/services/networking/vpn-gateway/code/vpng-1/vpng-1.kql b/docs/content/services/networking/vpn-gateway/code/vpng-1/vpng-1.kql
index 2638f59d7..53d8850cc 100644
--- a/docs/content/services/networking/vpn-gateway/code/vpng-1/vpng-1.kql
+++ b/docs/content/services/networking/vpn-gateway/code/vpng-1/vpng-1.kql
@@ -4,5 +4,5 @@ resources
| where type =~ "Microsoft.Network/virtualNetworkGateways"
| where properties.gatewayType == "Vpn"
| where properties.sku.tier !contains 'AZ'
-| project recommendationId = "vpng-1", name, id, param1= strcat("sku-tier: " , properties.sku.tier), param2=location
+| project recommendationId = "vpng-1", name, id, tags, param1= strcat("sku-tier: " , properties.sku.tier), param2=location
| order by id asc
diff --git a/docs/content/services/networking/vpn-gateway/code/vpng-2/vpng-2.kql b/docs/content/services/networking/vpn-gateway/code/vpng-2/vpng-2.kql
index 183c184d4..d60d725db 100644
--- a/docs/content/services/networking/vpn-gateway/code/vpng-2/vpng-2.kql
+++ b/docs/content/services/networking/vpn-gateway/code/vpng-2/vpng-2.kql
@@ -3,5 +3,5 @@ resources
| where properties.gatewayType =~ "vpn"
| extend gatewayType = properties.gatewayType, vpnType = properties.vpnType, connections = properties.connections, activeactive=properties.activeActive
| where activeactive == false
-| project recommendationId = "vpng-2", name, id
+| project recommendationId = "vpng-2", name, id, tags
diff --git a/docs/content/services/networking/vpn-gateway/code/vpng-7/vpng-7.kql b/docs/content/services/networking/vpn-gateway/code/vpng-7/vpng-7.kql
new file mode 100644
index 000000000..29a644362
--- /dev/null
+++ b/docs/content/services/networking/vpn-gateway/code/vpng-7/vpng-7.kql
@@ -0,0 +1,14 @@
+// Azure Resource Graph Query
+// Provides a list of zone-redundant Azure VPN gateways associated with non-zone-redundant Public IPs
+resources
+| where type =~ "Microsoft.Network/virtualNetworkGateways"
+| where properties.gatewayType == "Vpn"
+| where properties.sku.tier contains 'AZ'
+| mv-expand ipconfig = properties.ipConfigurations
+| extend pipId = tostring(ipconfig.properties.publicIPAddress.id)
+| join kind=inner (
+ resources
+ | where type == "microsoft.network/publicipaddresses"
+ | where isnull(zones) or array_length(zones) < 3 )
+ on $left.pipId == $right.id
+| project recommendationId = "vpng-7", name, id, tags, param1 = strcat("PublicIpAddressName: ", name1), param2 = strcat ("PublicIpAddressId: ",id1), param3 = strcat ("PublicIpAddressTags: ",tags1)
diff --git a/docs/content/services/networking/web-application-firewall/_index.md b/docs/content/services/networking/web-application-firewall/_index.md
index dfc2d98b1..423318970 100644
--- a/docs/content/services/networking/web-application-firewall/_index.md
+++ b/docs/content/services/networking/web-application-firewall/_index.md
@@ -12,13 +12,11 @@ The presented resiliency recommendations in this guidance include Web Applicatio
## Summary of Recommendations
{{< table style="table-striped" >}}
-| Recommendation | Impact | State | ARG Query Available |
-| :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | :----: | :-----: | :-----------------: |
-| [WAF-1 - Review best practice for Web Application Firewall on Azure Application Gateway](#waf-1---review-best-practice-for-web-application-firewall-on-azure-application-gateway) | Medium | Preview | No |
-| [WAF-2 - Review best practice for Web Application Firewall on Azure Front Door](#waf-2---review-best-practice-for-web-application-firewall-on-azure-front-door) | Medium | Preview | No |
-| [WAF-3 - Review logs for Web Application Firewall on Azure Front Door for legitimate requests that are blocked](#waf-3---review-logs-for-web-application-firewall-on-azure-front-door-for-legitimate-requests-that-are-blocked) | High | Preview | No |
-| [WAF-4 - Review logs for Web Application Firewall on Azure Application Gateway for legitimate requests that are blocked](#waf-4---review-logs-for-web-application-firewall-on-azure-application-gateway-for-legitimate-requests-that-are-blocked) | High | Preview | No |
-| [WAF-5 - Monitor Web Application Firewall](#waf-5---monitor-web-application-firewall) | Medium | Preview | No |
+| Recommendation | Category | Impact | State | ARG Query Available |
+|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:----------:|:------:|:-------:|:-------------------:|
+| [WAF-1 - Review logs for Web Application Firewall on Azure Front Door for legitimate requests that are blocked](#waf-1---review-logs-for-web-application-firewall-on-azure-front-door-for-legitimate-requests-that-are-blocked) | Monitoring | Medium | Preview | No |
+| [WAF-2 - Review logs for Web Application Firewall on Azure Application Gateway for legitimate requests that are blocked](#waf-2---review-logs-for-web-application-firewall-on-azure-application-gateway-for-legitimate-requests-that-are-blocked) | Monitoring | Medium | Preview | No |
+| [WAF-3 - Monitor Web Application Firewall](#waf-3---monitor-web-application-firewall) | Monitoring | Medium | Preview | No |
{{< /table >}}
{{< alert style="info" >}}
@@ -29,56 +27,14 @@ Definitions of states can be found [here]({{< ref "../../../_index.md#definition
## Recommendations Details
-### WAF-1 - Review best practice for Web Application Firewall on Azure Application Gateway
+### WAF-1 - Review logs for Web Application Firewall on Azure Front Door for legitimate requests that are blocked
-**Impact: Medium**
-
-**Guidance**
-
-Review and apply best practices for Web Application Firewall (WAF) on Azure Application Gateway.
-
-**Resources**
-
-- [Best practices for Web Application Firewall on Application Gateway](https://learn.microsoft.com/azure/web-application-firewall/ag/best-practices)
-
-**Resource Graph Query/Scripts**
-
-{{< collapse title="Show/Hide Query/Script" >}}
-
-{{< code lang="sql" file="code/waf-1/waf-1.kql" >}} {{< /code >}}
-
-{{< /collapse >}}
-
-
-
-### WAF-2 - Review best practice for Web Application Firewall on Azure Front Door
+**Category: Monitoring**
**Impact: Medium**
**Guidance**
-Review and apply best practices for Web Application Firewall (WAF) on Azure Front Door.
-
-**Resources**
-
-- [Best practices for Web Application Firewall (WAF) on Azure Front Door](https://learn.microsoft.com/azure/web-application-firewall/afds/waf-front-door-best-practices)
-
-**Resource Graph Query/Scripts**
-
-{{< collapse title="Show/Hide Query/Script" >}}
-
-{{< code lang="sql" file="code/waf-2/waf-2.kql" >}} {{< /code >}}
-
-{{< /collapse >}}
-
-
-
-### WAF-3 - Review logs for Web Application Firewall on Azure Front Door for legitimate requests that are blocked
-
-**Impact: High**
-
-**Guidance**
-
WAF could block a legitimate request that it shouldn't (a false positive). You can identify requests that have been blocked within the last 24 hours through Log Analytics.
**Resources**
@@ -88,7 +44,7 @@ WAF could block a legitimate request that it shouldn't (a false positive). You c
- [Web Application Firewall exclusion lists](https://learn.microsoft.com/azure/web-application-firewall/ag/application-gateway-waf-configuration?tabs=portal)
- [Fixing a false positive](https://learn.microsoft.com/azure/web-application-firewall/ag/web-application-firewall-troubleshoot#fixing-false-positives)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -98,9 +54,11 @@ WAF could block a legitimate request that it shouldn't (a false positive). You c
-### WAF-4 - Review logs for Web Application Firewall on Azure Application Gateway for legitimate requests that are blocked
+### WAF-2 - Review logs for Web Application Firewall on Azure Application Gateway for legitimate requests that are blocked
+
+**Category: Monitoring**
-**Impact: High**
+**Impact: Medium**
**Guidance**
@@ -111,7 +69,7 @@ WAF could block a legitimate request that it shouldn't (a false positive). You c
- [Azure Web Application Firewall Monitoring and Logging](https://learn.microsoft.com/azure/web-application-firewall/ag/application-gateway-waf-metrics#logs-and-diagnostics)
- [Diagnostic logs](https://learn.microsoft.com/azure/web-application-firewall/ag/web-application-firewall-logs#diagnostic-logs)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -121,7 +79,9 @@ WAF could block a legitimate request that it shouldn't (a false positive). You c
-### WAF-5 - Monitor Web Application Firewall
+### WAF-3 - Monitor Web Application Firewall
+
+**Category: Monitoring**
**Impact: Medium**
@@ -134,7 +94,7 @@ Monitoring the health of your WAF and the applications that it protects is impor
- [WAF monitoring](https://learn.microsoft.com/azure/web-application-firewall/ag/ag-overview#waf-monitoring)
- [Azure Monitor Workbook for WAF](https://github.com/Azure/Azure-Network-Security/tree/master/Azure%20WAF/Workbook%20-%20WAF%20Monitor%20Workbook)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
diff --git a/docs/content/services/security/key-vault/_index.md b/docs/content/services/security/key-vault/_index.md
index ea4030cce..617850e57 100644
--- a/docs/content/services/security/key-vault/_index.md
+++ b/docs/content/services/security/key-vault/_index.md
@@ -12,13 +12,13 @@ The presented resiliency recommendations in this guidance include Key Vault and
## Summary of Recommendations
{{< table style="table-striped" >}}
-| Recommendation | Impact | State | ARG Query Available |
-| :------------------------------------------------ | :------: | :------: | :-----------------: |
-| [KV-1 - Key vaults should have soft delete enabled](#kv-1---key-vaults-should-have-soft-delete-enabled) | High | Preview | Yes |
-| [KV-2 - Key vaults should have purge protection enabled](#kv-2---key-vaults-should-have-purge-protection-enabled) | High | Preview | Yes |
-| [KV-3 - Enable Azure Private Link Service for Key vault](#kv-3---enable-azure-private-link-service-for-key-vault) | High | Preview | No |
-| [KV-4 - Use separate key vaults per application per environment](#kv-4---use-separate-key-vaults-per-application-per-environment) | High | Preview | No |
-| [KV-5 - Diagnostic logs in Key Vault should be enabled](#kv-5---diagnostic-logs-in-key-vault-should-be-enabled) | Low | Preview | No |
+| Recommendation | Category | Impact | State | ARG Query Available |
+|:----------------------------------------------------------------------------------------------------------------------------------|:-----------------:|:------:|:-------:|:-------------------:|
+| [KV-1 - Key vaults should have soft delete enabled](#kv-1---key-vaults-should-have-soft-delete-enabled) | Disaster Recovery | High | Preview | Yes |
+| [KV-2 - Key vaults should have purge protection enabled](#kv-2---key-vaults-should-have-purge-protection-enabled) | Disaster Recovery | High | Preview | Yes |
+| [KV-3 - Enable Azure Private Link Service for Key vault](#kv-3---enable-azure-private-link-service-for-key-vault) | Networking | High | Preview | No |
+| [KV-4 - Use separate key vaults per application per environment](#kv-4---use-separate-key-vaults-per-application-per-environment) | Governance | High | Preview | No |
+| [KV-5 - Diagnostic logs in Key Vault should be enabled](#kv-5---diagnostic-logs-in-key-vault-should-be-enabled) | Monitoring | Low | Preview | No |
{{< /table >}}
{{< alert style="info" >}}
@@ -43,7 +43,7 @@ Key Vault's soft-delete feature allows recovery of the deleted vaults and delete
- [Azure Key Vault soft-delete overview](https://learn.microsoft.com/azure/key-vault/general/soft-delete-overview)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -67,7 +67,7 @@ Malicious deletion of a key vault can lead to permanent data loss. A malicious i
- [Azure Key Vault purge-protection overview](https://learn.microsoft.com/azure/key-vault/general/soft-delete-overview#purge-protection)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -91,7 +91,7 @@ Azure Private Link Service enables you to access Azure Key Vault and Azure hoste
- [Azure Key Vault Private Link Service overview](https://learn.microsoft.com/azure/key-vault/general/security-features#network-security)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -115,7 +115,7 @@ Key vaults define security boundaries for stored secrets. Grouping secrets into
- [Azure Key Vault best practices overview](https://learn.microsoft.com/azure/key-vault/general/best-practices#why-we-recommend-separate-key-vaults)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -139,7 +139,7 @@ Enable logs , set up alerts and retain them as per the retention requirement. Th
- [Azure Key Vault logging overview](https://learn.microsoft.com/azure/key-vault/general/logging?tabs=Vault)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
diff --git a/docs/content/services/security/key-vault/code/kv-3/kv-3.kql b/docs/content/services/security/key-vault/code/kv-3/kv-3.kql
index 614a7f9ca..597e94efc 100644
--- a/docs/content/services/security/key-vault/code/kv-3/kv-3.kql
+++ b/docs/content/services/security/key-vault/code/kv-3/kv-3.kql
@@ -1 +1,9 @@
-// under-development
+// Azure Resource Graph Query
+// This resource graph query will return all Key Vaults that do not have a Private Endpoint Connection, or where a private endpoint exists but public access is enabled
+
+resources
+| where type == "microsoft.keyvault/vaults"
+| where isnull(properties.privateEndpointConnections) or properties.privateEndpointConnections[0].properties.provisioningState != ("Succeeded") or (isnull(properties.networkAcls) and properties.publicNetworkAccess == 'Enabled')
+| extend param1 = strcat('Private Endpoint: ', iif(isnotnull(properties.privateEndpointConnections),split(properties.privateEndpointConnections[0].properties.privateEndpoint.id,'/')[8],'No Private Endpoint'))
+| extend param2 = strcat('Access: ', iif(properties.publicNetworkAccess == 'Disabled', 'Public Access Disabled', iif(isnotnull(properties.networkAcls), 'NetworkACLs in place','Public Access Enabled')))
+| project recommendationID = "kv-3", name, id, tags, param1, param2
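Reading aid for the `kv-3` filter above: a minimal Python sketch of the same three-way condition and of the `param2` label, assuming a vault's `properties` bag is available as a plain dict. The function names (`first_connection_state`, `is_flagged`, `access_label`) are hypothetical and are not part of the query or of any Azure SDK. The sketch treats a missing first connection as "not Succeeded"; Kusto's null-comparison semantics may differ in that edge case.

```python
from typing import Any, Dict, Optional


def first_connection_state(properties: Dict[str, Any]) -> Optional[str]:
    """Provisioning state of the first private endpoint connection, if any."""
    connections = properties.get("privateEndpointConnections") or []
    if not connections:
        return None
    return connections[0].get("properties", {}).get("provisioningState")


def is_flagged(properties: Dict[str, Any]) -> bool:
    """Mirror of the `where` clause: any one of the three conditions flags the vault."""
    # isnull(properties.privateEndpointConnections)
    no_private_endpoint = properties.get("privateEndpointConnections") is None
    # privateEndpointConnections[0].properties.provisioningState != "Succeeded"
    first_not_succeeded = first_connection_state(properties) != "Succeeded"
    # isnull(properties.networkAcls) and publicNetworkAccess == 'Enabled'
    open_public_access = (
        properties.get("networkAcls") is None
        and properties.get("publicNetworkAccess") == "Enabled"
    )
    return no_private_endpoint or first_not_succeeded or open_public_access


def access_label(properties: Dict[str, Any]) -> str:
    """Mirror of the nested iif() that builds param2."""
    if properties.get("publicNetworkAccess") == "Disabled":
        return "Access: Public Access Disabled"
    if properties.get("networkAcls") is not None:
        return "Access: NetworkACLs in place"
    return "Access: Public Access Enabled"
```

Note that only the first entry of `privateEndpointConnections` is inspected, as in the query, so a vault whose first connection is pending is flagged even if a later connection succeeded.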
diff --git a/docs/content/services/security/key-vault/code/kv-5/kv-5.kql b/docs/content/services/security/key-vault/code/kv-5/kv-5.kql
index 614a7f9ca..89595984e 100644
--- a/docs/content/services/security/key-vault/code/kv-5/kv-5.kql
+++ b/docs/content/services/security/key-vault/code/kv-5/kv-5.kql
@@ -1 +1,31 @@
-// under-development
+// Azure Resource Graph Query
+// This resource graph query will return all Key Vaults that do not have Diagnostic logs enabled
+
+policyresources
+| where type == 'microsoft.policyinsights/policystates'
+| where properties.complianceState == 'NonCompliant'
+| extend policyDefinitionId = tostring(tolower(properties.policyDefinitionId)),resourceId = tostring(tolower(properties.resourceId)), PolicyAssignmentName = properties.policyAssignmentName, policySetDefinitionId = tostring(tolower(properties.policySetDefinitionId))
+| project resourceId,policySetDefinitionId,policyDefinitionId
+| join kind=inner(
+ policyresources
+ | where type == 'microsoft.authorization/policydefinitions'
+ | extend displayName = tostring(properties.displayName)
+ | where displayName contains "Resource logs in Key Vault should be enabled"
+ | project policyDefinitionId=tostring(tolower(id)),displayName
+) on policyDefinitionId
+| project resourceId,policySetDefinitionId,policyDefinitionId
+| join kind=inner(
+ policyresources
+ | where type == 'microsoft.authorization/policysetdefinitions'
+ | extend displayName = tostring(properties.displayName)
+ | where displayName contains "Microsoft cloud security benchmark"
+ | project policySetDefinitionId=tostring(tolower(id)),displayName
+) on policySetDefinitionId
+| join kind=inner(
+ resources
+ | where type == 'microsoft.keyvault/vaults'
+ | project resourceId = tostring(tolower(id)),name,tags
+)on resourceId
+| project-away resourceId1,policySetDefinitionId1,policySetDefinitionId,policyDefinitionId,displayName
+| project recommendationID = "kv-5",id=resourceId,name,tags
+
diff --git a/docs/content/services/specialized-workloads/azure-hpc/_index.md b/docs/content/services/specialized-workloads/azure-hpc/_index.md
new file mode 100644
index 000000000..ef636eda4
--- /dev/null
+++ b/docs/content/services/specialized-workloads/azure-hpc/_index.md
@@ -0,0 +1,143 @@
++++
+title = "Azure High Performance Computing"
+description = "Best practices and resiliency recommendations for Azure High Performance Computing and associated resources and settings."
+date = "1/12/24"
+author = "ztrocinski"
+msAuthor = "ztrocinski"
+draft = false
++++
+
+The presented resiliency recommendations in this guidance include Azure High Performance Computing and associated resources and settings.
+
+## Summary of Recommendations
+
+{{< table style="table-striped" >}}
+| Recommendation | Impact | Design Area | State | ARG Query Available |
+| :------------------------------------------------ | :---------------------------------------------------------------------: | :------: | :------: | :------: |
+| [HPC-1 - Ensure File shares that stores jobs metadata are accessible from all head nodes](#hpc-1---ensure-file-shares-that-stores-jobs-metadata-are-accessible-from-all-head-nodes) | High | Application Resilience | Preview | No |
+| [HPC-2 - Automatically grow and shrink HPC Pack cluster resources](#hpc-2---automatically-grow-and-shrink-hpc-pack-cluster-resources) | Medium | System Efficiency | Preview | No |
+| [HPC-3 - Use multiple head nodes for HPC Pack](#hpc-3---use-multiple-head-nodes-for-hpc-pack) | Medium | Application Resilience | Preview | No |
+| [HPC-4 - Use HPC Pack Azure AD Integration or other highly available AD configuration](#hpc-4---use-hpc-pack-azure-ad-integration-or-other-highly-available-ad-configuration) | High | Application Resilience | Preview | No |
+| [BA-1 Monitor Batch account quota](https://azure.github.io/Azure-Proactive-Resiliency-Library/services/batch/batch-accounts/#ba-1---monitor-batch-account-quota) | Medium | Monitoring | Preview | No |
+| [BA-3 Create an Azure Batch pool across Availability Zones](https://azure.github.io/Azure-Proactive-Resiliency-Library/services/batch/batch-accounts/#ba-3---create-an-azure-batch-pool-across-availability-zones) | High | Availability | Preview | No |
+
+{{< /table >}}
+
+{{< alert style="info" >}}
+
+Definitions of states can be found [here]({{< ref "../../../_index.md#definitions-of-terms-used-in-aprl">}})
+
+{{< /alert >}}
+
+## Recommendations Details
+
+### HPC-1 - Ensure File shares that stores jobs metadata are accessible from all head nodes
+
+**Category: Application Resilience**
+
+**Impact: High**
+
+**Recommendation/Guidance**
+
+Currently, all HPC Pack ARM templates create the cluster share on one of the head nodes, which is not highly available. If that head node is down, the share is not accessible to the HPC Service running on the other head nodes.
+
+The following file shares can be moved to Azure Files shares with SMB permissions to make them highly available:
+
+- `\\\REMINST`
+- `\\\HpcServiceRegistration`
+- `\\\Runtime$`
+- `\\\TraceRepository`
+- `\\\Diagnostics`
+- `\\\CcpSpoolDir`
+
+With the above setup, all nodes can access the file shares independently of the head nodes.
+
+**Resources**
+
+- [Learn More](https://learn.microsoft.com/en-us/powershell/high-performance-computing/hpcpack-ha-cloud?view=hpc19-ps#hpc-pack-cluster-shares)
+
+**Resource Graph Query/Scripts**
+
+{{< collapse title="Show/Hide Query/Script" >}}
+
+{{< code lang="sql" file="code/hpc-1/hpc-1.kql" >}} {{< /code >}}
+
+{{< /collapse >}}
+
+
+
+### HPC-2 - Automatically grow and shrink HPC Pack cluster resources
+
+**Category: System Efficiency**
+
+**Impact: Medium**
+
+**Recommendation/Guidance**
+
+By deploying Azure "burst" nodes (both Windows and Linux) in your HPC Pack cluster or creating your HPC Pack cluster in Azure, you can automatically grow or shrink the cluster's resources, such as nodes or cores, according to the workload on the cluster. Scaling the cluster resources in this way allows you to execute jobs without any interruptions. In addition, it helps you use resources efficiently.
+
+**Resources**
+
+- [Learn More](https://learn.microsoft.com/en-us/powershell/high-performance-computing/hpcpack-auto-grow-shrink?view=hpc19-ps)
+
+**Resource Graph Query/Scripts**
+
+{{< collapse title="Show/Hide Query/Script" >}}
+
+{{< code lang="sql" file="code/hpc-2/hpc-2.kql" >}} {{< /code >}}
+
+{{< /collapse >}}
+
+
+
+### HPC-3 - Use multiple head nodes for HPC Pack
+
+**Category: Application Resilience**
+
+**Impact: Medium**
+
+**Recommendation/Guidance**
+
+Establish a cluster with a minimum of two head nodes. In the event of a head node failure, the active HPC Service will be automatically transferred from the affected head node to another functioning one.
+
+**Resources**
+
+- [Learn More](https://learn.microsoft.com/en-us/powershell/high-performance-computing/hpcpack-ha-cloud?view=hpc19-ps#dealing-with-head-node-failure)
+
+{{< collapse title="Show/Hide Query/Script" >}}
+
+{{< code lang="sql" file="code/hpc-3/hpc-3.kql" >}} {{< /code >}}
+
+{{< /collapse >}}
+
+
+
+### HPC-4 - Use HPC Pack Azure AD Integration or other highly available AD configuration
+
+**Category: Application Resilience**
+
+**Impact: High**
+
+**Recommendation/Guidance**
+
+When HPC fails to connect to the domain controller, admins and users cannot connect to the HPC Service and therefore cannot manage or submit jobs to the cluster. New jobs also cannot be started on the domain-joined compute nodes, because the NodeManager service fails to validate the job's credentials. Consider the following options:
+
+- Deploy a highly available domain controller with your HPC Pack cluster in Azure.
+
+- Use Azure AD Domain Services. During cluster deployment, you can join all your cluster nodes to this domain and get a highly available domain service from Azure.
+
+- Use the HPC Pack Azure AD integration solution without joining the cluster nodes to any domain. The cluster then remains usable as long as the HPC Service has connectivity to the Azure AD service.
+
+**Resources**
+
+- [Learn More](https://learn.microsoft.com/en-us/powershell/high-performance-computing/hpcpack-ha-cloud?view=hpc19-ps#dealing-with-ad-failure)
+
+
+
+**Resource Graph Query/Scripts**
+
+{{< collapse title="Show/Hide Query/Script" >}}
+
+{{< code lang="sql" file="code/hpc-4/hpc-4.kql" >}} {{< /code >}}
+
+{{< /collapse >}}
diff --git a/docs/content/services/specialized-workloads/azure-hpc/code/hpc-1/hpc-1.kql b/docs/content/services/specialized-workloads/azure-hpc/code/hpc-1/hpc-1.kql
new file mode 100644
index 000000000..614a7f9ca
--- /dev/null
+++ b/docs/content/services/specialized-workloads/azure-hpc/code/hpc-1/hpc-1.kql
@@ -0,0 +1 @@
+// under-development
diff --git a/docs/content/services/specialized-workloads/azure-hpc/code/hpc-2/hpc-2.kql b/docs/content/services/specialized-workloads/azure-hpc/code/hpc-2/hpc-2.kql
new file mode 100644
index 000000000..614a7f9ca
--- /dev/null
+++ b/docs/content/services/specialized-workloads/azure-hpc/code/hpc-2/hpc-2.kql
@@ -0,0 +1 @@
+// under-development
diff --git a/docs/content/services/specialized-workloads/azure-hpc/code/hpc-3/hpc-3.kql b/docs/content/services/specialized-workloads/azure-hpc/code/hpc-3/hpc-3.kql
new file mode 100644
index 000000000..614a7f9ca
--- /dev/null
+++ b/docs/content/services/specialized-workloads/azure-hpc/code/hpc-3/hpc-3.kql
@@ -0,0 +1 @@
+// under-development
diff --git a/docs/content/services/specialized-workloads/azure-hpc/code/hpc-4/hpc-4.kql b/docs/content/services/specialized-workloads/azure-hpc/code/hpc-4/hpc-4.kql
new file mode 100644
index 000000000..614a7f9ca
--- /dev/null
+++ b/docs/content/services/specialized-workloads/azure-hpc/code/hpc-4/hpc-4.kql
@@ -0,0 +1 @@
+// under-development
diff --git a/docs/content/services/specialized-workloads/azure-virtual-desktop/_index.md b/docs/content/services/specialized-workloads/azure-virtual-desktop/_index.md
index 1239a61a2..9c5755d76 100644
--- a/docs/content/services/specialized-workloads/azure-virtual-desktop/_index.md
+++ b/docs/content/services/specialized-workloads/azure-virtual-desktop/_index.md
@@ -12,16 +12,55 @@ The presented resiliency recommendations in this guidance include Azure Virtual
## Summary of Recommendations
{{< table style="table-striped" >}}
-| Recommendation | Impact | Design Area | State | ARG Query Available |
-| :------------------------------------------------ | :---------------------------------------------------------------------: | :------: | :------: | :------: |
-| [AVD-1 Use Private link when connecting to File Share or Key Vault](#avd-1---use-private-link-when-connecting-to-file-share-or-key-vault) | Medium | Networking and Connectivity | Preview | Yes |
-| [AVD-2 Monitor Service Health and Resource Health of AVD](#avd-2---monitor-service-health-and-resource-health-of-avd) | Medium | Resiliency/Monitoring | Preview | No |
-| [AVD-3 Deploy Session Hosts in an Availability Zone](#avd-3---deploy-session-hosts-in-an-availability-zone) | High | Application Delivery | Preview | No |
-| [AVD-4 Deploy Domain Controllers in Azure Virtual Network Across Availability Zones](#avd-4---deploy-domain-controllers-in-azure-virtual-network-across-availability-zones) | Medium | Identity | Preview | No |
-| [AVD-5 Implement RDP Shortpath for Public or Managed Networks](#avd-5---implement-rdp-shortpath-for-public-or-managed-networks) | Medium | Networking | Preview | No |
-| [AVD-6 Implement a Multi-Region BCDR Plan](#avd-6---implement-a-multi-region-bcdr-plan) | Medium | Backup | Preview | No |
-| [AVD-7 Store Golden Image Redundantly for Disaster Recovery](#avd-7---store-golden-image-redundantly-for-disaster-recovery) | Low | Backup | Preview | No |
-| [AVD-8 Capacity Planning for AVD Resources](#avd-8---capacity-planning-for-avd-resources) | Low | Compute | Preview | No |
+| Recommendation | Category | Impact | State | ARG Query Available |
+|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------:|:--------:|:-------:|:-------------------:|
+| [AVD-1 - Use Private link when connecting to File Share or Key Vault](#avd-1---use-private-link-when-connecting-to-file-share-or-key-vault) | Access & Security | Medium | Verified | Yes |
+| [AVD-2 - Monitor Service Health and Resource Health of AVD](#avd-2---monitor-service-health-and-resource-health-of-avd) | Monitoring | High | Verified | Yes |
+| [AVD-4 - Deploy Domain Controllers and DNS Servers in Azure Virtual Network Across Availability Zones](#avd-4---deploy-domain-controllers-and-dns-servers-in-azure-virtual-network-across-availability-zones) | Availability | High | Verified | No |
+| [AVD-5 - Implement RDP Shortpath for Public or Managed Networks](#avd-5---implement-rdp-shortpath-for-public-or-managed-networks) | Networking | Medium | Verified | No |
+| [AVD-6 - Implement a Multi-Region BCDR Plan](#avd-6---implement-a-multi-region-bcdr-plan) | Disaster Recovery | Medium | Verified | No |
+| [AVD-7 - Store Golden Image Redundantly for Disaster Recovery](#avd-7---store-golden-image-redundantly-for-disaster-recovery) | Disaster Recovery | Low | Verified | No |
+| [AVD-8 - Capacity Planning for AVD Resources](#avd-8---capacity-planning-for-avd-resources) | Disaster Recovery | Low | Verified | No |
+| [AVD-9 - Ensure that FSLogix Storage Account is Redundant](#avd-9---ensure-that-fslogix-storage-account-is-redundant) | Availability | High | Verified | Yes |
+| [AVD-10 - Enable Azure Backup for FSLogix Storage Account](#avd-10---enable-azure-backup-for-fslogix-storage-account) | Storage | Medium | Verified | No |
+| [AVD-11 - Scaling plans should be created per region and not scaled across regions](#avd-11---scaling-plans-should-be-created-per-region-and-not-scaled-across-regions) | Disaster Recovery | Medium | Verified | No |
+| [AVD-13 - Validate that the AVD session hosts can communicate with the AVD control plane and UDP ports are open if UDP is in use](#avd-13---validate-avd-session-host-connectivity-to-the-avd-control-plane-and-udp-ports-open-if-in-use) | Networking | Medium | Verified | No |
+| [AVD-14 - Ensure Secondary Entra ID connect synchronization server](#avd-14---ensure-secondary-entra-id-connect-synchronization-server) | Access & Security | Low | Verified | No |
+| [AVD-15 - Deploy paired Domain Controllers in the same region as AVD session hosts](#avd-15---deploy-paired-domain-controllers-in-the-same-region-as-avd-session-hosts) | Disaster Recovery | High | Verified | No |
+| [AVD-16 - Ensure DNS regions are replicated to avoid single point of failure](#avd-16---ensure-dns-regions-are-replicated-to-avoid-single-point-of-failure) | Networking | Medium | Verified | No |
+| [AVD-17 - Capacity Planning for AVD Resources](#avd-17---capacity-planning-for-avd-resources) | Disaster Recovery | Low | Verified | No |
+| [AVD-18 - Create new version of updated image and replace session hosts rather than update host directly](#avd-18---create-updated-image-version-and-replace-session-hosts-rather-than-updating-host-directly) | Governance | Low | Verified | No |
+| [AVD-19 - Pooled Create a validation pool for testing of planned updates](#avd-19---pooled-create-a-validation-pool-for-testing-of-planned-updates) | Governance | Medium | Verified | No |
+| [AVD-20 - Pooled Configure scheduled agent updates](#avd-20---pooled-configure-scheduled-agent-updates) | System Efficiency | Medium | Verified | No |
+| [AVD-21 - Personal Create a validation pool for testing of planned updates](#avd-21---personal-create-a-validation-pool-for-testing-of-planned-updates) | Governance | Low | Verified | No |
+| [AVD-22 - Use Azure Site Recovery or Backups on VMs supporting personal desktops](#avd-22---use-azure-site-recovery-or-backups-on-vms-supporting-personal-desktops) | Disaster Recovery | Medium | Verified | No |
+| [AVD-23 - Ensure a unique OU when deploying VMs to Domain](#avd-23---ensure-a-unique-ou-when-deploying-vms-to-domain) | Governance | Medium | Verified | No |
+| [AVD-24 - Ensure the standard FSLogix configuration is deployed](#avd-24---ensure-the-standard-fslogix-configuration-is-deployed) | Storage | Medium | Verified | No |
+| [AVD-25 - Ensure user permissions are set correctly on SMB shares](#avd-25---ensure-user-permissions-are-set-correctly-on-smb-shares) | Storage | Medium | Verified | No |
+| [AVD-26 - Configure Diagnostic Settings for FSLogix logs and enable review for accounts](#avd-26---configure-diagnostic-settings-for-fslogix-logs-and-enable-review-for-accounts) | Storage | Medium | Verified | No |
+| [AVD-27 - Manually update new FSLogix image when available](#avd-27---manually-update-new-fslogix-image-when-available) | Availability | Low | Verified | No |
+| [AVD-28 - Turn on Continuous Availability for ANF if using App Attach](#avd-28---turn-on-continuous-availability-for-anf-if-using-app-attach) | App Attach Storage | Medium | Verified | No |
+| [AVD-29 - App attach should be placed in separate file share; Disaster recovery plan should include App attach storage](#avd-29---app-attach-should-be-placed-in-separate-file-share-and-disaster-recovery-plan-should-include-app-attach-storage) | Storage | Medium | Verified | No |
+| [AVD-30 - Ensure virtual networks have route tables/route server configured for all regions](#avd-30---ensure-virtual-networks-have-route-tablesroute-server-configured-for-all-regions) | Networking | Medium | Verified | No |
+| [AVD-31 - Ensure virtual networks isolation with separate IP space and NSGs for Prod and DR](#avd-31---ensure-virtual-networks-isolation-with-separate-ip-space-and-nsgs-for-prod-and-dr) | Networking | Medium | Verified | No |
+| [AVD-33 - Ensure route tables accommodate failover](#avd-33---ensure-route-tables-accommodate-failover) | Disaster Recovery | Medium | Verified | No |
+| [AVD-34 - Ensure Resilient Deployment of Keyvault for AVD Host Pools](#avd-34---provision-secondary-key-vault-for-disaster-recovery) | Disaster Recovery | High | Verified | No |
+| [AVD-35 - Configure AVD insights Workbook](#avd-35---configure-avd-insights-workbook) | Monitoring | High | Verified | No |
+| [AVD-36 - Ensure separate log analytics workspaces for Prod and DR](#avd-36---ensure-separate-log-analytics-workspaces-for-prod-and-dr) | Disaster Recovery | Low | Verified | No |
+| [AVD-37 - Organize AVD resources using the AVD Scale unit model described by the AVD Landing Zone Methodology](#avd-37---organize-avd-resources-using-the-avd-scale-unit-model-described-by-the-avd-landing-zone-methodology) | Governance | Low | Verified | No |
+| [IT-2 - Replicate your Image Templates to a secondary region](https://azure.github.io/Azure-Proactive-Resiliency-Library/services/compute/image-templates/#it-2---replicate-your-image-templates-to-a-secondary-region) | Disaster Recovery | Low | Preview | Yes |
+| [CG-2 - Zone redundant storage should be used for image versions](https://azure.github.io/Azure-Proactive-Resiliency-Library/services/compute/compute-gallery/#cg-2---zone-redundant-storage-should-be-used-for-image-versions) | Availability | Medium | Verified | Yes |
+| [VM-2 - Deploy VMs across Availability Zones](https://azure.github.io/Azure-Proactive-Resiliency-Library/services/compute/virtual-machines/#vm-2---deploy-vms-across-availability-zones) | Availability | High | Verified | Yes |
+| [VM-7 - Enable Backups on your VMs](https://azure.github.io/Azure-Proactive-Resiliency-Library/services/compute/virtual-machines/#vm-7---backup-vms-with-azure-backup-service) | Disaster Recovery | Medium | Verified | Yes |
+| [VM-8 - Production VMs should be using SSD disks](https://azure.github.io/Azure-Proactive-Resiliency-Library/services/compute/virtual-machines/#vm-8---production-vms-should-be-using-ssd-disks) | System Efficiency | High | Verified | Yes |
+| [VM-21 - Configure diagnostic settings for all Azure Virtual Machines](https://azure.github.io/Azure-Proactive-Resiliency-Library/services/compute/virtual-machines/#vm-21---configure-diagnostic-settings-for-all-azure-virtual-machines) | Monitoring | Low | Preview | Yes |
+| [ERC-1 - Connect your on-premises network to critical workloads in Azure through two or more ExpressRoute circuits in different peering locations](https://azure.github.io/Azure-Proactive-Resiliency-Library/services/networking/expressroute-circuits/#erc-1---connect-your-on-premises-network-to-critical-workloads-in-azure-through-two-or-more-expressroute-circuits-in-different-peering-locations) | Availability | High | Verified | No |
+| [ERC-2 - Ensure the two physical links of your ExpressRoute circuit are connected to two distinct edge devices in your network](https://azure.github.io/Azure-Proactive-Resiliency-Library/services/networking/expressroute-circuits/#erc-2---ensure-the-two-physical-links-of-your-expressroute-circuit-are-connected-to-two-distinct-edge-devices-in-your-network) | Availability | High | Verified | No |
+| [VPNG-1 - Choose a Zone-redundant gateway](https://azure.github.io/Azure-Proactive-Resiliency-Library/services/networking/vpn-gateway/#vpng-1---choose-a-zone-redundant-gateway) | Availability | High | Verified | Yes |
+| [NSG-4 - Configure NSG Flow Logs](https://azure.github.io/Azure-Proactive-Resiliency-Library/services/networking/network-security-group/#nsg-4---configure-nsg-flow-logs) | Monitoring | Medium | Preview | Yes |
+| [ST-1 - Ensure that Storage Account configuration is at least Zone redundant](https://azure.github.io/Azure-Proactive-Resiliency-Library/services/storage/storage-account/#st-1---ensure-that-storage-account-configuration-is-at-least-zone-redundant) | Storage | High | Verified | Yes |
+| [WADS-3 - Ensure that all fault-points and fault-modes are understood and operationalized](https://azure.github.io/Azure-Proactive-Resiliency-Library/well-architected/2-design/#wads-3---ensure-that-all-fault-points-and-fault-modes-are-understood-and-operationalized) | Availability | High | Verified | No |
+| [WADS-7 - Design a BCDR strategy that will help to meet the business requirements](https://azure.github.io/Azure-Proactive-Resiliency-Library/well-architected/2-design/#wads-7---design-a-bcdr-strategy-that-will-help-to-meet-the-business-requirements) | Disaster Recovery | High | Verified | No |
{{< /table >}}
@@ -35,11 +74,11 @@ Definitions of states can be found [here]({{< ref "../../../_index.md#definition
### AVD-1 - Use Private link when connecting to File Share or Key Vault
-**Category: Access & Security/Networking and Connectivity**
+**Category: Access & Security**
**Impact: Medium**
-**Recommendation/Guidance**
+**Guidance**
Private Link is available for other Azure services that work in conjunction with Azure Virtual Desktop, such as Azure Files and Key Vault. From a resiliency standpoint, we recommend implementing private endpoints for these services to reduce exposure to potential internet-related issues such as latency, packet loss, and/or downtime. This can lead to more reliable communication between AVD and dependent services.
@@ -48,7 +87,7 @@ Private Link is available for other Azure services that work in conjunction with
- [Learn More](https://learn.microsoft.com/en-us/azure/well-architected/azure-virtual-desktop/networking#private-endpoints-private-link)
- [Private link](https://learn.microsoft.com/en-us/azure/well-architected/azure-virtual-desktop/networking#private-endpoints-private-link)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -60,23 +99,21 @@ Private Link is available for other Azure services that work in conjunction with
### AVD-2 - Monitor Service Health and Resource Health of AVD
-**Category: Resiliency/Monitoring**
+**Category: Monitoring**
-**Impact: Medium**
+**Impact: High**
-**Recommendation/Guidance**
+**Guidance**
Use Service Health to stay informed about the health of the Azure services and regions that you use to ensure their availability.
Set up Service Health alerts so that you stay aware of service issues, planned maintenance, or other changes that might affect your Azure Virtual Desktop resources.
Use Resource Health to monitor your VMs and storage solutions.
-
**Resources**
- [Learn More](https://learn.microsoft.com/en-us/azure/well-architected/azure-virtual-desktop/monitoring#resource-health)
-
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -86,42 +123,15 @@ Use Resource Health to monitor your VMs and storage solutions.
-### AVD-3 - Deploy Session Hosts in an Availability Zone
+### AVD-4 - Deploy Domain Controllers and DNS Servers in Azure Virtual Network Across Availability Zones
-**Category: Application Resilience/Availability**
+**Category: Availability**
**Impact: High**
-**Recommendation/Guidance**
-
-Deploy session hosts in an availability zone or an availability set helps protect the environment from outages.
-
-Enhances reliability by minimizing latency and impacts reliability helping keep the data synchronized and protecting from outages. If one zone experiences an outage, then regional services, capacity, and high availability are supported by the remaining zones.
-
-**Resources**
-
-- [Learn More](https://learn.microsoft.com/en-us/azure/well-architected/azure-virtual-desktop/application-delivery#session-host-settings)
-- [Availability Zones](https://learn.microsoft.com/en-us/azure/well-architected/azure-virtual-desktop/application-delivery#session-host-settings)
-
-**Resource Graph Query/Scripts**
-
-{{< collapse title="Show/Hide Query/Script" >}}
-
-{{< code lang="sql" file="code/avd-3/avd-3.kql" >}} {{< /code >}}
-
-{{< /collapse >}}
-
-
-
-### AVD-4 - Deploy Domain Controllers in Azure Virtual Network Across Availability Zones
+**Guidance**
-**Category: Availability/Identity**
-
-**Impact: Medium**
-
-**Recommendation/Guidance**
-
-When using an AD DS identity solution with AVD, it is recommended to deploy domain controllers on azure virtual machines across availability zones. This improves the reliability of the environment by being independent of an on premises connection as well as creates a shorter path for user’s authentication improving performance.
+When using an AD DS identity solution with AVD, it is recommended to deploy domain controllers and DNS servers on Azure virtual machines across availability zones. This improves the environment’s reliability by removing a dependency on an on-premises service and improves performance by creating a shorter path for user authentication.
This recommendation is not relevant when you are utilizing Microsoft Entra as the identity provider.
@@ -129,7 +139,7 @@ This recommendation is not relevant when you are utilizing Microsoft Entra as th
- [Learn More](https://learn.microsoft.com/en-us/azure/architecture/example-scenario/identity/adds-extend-domain#reliability)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -145,7 +155,7 @@ This recommendation is not relevant when you are utilizing Microsoft Entra as th
**Impact: Medium**
-**Recommendation/Guidance**
+**Guidance**
It is recommended to enable RDP Shortpath for AVD. RDP Shortpath is a feature of Azure Virtual Desktop that establishes a direct UDP-based transport between a supported Windows Remote Desktop client and session host. By default, Remote Desktop Protocol (RDP) tries to establish connection using UDP and uses a TCP-based reverse connect transport as a fallback connection mechanism. TCP-based reverse connect transport provides the best compatibility with various networking configurations and has a high success rate for establishing RDP connections. UDP-based transport offers better connection reliability and more consistent latency.
@@ -153,7 +163,7 @@ It is recommended to enable RDP Shortpath for AVD. RDP Shortpath is a feature of
- [Learn More](https://learn.microsoft.com/en-us/azure/virtual-desktop/rdp-shortpath?tabs=managed-networks)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -165,11 +175,11 @@ It is recommended to enable RDP Shortpath for AVD. RDP Shortpath is a feature of
### AVD-6 - Implement a Multi-Region BCDR Plan
-**Category: Backup**
+**Category: Disaster Recovery**
**Impact: Medium**
-**Recommendation/Guidance**
+**Guidance**
It is recommended to adopt a multi-region deployment (active-active) for AVD. Each region should contain at least identity, name resolution, AVD management resources, and session hosts in case of a primary region outage.
@@ -178,8 +188,7 @@ It is recommended to adopt a multi-region deployment (active-active) for AVD. Ea
- [Multi-region BCDR](https://learn.microsoft.com/en-us/azure/architecture/example-scenario/wvd/azure-virtual-desktop-multi-region-bcdr)
- [Learn More](https://learn.microsoft.com/en-us/azure/well-architected/azure-virtual-desktop/business-continuity#active-active-scenarios)
-
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -189,15 +198,13 @@ It is recommended to adopt a multi-region deployment (active-active) for AVD. Ea
-
-
### AVD-7 - Store Golden Image Redundantly for Disaster Recovery
-**Category: Backup**
+**Category: Disaster Recovery**
**Impact: Low**
-**Recommendation/Guidance**
+**Guidance**
If a full BCDR strategy is not in place, consider using zone-redundant storage to store golden images across availability zones. Having the image available will allow for faster recovery in case of zonal or regional outage.
@@ -206,8 +213,7 @@ If a full BCDR strategy is not in place, consider using zone-redundant storage t
- [Golden Image](https://learn.microsoft.com/en-us/azure/well-architected/azure-virtual-desktop/business-continuity#golden-images)
- [Learn More](https://learn.microsoft.com/en-us/azure/well-architected/azure-virtual-desktop/application-delivery#fault-tolerance)
-
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -219,28 +225,684 @@ If a full BCDR strategy is not in place, consider using zone-redundant storage t
### AVD-8 - Capacity Planning for AVD Resources
-**Category: Backup**
+**Category: Disaster Recovery**
**Impact: Low**
-**Recommendation/Guidance**
+**Guidance**
-Monitor and plan for subscription limits. Closely monitor your Azure Virtual Desktop deployments, and keep track of resource usage within your subscription. By proactively monitoring capacity, you can identify potential challenges early on, and you can take suitable actions to avoid reaching limits.
+Monitor and plan for subscription limits and API throttling limits. Closely monitor your Azure Virtual Desktop deployments, and keep track of resource usage within your subscription. By proactively monitoring capacity, you can identify potential challenges early on, and you can take suitable actions to avoid reaching limits.
Consider scaling across multiple subscriptions if further scaling is required, or work with Azure support to adjust limits based on your business requirements.
To handle a large number of users, consider scaling horizontally by creating multiple host pools.
-
**Resources**
- [Capacity Planning](https://learn.microsoft.com/en-us/azure/well-architected/azure-virtual-desktop/business-continuity#capacity-planning)
- [Learn More](https://learn.microsoft.com/en-us/azure/architecture/example-scenario/wvd/windows-virtual-desktop#azure-virtual-desktop-limitations)
+**Resource Graph Query**
+
+{{< collapse title="Show/Hide Query/Script" >}}
+
+{{< code lang="sql" file="code/avd-8/avd-8.kql" >}} {{< /code >}}
+
+{{< /collapse >}}
+
+
+
+### AVD-9 - Ensure that FSLogix Storage Account is Redundant
+
+**Category: Availability**
+
+**Impact: Medium**
+
+**Guidance**
+
+It is important to ensure the redundancy of user profiles when using FSLogix. With AVD, FSLogix profiles are stored on a file share in a storage account. Data in an Azure Storage account is always replicated three times in the primary region.
+The following options control how your data is replicated within the primary region or to a paired region:
+
+- LRS is the least expensive replication option and is not recommended for apps that require high availability and durability. It provides eleven 9s of durability and replicates data three times within a single physical location.
+- ZRS is recommended for apps that require high availability across zones. It provides twelve 9s of durability and replicates data across three availability zones.
+- GRS replicates an additional three copies to a secondary region and provides sixteen 9s of durability.
+- GZRS combines high availability across zones with geo-replication to a secondary region. It provides sixteen 9s of durability over a given year.
+
+Generally, it is recommended to store your data as securely and redundantly as possible.
+
+**Resources**
+
+- [Learn More](https://learn.microsoft.com/en-us/azure/well-architected/azure-virtual-desktop/storage#user-profiles)
+
+**Resource Graph Query**
+
+{{< collapse title="Show/Hide Query/Script" >}}
+
+{{< code lang="sql" file="code/avd-9/avd-9.kql" >}} {{< /code >}}
+
+{{< /collapse >}}
+
+
+
+### AVD-10 - Enable Azure Backup for FSLogix Storage Account
+
+**Category: Storage**
+
+**Impact: Medium**
+
+**Guidance**
+
+It is recommended to enable backup on the FSLogix Storage Account. Ensuring the user profiles are resilient will allow user data and experience to be consistent through outages.
+
+**Resources**
+
+- [FSLogix](https://learn.microsoft.com/en-us/fslogix/overview-what-is-fslogix)
+- [Backup Storage Account](https://learn.microsoft.com/en-us/azure/backup/blob-backup-configure-manage?tabs=operational-backup)
+
+**Resource Graph Query**
+
+{{< collapse title="Show/Hide Query/Script" >}}
+
+{{< code lang="sql" file="code/avd-10/avd-10.kql" >}} {{< /code >}}
+
+{{< /collapse >}}
+
+
+
+### AVD-11 - Scaling plans should be created per region and not scaled across regions
+
+**Category: Disaster Recovery**
+
+**Impact: Medium**
+
+**Guidance:**
+Each region has its own scaling plans assigned to host pools within that region. However, these plans can become inaccessible if there's a regional failure. To mitigate this risk, it's advisable to create a secondary scaling plan in another region.
+
+**Resources:**
+
+- [Learn More](https://learn.microsoft.com/en-us/azure/virtual-desktop/autoscale-scaling-plan?tabs=portal)
**Resource Graph Query/Scripts**
{{< collapse title="Show/Hide Query/Script" >}}
-{{< code lang="sql" file="code/avd-8/avd-8.kql" >}} {{< /code >}}
+{{< code lang="sql" file="code/avd-11/avd-11.kql" >}} {{< /code >}}
+
+{{< /collapse >}}
+
+
+
+### AVD-13 - Validate AVD Session Host Connectivity to the AVD Control Plane and UDP Ports open if in use
+
+**Category: Networking**
+
+**Impact: Medium**
+
+**Guidance:**
+Ensure that AVD session hosts can communicate with the AVD control plane and that UDP ports are open if UDP is used. Validate the connectivity of the VMs to the AVD control plane and confirm that the UDP TURN ports are reachable. Add the required global URLs to your allowlist and keep the UDP/TURN ports open so that users can connect reliably. Validating connectivity helps maintain performance and user experience within the AVD environment.
+
+**Resources:**
+
+- [Learn More](https://learn.microsoft.com/en-us/azure/virtual-desktop/troubleshoot-rdp-shortpath)
+
+**Resource Graph Query/Scripts**
+
+{{< collapse title="Show/Hide Query/Script" >}}
+
+{{< code lang="sql" file="code/avd-13/avd-13.kql" >}} {{< /code >}}
+
+{{< /collapse >}}
+
+
+
+### AVD-14 - Ensure Secondary Entra ID connect synchronization server
+
+**Category: Access & Security**
+
+**Impact: Low**
+
+**Guidance:**
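+In hybrid identity scenarios, Microsoft Entra Connect is best run in Azure, although it can also be hosted on-premises. One or more secondary VMs should be set up in staging mode so that they are available in the event of a failover.
+Set up a secondary Microsoft Entra Connect server in staging mode so that it can take over synchronization to Microsoft Entra ID if the primary server has an outage.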
+
+**Resources:**
+
+- [Learn More](https://learn.microsoft.com/en-us/entra/identity/hybrid/connect/how-to-connect-install-multiple-domains)
+
+**Resource Graph Query/Scripts**
+
+{{< collapse title="Show/Hide Query/Script" >}}
+
+{{< code lang="sql" file="code/avd-14/avd-14.kql" >}} {{< /code >}}
+
+{{< /collapse >}}
+
+
+
+### AVD-15 - Deploy paired Domain Controllers in the same region as AVD session hosts
+
+**Category: Disaster Recovery**
+
+**Impact: High**
+
+**Guidance:**
+Ensure each region with session hosts has multiple domain controllers in the same region to support high availability for identity.
+In a hybrid scenario, each Azure region with AVD session hosts should have Active Directory domain controllers in Azure and use availability zones or availability sets for resilience within the region. This also reduces the dependency on ExpressRoute (ER), VPN, and inter-Azure connectivity.
+
+**Resources:**
+
+- [Learn More](https://learn.microsoft.com/en-us/azure/architecture/example-scenario/azure-virtual-desktop/azure-virtual-desktop-multi-region-bcdr)
+
+**Resource Graph Query/Scripts:**
+
+{{< collapse title="Show/Hide Query/Script" >}}
+
+{{< code lang="sql" file="code/avd-15/avd-15.kql" >}} {{< /code >}}
+
+{{< /collapse >}}
+
+
+
+### AVD-16 - Ensure DNS regions are replicated to avoid single point of failure
+
+**Category: Networking**
+
+**Impact: Medium**
+
+**Guidance:**
+When using Active Directory Domain Services (AD DS) integrated DNS or another DNS solution, configure secondary and tertiary DNS servers that span multiple regions and zones. If using custom DNS, ensure there are redundant DNS servers to avoid a single point of failure.
+
+**Resources:**
+
+- [Learn More](https://learn.microsoft.com/en-us/azure/architecture/example-scenario/azure-virtual-desktop/azure-virtual-desktop-multi-region-bcdr)
+
+**Resource Graph Query/Scripts:**
+
+{{< collapse title="Show/Hide Query/Script" >}}
+
+{{< code lang="sql" file="code/avd-16/avd-16.kql" >}} {{< /code >}}
+
+{{< /collapse >}}
+
+
+
+### AVD-17 - Capacity Planning for AVD Resources
+
+**Category: Disaster Recovery**
+
+**Impact: Low**
+
+**Guidance:**
+Monitor and plan for subscription limits and API throttling limits. Closely monitor your Azure Virtual Desktop deployments and keep track of resource usage within your subscription. By proactively monitoring capacity, you can identify potential challenges early on, and you can take suitable actions to avoid reaching limits. Consider scaling across multiple subscriptions if further scaling is required, or work with Azure support to adjust limits based on your business requirements. To handle a large number of users, consider scaling horizontally by creating multiple host pools.
+
+**Resources:**
+
+- [Learn More](https://learn.microsoft.com/en-us/azure/architecture/example-scenario/wvd/windows-virtual-desktop#azure-virtual-desktop-limitations)
+
+**Resource Graph Query/Scripts:**
+
+{{< collapse title="Show/Hide Query/Script" >}}
+
+{{< code lang="sql" file="code/avd-17/avd-17.kql" >}} {{< /code >}}
+
+{{< /collapse >}}
+
+
+
+### AVD-18 - Create updated image version and replace session hosts rather than updating host directly
+
+**Category: Governance**
+
+**Impact: Low**
+
+**Guidance:**
+Establish a systematic process for handling image updates within your Azure Virtual Desktop environment. Instead of directly updating individual session hosts, create a new version of the updated image. This process involves creating and configuring a golden image with the necessary updates and configurations. Once the new image is prepared, replace existing session hosts with instances using the updated image. This approach ensures consistency across all session hosts and minimizes the risk of configuration drift. Additionally, it enables quick rollback to a previous image version in case of any issues with the update. Implementing this process helps streamline maintenance activities and ensures that all session hosts are up-to-date with the latest configurations and updates.
+
+**Resources:**
+
+- [Learn More](https://learn.microsoft.com/en-us/training/modules/create-manage-session-host-image/)
+
+**Resource Graph Query/Scripts:**
+
+{{< collapse title="Show/Hide Query/Script" >}}
+
+{{< code lang="sql" file="code/avd-18/avd-18.kql" >}} {{< /code >}}
+
+{{< /collapse >}}
+
+
+
+### AVD-19 - [Pooled] Create a validation pool for testing of planned updates
+
+**Category: Governance**
+
+**Impact: Medium**
+
+**Guidance:**
+Create at least one validation host pool to get early warning if a planned update to AVD causes an issue.
+Also check that the validation host pool is used regularly to test planned updates.
+A host pool is a collection of one or more identical virtual machines within an Azure Virtual Desktop environment. We highly recommend you create a validation host pool where service updates are applied first. Validation host pools let you monitor service updates before the service applies them to your standard or non-validation environment. Without a validation host pool, you may not discover changes that introduce errors, which could result in downtime for users in your standard environment.
+To ensure your apps work with the latest updates, the validation host pool should be as similar to host pools in your non-validation environment as possible. Users should connect as frequently to the validation host pool as they do to the standard host pool. If you have automated testing on your host pool, you should include automated testing on the validation host pool.
+
+**Resources:**
+
+- [Learn More](https://learn.microsoft.com/en-us/azure/virtual-desktop/configure-validation-environment?tabs=azure-portal)
+
+**Resource Graph Query/Scripts:**
+
+{{< collapse title="Show/Hide Query/Script" >}}
+
+{{< code lang="sql" file="code/avd-19/avd-19.kql" >}} {{< /code >}}
+
+{{< /collapse >}}
+
+
+
+### AVD-20 - [Pooled] Configure scheduled agent updates
+
+**Category: System Efficiency**
+
+**Impact: Medium**
+
+**Guidance:**
+Ensure schedules have been created to provide maintenance windows for AVD agent updates.
+The Scheduled Agent Updates feature lets you create up to two maintenance windows for the Azure Virtual Desktop agent, side-by-side stack, and Geneva Monitoring agent to get updated so that updates don't happen during peak business hours.
+
+**Resources:**
+
+- [Learn More](https://learn.microsoft.com/en-us/azure/virtual-desktop/scheduled-agent-updates)
+
+**Resource Graph Query/Scripts:**
+
+{{< collapse title="Show/Hide Query/Script" >}}
+
+{{< code lang="sql" file="code/avd-20/avd-20.kql" >}} {{< /code >}}
+
+{{< /collapse >}}
+
+
+
+### AVD-21 - [Personal] Create a validation pool for testing of planned updates
+
+**Category: Governance**
+
+**Impact: Low**
+
+**Guidance:**
+Create at least one validation host pool to get early warning if a planned update to AVD causes an issue. Also check that the validation host pool is used regularly to test planned updates.
+A host pool is a collection of one or more identical virtual machines within an Azure Virtual Desktop environment. We highly recommend you create a validation host pool where service updates are applied first. Validation host pools let you monitor service updates before the service applies them to your standard or non-validation environment. Without a validation host pool, you may not discover changes that introduce errors, which could result in downtime for users in your standard environment.
+To ensure your apps work with the latest updates, the validation host pool should be as similar to host pools in your non-validation environment as possible. Users should connect as frequently to the validation host pool as they do to the standard host pool. If you have automated testing on your host pool, you should include automated testing on the validation host pool.
+
+**Resources:**
+
+- [Learn More](https://learn.microsoft.com/en-us/azure/virtual-desktop/configure-validation-environment?tabs=azure-portal)
+
+**Resource Graph Query/Scripts:**
+
+{{< collapse title="Show/Hide Query/Script" >}}
+
+{{< code lang="sql" file="code/avd-21/avd-21.kql" >}} {{< /code >}}
+
+{{< /collapse >}}
+
+
+
+### AVD-22 - Use Azure Site Recovery or Backups on VMs supporting personal desktops
+
+**Category: Disaster Recovery**
+
+**Impact: Medium**
+
+**Guidance:**
+Use Azure Site Recovery (ASR) to replicate the VMs that support personal desktops to a secondary Azure region for failover and failback, or protect those VMs with Azure Backup. In the event of a disaster or unexpected outage, either option allows these VMs to be recovered from a known state.
+
+**Resources:**
+
+- [Learn More](https://learn.microsoft.com/en-us/azure/virtual-desktop/scheduled-agent-updates)
+
+**Resource Graph Query/Scripts:**
+
+{{< collapse title="Show/Hide Query/Script" >}}
+
+{{< code lang="sql" file="code/avd-22/avd-22.kql" >}} {{< /code >}}
+
+{{< /collapse >}}
+
+
+
+### AVD-23 - Ensure a unique OU when deploying VMs to Domain
+
+**Category: Governance**
+
+**Impact: Medium**
+
+**Guidance:**
+Hybrid VMs should be placed in a unique organizational unit (OU).
+AD-joined session hosts benefit from a unique OU, which lets you target specific AVD configurations per host pool. Examples include FSLogix settings, timeout limits, and session controls. It’s also important to segment Prod and DR organizational units to ensure resources are configured per environment.
+
+**Resources:**
+
+- [Learn More](https://learn.microsoft.com/en-us/windows-server/identity/ad-ds/deploy/virtual-dc/adds-on-azure-vm#configure-the-vms-and-install-active-directory-domain-services)
+
+**Resource Graph Query/Scripts:**
+
+{{< collapse title="Show/Hide Query/Script" >}}
+
+{{< code lang="sql" file="code/avd-23/avd-23.kql" >}} {{< /code >}}
+
+{{< /collapse >}}
+
+
+
+### AVD-24 - Ensure the standard FSLogix configuration is deployed
+
+**Category: Storage**
+
+**Impact: High**
+
+**Guidance:**
+Ensure all session hosts have the standard FSLogix configuration deployed. Regularly validate settings for consistency and alignment with best practices.
+
+**Resources:**
+
+- [Learn More](https://learn.microsoft.com/en-us/fslogix/reference-configuration-settings?tabs=profiles)
+
+**Resource Graph Query/Scripts:**
+
+{{< collapse title="Show/Hide Query/Script" >}}
+
+{{< code lang="sql" file="code/avd-24/avd-24.kql" >}} {{< /code >}}
+
+{{< /collapse >}}
+
+
+
+### AVD-25 - Ensure user permissions are set correctly on SMB shares
+
+**Category: Storage**
+
+**Impact: High**
+
+**Guidance:**
+Verify user permissions are correctly set on SMB shares so that users have appropriate access to only their own profile and not other user profiles, while administrators have full access at the root volume. Also ensure secondary storage path permissions are set in case of a DR event.
+
+**Resources:**
+
+- [Learn More](https://learn.microsoft.com/en-us/fslogix/how-to-configure-storage-permissions)
+
+**Resource Graph Query/Scripts:**
+
+{{< collapse title="Show/Hide Query/Script" >}}
+
+{{< code lang="sql" file="code/avd-25/avd-25.kql" >}} {{< /code >}}
+
+{{< /collapse >}}
+
+
+
+### AVD-26 - Configure Diagnostic Settings for FSLogix logs and enable review for accounts
+
+**Category: Storage**
+
+**Impact: Medium**
+
+**Guidance:**
+Regularly review FSLogix logs for errors and issues related to login and mounting the profile. Events can be reviewed by looking locally inside the Session Host and also in Log Analytics when the Azure Monitor Agent is used.
+
+**Resources:**
+
+- [Learn More](https://learn.microsoft.com/en-us/fslogix/troubleshooting-events-logs-diagnostics)
+
+**Resource Graph Query/Scripts:**
+
+{{< collapse title="Show/Hide Query/Script" >}}
+
+{{< code lang="sql" file="code/avd-26/avd-26.kql" >}} {{< /code >}}
+
+{{< /collapse >}}
+
+
+
+### AVD-27 - Manually update new FSLogix image when available
+
+**Category: Governance**
+
+**Impact: Low**
+
+**Guidance:**
+Ensure a process is in place to regularly check for FSLogix agent upgrades and keep FSLogix up to date. We recommend customers upgrade to the latest version of FSLogix as quickly as their deployment process allows. FSLogix provides hotfix releases that address current and potential bugs affecting customer deployments. Additionally, running the latest version is the first requirement when opening any support case.
+
+**Resources:**
+
+- [Learn More](https://learn.microsoft.com/en-us/fslogix/how-to-install-fslogix)
+
+**Resource Graph Query/Scripts:**
+
+{{< collapse title="Show/Hide Query/Script" >}}
+
+{{< code lang="sql" file="code/avd-27/avd-27.kql" >}} {{< /code >}}
+
+{{< /collapse >}}
+
+
+
+### AVD-28 - Turn on Continuous Availability for ANF if using App Attach
+
+**Category: Availability**
+
+**Impact: Medium**
+
+**Guidance**
+
+Turn on Continuous Availability if using Azure NetApp Files.
+
+Verify the number of users connecting to each file share to make sure the SMB path can handle the number of file connections. Currently, Azure Files supports up to 10,000 handles per root directory.
+
+**Resources**
+
+- [Learn More](https://learn.microsoft.com/en-us/azure/virtual-desktop/app-attach-overview?pivots=msix-app-attach)
+
+**Resource Graph Query/Scripts:**
+
+{{< collapse title="Show/Hide Query/Script" >}}
+
+{{< code lang="sql" file="code/avd-28/avd-28.kql" >}} {{< /code >}}
+
+{{< /collapse >}}
+
+
+
+### AVD-29 - App attach should be placed in separate file share and Disaster recovery plan should include App attach storage
+
+**Category: Storage**
+
+**Impact: Medium**
+
+**Guidance**
+
+App Attach packages should be stored on a separate share from profiles, and App Attach files should be backed up.
+
+Best practice is to keep App Attach VHD files on a file share separate from user profiles, both for performance and for scalability. Requirements can vary greatly depending on how many packaged applications are stored in an image, so you need to test your applications to understand your requirements.
+
+Your file share should be in the same Azure region as your session hosts.
+
+**Resources**
+
+- [Learn More](https://learn.microsoft.com/en-us/azure/virtual-desktop/app-attach-overview?pivots=msix-app-attach)
+
+**Resource Graph Query/Scripts:**
+
+{{< collapse title="Show/Hide Query/Script" >}}
+
+{{< code lang="sql" file="code/avd-29/avd-29.kql" >}} {{< /code >}}
+
+{{< /collapse >}}
+
+
+
+### AVD-30 - Ensure virtual networks have route tables/route server configured for all regions
+
+**Category: Networking**
+
+**Impact: Medium**
+
+**Guidance**
+
+For high availability, connections back to on-premises datacenters should include backup paths across the regions in use. Ensure redundancy in routing by having a secondary route table in the secondary region.
+
+**Resources**
+
+- [Learn More](https://learn.microsoft.com/en-us/azure/expressroute/designing-for-disaster-recovery-with-expressroute-privatepeering#need-for-redundant-connectivity-solution)
+
+**Resource Graph Query/Scripts:**
+
+{{< collapse title="Show/Hide Query/Script" >}}
+
+{{< code lang="sql" file="code/avd-30/avd-30.kql" >}} {{< /code >}}
+
+{{< /collapse >}}
+
+
+
+### AVD-31 - Ensure virtual networks isolation with separate IP space and NSGs for Prod and DR
+
+**Category: Networking**
+
+**Impact: Medium**
+
+**Guidance**
+
+Use a network security group (NSG) and application security group (ASG) per AVD persona, and a separate IP address space for the Prod and DR regions.
+
+It's important your organization plans for IP addressing in Azure. Planning ensures the IP address space doesn't overlap across on-premises locations and Azure regions. Overlapping IP address spaces across on-premises and Azure regions create major contention challenges.
+
+**Resources**
+
+- [Learn More](https://learn.microsoft.com/en-us/azure/cloud-adoption-framework/ready/azure-best-practices/plan-for-ip-addressing)
+
+**Resource Graph Query/Scripts:**
+
+{{< collapse title="Show/Hide Query/Script" >}}
+
+{{< code lang="sql" file="code/avd-31/avd-31.kql" >}} {{< /code >}}
+
+{{< /collapse >}}
+
+
+
+### AVD-33 - Ensure route tables accommodate failover
+
+**Category: Disaster Recovery**
+
+**Impact: Medium**
+
+**Guidance**
+
+Ensure that route tables that force-tunnel traffic to a firewall (FW) or network virtual appliance (NVA) have been evaluated for failover, so that routing won't fail or trigger next-generation firewall protections.
+
+AVD workload teams should collaborate with centralized teams that manage the shared infrastructure, like networking, to ensure that both Production and DR workloads have the appropriate route tables in place for failover of routing to perform as expected.
+
+**Resources**
+
+- [Learn More](https://learn.microsoft.com/en-us/azure/cloud-adoption-framework/ready/landing-zone/design-area/management-business-continuity-disaster-recovery)
+
+**Resource Graph Query/Scripts:**
+
+{{< collapse title="Show/Hide Query/Script" >}}
+
+{{< code lang="sql" file="code/avd-33/avd-33.kql" >}} {{< /code >}}
+
+{{< /collapse >}}
+
+
+
+### AVD-34 - Provision Secondary Key Vault for Disaster Recovery
+
+**Category: Disaster Recovery**
+
+**Impact: High**
+
+**Guidance:**
+To ensure continuous availability and disaster recovery readiness, it is recommended to provision a secondary Key Vault in a secondary region. In the event of a primary region failure, this secondary Key Vault will ensure that critical secrets are accessible for use in deployments in the secondary region.
+
+**Resources:**
+
+- [Learn More](https://learn.microsoft.com/en-us/azure/key-vault/general/disaster-recovery-guidance)
+
+**Resource Graph Query/Scripts:**
+
+{{< collapse title="Show/Hide Query/Script" >}}
+
+{{< code lang="sql" file="code/avd-34/avd-34.kql" >}} {{< /code >}}
+
+{{< /collapse >}}
+
+### AVD-35 - Configure AVD Insights Workbook
+
+**Category: Monitoring**
+
+**Impact: High**
+
+**Guidance**
+
+AVD Insights is an Azure Workbook template provided by the AVD product team. It is highly recommended for monitoring and troubleshooting AVD workloads across metrics, logs, events, and more. AVD Insights should be enabled for both Production and DR workloads.
+
+**Resources**
+
+- [Learn More](https://learn.microsoft.com/en-us/azure/virtual-desktop/insights?tabs=monitor)
+
+**Resource Graph Query/Scripts:**
+
+{{< collapse title="Show/Hide Query/Script" >}}
+
+{{< code lang="sql" file="code/avd-35/avd-35.kql" >}} {{< /code >}}
+
+{{< /collapse >}}
+
+
+
+### AVD-36 - Ensure separate log analytics workspaces for Prod and DR
+
+**Category: Disaster Recovery**
+
+**Impact: Low**
+
+**Guidance**
+
+Having separate Log Analytics workspaces for Prod and DR ensures that your DR environment retains visibility into the metrics, performance data, and other auditing tools your workload teams will rely on in the event of an incident.
+
+**Resources**
+
+- [Learn More](https://learn.microsoft.com/en-us/azure/virtual-desktop/diagnostics-log-analytics)
+
+**Resource Graph Query/Scripts:**
+
+{{< collapse title="Show/Hide Query/Script" >}}
+
+{{< code lang="sql" file="code/avd-36/avd-36.kql" >}} {{< /code >}}
+
+{{< /collapse >}}
+
+
+
+### AVD-37 - Organize AVD resources using the AVD Scale unit model described by the AVD Landing Zone Methodology
+
+**Category: Governance**
+
+**Impact: Low**
+
+**Guidance**
+
+Follow AVD Landing Zone best practices using multiple resource groups based on resource type and associated shared resources for AVD workloads.
+
+**Resources**
+
+- [Learn More](https://learn.microsoft.com/en-us/azure/cloud-adoption-framework/scenarios/azure-virtual-desktop/enterprise-scale-landing-zone)
+
+**Resource Graph Query/Scripts:**
+
+{{< collapse title="Show/Hide Query/Script" >}}
+
+{{< code lang="sql" file="code/avd-37/avd-37.kql" >}} {{< /code >}}
{{< /collapse >}}
diff --git a/docs/content/services/specialized-workloads/azure-virtual-desktop/code/avd-10/avd-10.kql b/docs/content/services/specialized-workloads/azure-virtual-desktop/code/avd-10/avd-10.kql
new file mode 100644
index 000000000..614a7f9ca
--- /dev/null
+++ b/docs/content/services/specialized-workloads/azure-virtual-desktop/code/avd-10/avd-10.kql
@@ -0,0 +1 @@
+// under-development
diff --git a/docs/content/services/specialized-workloads/azure-virtual-desktop/code/avd-11/avd-11.kql b/docs/content/services/specialized-workloads/azure-virtual-desktop/code/avd-11/avd-11.kql
new file mode 100644
index 000000000..614a7f9ca
--- /dev/null
+++ b/docs/content/services/specialized-workloads/azure-virtual-desktop/code/avd-11/avd-11.kql
@@ -0,0 +1 @@
+// under-development
diff --git a/docs/content/services/specialized-workloads/azure-virtual-desktop/code/avd-12/avd-12.kql b/docs/content/services/specialized-workloads/azure-virtual-desktop/code/avd-12/avd-12.kql
new file mode 100644
index 000000000..614a7f9ca
--- /dev/null
+++ b/docs/content/services/specialized-workloads/azure-virtual-desktop/code/avd-12/avd-12.kql
@@ -0,0 +1 @@
+// under-development
diff --git a/docs/content/services/specialized-workloads/azure-virtual-desktop/code/avd-13/avd-13.kql b/docs/content/services/specialized-workloads/azure-virtual-desktop/code/avd-13/avd-13.kql
new file mode 100644
index 000000000..614a7f9ca
--- /dev/null
+++ b/docs/content/services/specialized-workloads/azure-virtual-desktop/code/avd-13/avd-13.kql
@@ -0,0 +1 @@
+// under-development
diff --git a/docs/content/services/specialized-workloads/azure-virtual-desktop/code/avd-14/avd-14.kql b/docs/content/services/specialized-workloads/azure-virtual-desktop/code/avd-14/avd-14.kql
new file mode 100644
index 000000000..614a7f9ca
--- /dev/null
+++ b/docs/content/services/specialized-workloads/azure-virtual-desktop/code/avd-14/avd-14.kql
@@ -0,0 +1 @@
+// under-development
diff --git a/docs/content/services/container/aks/code/aks-26/aks-26.kql b/docs/content/services/specialized-workloads/azure-virtual-desktop/code/avd-15/avd-15.kql
similarity index 100%
rename from docs/content/services/container/aks/code/aks-26/aks-26.kql
rename to docs/content/services/specialized-workloads/azure-virtual-desktop/code/avd-15/avd-15.kql
diff --git a/docs/content/services/specialized-workloads/azure-virtual-desktop/code/avd-16/avd-16.kql b/docs/content/services/specialized-workloads/azure-virtual-desktop/code/avd-16/avd-16.kql
new file mode 100644
index 000000000..614a7f9ca
--- /dev/null
+++ b/docs/content/services/specialized-workloads/azure-virtual-desktop/code/avd-16/avd-16.kql
@@ -0,0 +1 @@
+// under-development
diff --git a/docs/content/services/specialized-workloads/azure-virtual-desktop/code/avd-17/avd-17.kql b/docs/content/services/specialized-workloads/azure-virtual-desktop/code/avd-17/avd-17.kql
new file mode 100644
index 000000000..614a7f9ca
--- /dev/null
+++ b/docs/content/services/specialized-workloads/azure-virtual-desktop/code/avd-17/avd-17.kql
@@ -0,0 +1 @@
+// under-development
diff --git a/docs/content/services/specialized-workloads/azure-virtual-desktop/code/avd-18/avd-18.kql b/docs/content/services/specialized-workloads/azure-virtual-desktop/code/avd-18/avd-18.kql
new file mode 100644
index 000000000..614a7f9ca
--- /dev/null
+++ b/docs/content/services/specialized-workloads/azure-virtual-desktop/code/avd-18/avd-18.kql
@@ -0,0 +1 @@
+// under-development
diff --git a/docs/content/services/specialized-workloads/azure-virtual-desktop/code/avd-19/avd-19.kql b/docs/content/services/specialized-workloads/azure-virtual-desktop/code/avd-19/avd-19.kql
new file mode 100644
index 000000000..614a7f9ca
--- /dev/null
+++ b/docs/content/services/specialized-workloads/azure-virtual-desktop/code/avd-19/avd-19.kql
@@ -0,0 +1 @@
+// under-development
diff --git a/docs/content/services/specialized-workloads/azure-virtual-desktop/code/avd-2/avd-2.kql b/docs/content/services/specialized-workloads/azure-virtual-desktop/code/avd-2/avd-2.kql
index 614a7f9ca..fa5cad258 100644
--- a/docs/content/services/specialized-workloads/azure-virtual-desktop/code/avd-2/avd-2.kql
+++ b/docs/content/services/specialized-workloads/azure-virtual-desktop/code/avd-2/avd-2.kql
@@ -1 +1 @@
-// under-development
+// cannot-be-validated-with-arg
diff --git a/docs/content/services/specialized-workloads/azure-virtual-desktop/code/avd-20/avd-20.kql b/docs/content/services/specialized-workloads/azure-virtual-desktop/code/avd-20/avd-20.kql
new file mode 100644
index 000000000..614a7f9ca
--- /dev/null
+++ b/docs/content/services/specialized-workloads/azure-virtual-desktop/code/avd-20/avd-20.kql
@@ -0,0 +1 @@
+// under-development
diff --git a/docs/content/services/specialized-workloads/azure-virtual-desktop/code/avd-21/avd-21.kql b/docs/content/services/specialized-workloads/azure-virtual-desktop/code/avd-21/avd-21.kql
new file mode 100644
index 000000000..614a7f9ca
--- /dev/null
+++ b/docs/content/services/specialized-workloads/azure-virtual-desktop/code/avd-21/avd-21.kql
@@ -0,0 +1 @@
+// under-development
diff --git a/docs/content/services/specialized-workloads/azure-virtual-desktop/code/avd-22/avd-22.kql b/docs/content/services/specialized-workloads/azure-virtual-desktop/code/avd-22/avd-22.kql
new file mode 100644
index 000000000..614a7f9ca
--- /dev/null
+++ b/docs/content/services/specialized-workloads/azure-virtual-desktop/code/avd-22/avd-22.kql
@@ -0,0 +1 @@
+// under-development
diff --git a/docs/content/services/specialized-workloads/azure-virtual-desktop/code/avd-23/avd-23.kql b/docs/content/services/specialized-workloads/azure-virtual-desktop/code/avd-23/avd-23.kql
new file mode 100644
index 000000000..614a7f9ca
--- /dev/null
+++ b/docs/content/services/specialized-workloads/azure-virtual-desktop/code/avd-23/avd-23.kql
@@ -0,0 +1 @@
+// under-development
diff --git a/docs/content/services/container/container-registry/code/cr-4/cr-4.kql b/docs/content/services/specialized-workloads/azure-virtual-desktop/code/avd-24/avd-24.kql
similarity index 100%
rename from docs/content/services/container/container-registry/code/cr-4/cr-4.kql
rename to docs/content/services/specialized-workloads/azure-virtual-desktop/code/avd-24/avd-24.kql
diff --git a/docs/content/services/monitoring/log-analytics/code/log-2/log-2.kql b/docs/content/services/specialized-workloads/azure-virtual-desktop/code/avd-25/avd-25.kql
similarity index 100%
rename from docs/content/services/monitoring/log-analytics/code/log-2/log-2.kql
rename to docs/content/services/specialized-workloads/azure-virtual-desktop/code/avd-25/avd-25.kql
diff --git a/docs/content/services/specialized-workloads/azure-virtual-desktop/code/avd-26/avd-26.kql b/docs/content/services/specialized-workloads/azure-virtual-desktop/code/avd-26/avd-26.kql
new file mode 100644
index 000000000..614a7f9ca
--- /dev/null
+++ b/docs/content/services/specialized-workloads/azure-virtual-desktop/code/avd-26/avd-26.kql
@@ -0,0 +1 @@
+// under-development
diff --git a/docs/content/services/specialized-workloads/azure-virtual-desktop/code/avd-27/avd-27.kql b/docs/content/services/specialized-workloads/azure-virtual-desktop/code/avd-27/avd-27.kql
new file mode 100644
index 000000000..614a7f9ca
--- /dev/null
+++ b/docs/content/services/specialized-workloads/azure-virtual-desktop/code/avd-27/avd-27.kql
@@ -0,0 +1 @@
+// under-development
diff --git a/docs/content/services/specialized-workloads/azure-virtual-desktop/code/avd-28/avd-28.kql b/docs/content/services/specialized-workloads/azure-virtual-desktop/code/avd-28/avd-28.kql
new file mode 100644
index 000000000..614a7f9ca
--- /dev/null
+++ b/docs/content/services/specialized-workloads/azure-virtual-desktop/code/avd-28/avd-28.kql
@@ -0,0 +1 @@
+// under-development
diff --git a/docs/content/services/specialized-workloads/azure-virtual-desktop/code/avd-29/avd-29.kql b/docs/content/services/specialized-workloads/azure-virtual-desktop/code/avd-29/avd-29.kql
new file mode 100644
index 000000000..614a7f9ca
--- /dev/null
+++ b/docs/content/services/specialized-workloads/azure-virtual-desktop/code/avd-29/avd-29.kql
@@ -0,0 +1 @@
+// under-development
diff --git a/docs/content/services/specialized-workloads/azure-virtual-desktop/code/avd-30/avd-30.kql b/docs/content/services/specialized-workloads/azure-virtual-desktop/code/avd-30/avd-30.kql
new file mode 100644
index 000000000..614a7f9ca
--- /dev/null
+++ b/docs/content/services/specialized-workloads/azure-virtual-desktop/code/avd-30/avd-30.kql
@@ -0,0 +1 @@
+// under-development
diff --git a/docs/content/services/specialized-workloads/azure-virtual-desktop/code/avd-31/avd-31.kql b/docs/content/services/specialized-workloads/azure-virtual-desktop/code/avd-31/avd-31.kql
new file mode 100644
index 000000000..614a7f9ca
--- /dev/null
+++ b/docs/content/services/specialized-workloads/azure-virtual-desktop/code/avd-31/avd-31.kql
@@ -0,0 +1 @@
+// under-development
diff --git a/docs/content/services/specialized-workloads/azure-virtual-desktop/code/avd-32/avd-32.kql b/docs/content/services/specialized-workloads/azure-virtual-desktop/code/avd-32/avd-32.kql
new file mode 100644
index 000000000..614a7f9ca
--- /dev/null
+++ b/docs/content/services/specialized-workloads/azure-virtual-desktop/code/avd-32/avd-32.kql
@@ -0,0 +1 @@
+// under-development
diff --git a/docs/content/services/specialized-workloads/azure-virtual-desktop/code/avd-33/avd-33.kql b/docs/content/services/specialized-workloads/azure-virtual-desktop/code/avd-33/avd-33.kql
new file mode 100644
index 000000000..614a7f9ca
--- /dev/null
+++ b/docs/content/services/specialized-workloads/azure-virtual-desktop/code/avd-33/avd-33.kql
@@ -0,0 +1 @@
+// under-development
diff --git a/docs/content/services/networking/general-networking/code/gnw-1/gnw-1.kql b/docs/content/services/specialized-workloads/azure-virtual-desktop/code/avd-34/avd-34.kql
similarity index 100%
rename from docs/content/services/networking/general-networking/code/gnw-1/gnw-1.kql
rename to docs/content/services/specialized-workloads/azure-virtual-desktop/code/avd-34/avd-34.kql
diff --git a/docs/content/services/networking/general-networking/code/gnw-2/gnw-2.kql b/docs/content/services/specialized-workloads/azure-virtual-desktop/code/avd-35/avd-35.kql
similarity index 100%
rename from docs/content/services/networking/general-networking/code/gnw-2/gnw-2.kql
rename to docs/content/services/specialized-workloads/azure-virtual-desktop/code/avd-35/avd-35.kql
diff --git a/docs/content/services/networking/general-networking/code/gnw-3/gnw-3.kql b/docs/content/services/specialized-workloads/azure-virtual-desktop/code/avd-36/avd-36.kql
similarity index 100%
rename from docs/content/services/networking/general-networking/code/gnw-3/gnw-3.kql
rename to docs/content/services/specialized-workloads/azure-virtual-desktop/code/avd-36/avd-36.kql
diff --git a/docs/content/services/networking/general-networking/code/gnw-4/gnw-4.kql b/docs/content/services/specialized-workloads/azure-virtual-desktop/code/avd-37/avd-37.kql
similarity index 100%
rename from docs/content/services/networking/general-networking/code/gnw-4/gnw-4.kql
rename to docs/content/services/specialized-workloads/azure-virtual-desktop/code/avd-37/avd-37.kql
diff --git a/docs/content/services/specialized-workloads/azure-virtual-desktop/code/avd-4/avd-4.kql b/docs/content/services/specialized-workloads/azure-virtual-desktop/code/avd-4/avd-4.kql
index 614a7f9ca..fa5cad258 100644
--- a/docs/content/services/specialized-workloads/azure-virtual-desktop/code/avd-4/avd-4.kql
+++ b/docs/content/services/specialized-workloads/azure-virtual-desktop/code/avd-4/avd-4.kql
@@ -1 +1 @@
-// under-development
+// cannot-be-validated-with-arg
diff --git a/docs/content/services/networking/general-networking/code/gnw-5/gnw-5.kql b/docs/content/services/specialized-workloads/azure-virtual-desktop/code/avd-9/avd-9.kql
similarity index 100%
rename from docs/content/services/networking/general-networking/code/gnw-5/gnw-5.kql
rename to docs/content/services/specialized-workloads/azure-virtual-desktop/code/avd-9/avd-9.kql
diff --git a/docs/content/services/specialized-workloads/azure-vmware-solution/_index.md b/docs/content/services/specialized-workloads/azure-vmware-solution/_index.md
new file mode 100644
index 000000000..23b2afcb1
--- /dev/null
+++ b/docs/content/services/specialized-workloads/azure-vmware-solution/_index.md
@@ -0,0 +1,519 @@
++++
+title = "Azure VMware Solution"
+description = "Best practices and resiliency recommendations for Azure VMware Solution and associated resources and settings."
+date = "3/24/2024"
+author = "michielvanschaik"
+msAuthor = "mivansch"
+draft = false
++++
+
+The presented resiliency recommendations in this guidance include Azure VMware Solution and associated resources and settings.
+
+## Summary of Recommendations
+
+{{< table style="table-striped" >}}
+| Recommendation | Category | Impact | State | ARG Query Available |
+| :------------------------------------------------ | :---------------------------------------------------------------------: | :------: | :------: | :------: |
+|[AVS-1 - Configure Azure Service Health notifications and alerts for Azure VMware Solution](#avs-1---configure-azure-service-health-notifications-and-alerts-for-azure-vmware-solution) | Monitoring | High | Verified | Yes |
+|[AVS-2 - Configure Syslog in Diagnostic Settings for Azure VMware Solution](#avs-2---configure-syslog-in-diagnostic-settings-for-azure-vmware-solution) | Monitoring | Medium | Verified | No |
+|[AVS-3 - Configure Azure Monitor Alert warning thresholds for vSAN datastore utilization](#avs-3---configure-azure-monitor-alert-warning-thresholds-for-vsan-datastore-utilization) | Monitoring | High | Verified | No |
+|[AVS-4 - Enable Stretched Clusters for Multi-AZ Availability of the vSAN Datastore](#avs-4---enable-stretched-clusters-for-multi-az-availability-of-the-vsan-datastore) | Availability | Low | Verified | Yes |
+|[AVS-5 - Monitor CPU Utilization to ensure sufficient resources for workloads](#avs-5---monitor-cpu-utilization-to-ensure-sufficient-resources-for-workloads) | Monitoring | Medium | Verified | Yes |
+|[AVS-6 - Monitor Memory Utilization to ensure sufficient resources for workloads](#avs-6---monitor-memory-utilization-to-ensure-sufficient-resources-for-workloads) | Monitoring | Medium | Verified | Yes |
+|[AVS-7 - Monitor when Azure VMware Solution Cluster Size is approaching the host limit](#avs-7---monitor-when-azure-vmware-solution-cluster-size-is-approaching-the-host-limit) | Monitoring | Medium | Verified | No |
+|[AVS-8 - Monitor when Azure VMware Solution Private Cloud is reaching the capacity limit](#avs-8---monitor-when-azure-vmware-solution-private-cloud-is-reaching-the-capacity-limit) | Monitoring | Medium | Verified | No |
+|[AVS-9 - Apply Resource delete lock on the resource group hosting the private cloud](#avs-9---apply-resource-delete-lock-on-the-resource-group-hosting-the-private-cloud) | Governance | High | Verified | No |
+|[AVS-10 - Align ExpressRoute configuration with best practices for circuit resilience](#avs-10---align-expressroute-configuration-with-best-practices-for-circuit-resilience) | Networking | High | Preview | No |
+|[AVS-11 - Deploy two or more circuits in different peering locations when using stretched clusters](#avs-11---deploy-two-or-more-circuits-in-different-peering-locations-when-using-stretched-clusters) | Networking | High | Verified | No |
+|[AVS-12 - Deploy two Azure VMware Solution private clouds in different regions for geographical disaster recovery](#avs-12---deploy-two-azure-vmware-solution-private-clouds-in-different-regions-for-geographical-disaster-recovery) | Disaster Recovery | High | Verified | No |
+|[AVS-13 - Use the AVS Interconnect feature to connect private clouds in different availability zones](#avs-13---use-the-avs-interconnect-feature-to-connect-private-clouds-in-different-availability-zones) | Availability | High | Verified | No |
+|[AVS-14 - Use key autorotation for vSAN datastore customer-managed keys](#avs-14---use-key-autorotation-for-vsan-datastore-customer-managed-keys) | Storage | High | Preview | No |
+|[AVS-15 - Configure LDAPS Identity integration with two sources for NSX and vCenter Server management consoles](#avs-15---configure-ldaps-identity-integration-with-two-sources-for-nsx-and-vcenter-server-management-consoles) | Access and Security | High | Verified | No |
+|[AVS-16 - Use HCX Network Extension High Availability](#avs-16---use-hcx-network-extension-high-availability) | Availability | High | Verified | No |
+|[AVS-17 - Verify Management Networks are not extended with HCX Network Extension](#avs-17---verify-management-networks-are-not-extended-with-hcx-network-extension) | Networking | High | Verified | No |
+|[AVS-18 - Use multiple DNS servers per private FQDN zone](#avs-18---use-multiple-dns-servers-per-private-fqdn-zone) | Networking | High | Preview | No |
+|[AVS-19 - Verify vSAN FTT configuration aligns with the cluster size](#avs-19---verify-vsan-ftt-configuration-aligns-with-the-cluster-size) | Application Resilience | Medium | Verified | No |
+| [ERCON-1 - For Connections using ExpressRoute Direct circuits and UltraPerformance or ErGw3AZ ExpressRoute Gateways, enable FastPath to improve data path performance between your private cloud network and your virtual network](https://azure.github.io/Azure-Proactive-Resiliency-Library/services/networking/expressroute-connection/#ercon-1---for-connections-using-expressroute-direct-circuits-and-ultraperformance-or-ergw3az-expressroute-gateways-enable-fastpath-to-improve-data-path-performance-between-your-on-premises-network-and-your-virtual-network) | Networking | High | Verified | No |
+| [ERCON-2 - Configure an Azure Resource Lock on connections to prevent accidental deletion](https://azure.github.io/Azure-Proactive-Resiliency-Library/services/networking/expressroute-connection/#ercon-2---configure-an-azure-resource-lock-on-connections-to-prevent-accidental-deletion) | Availability | High | Verified | No |
+| [ERGW-3 - Configure an Azure Resource lock for ExpressRoute Gateway to prevent accidental deletion](https://azure.github.io/Azure-Proactive-Resiliency-Library/services/networking/expressroute-gateway/#ergw-3---configure-an-azure-resource-lock-for-expressroute-gateway-to-prevent-accidental-deletion) | Availability | Medium | Verified | No |
+| [ERGW-4 - Monitor gateway health](https://azure.github.io/Azure-Proactive-Resiliency-Library/services/networking/expressroute-gateway/#ergw-4---monitor-gateway-health) | Monitoring | High | Verified | No |
+| [ERGW-7 - Configure customer-controlled gateway maintenance - In Preview](https://azure.github.io/Azure-Proactive-Resiliency-Library/services/networking/expressroute-gateway/#ergw-7---configure-customer-controlled-gateway-maintenance---in-preview) | Networking | High | Verified | No |
+
+{{< /table >}}
+
+{{< alert style="info" >}}
+
+Definitions of states can be found [here]({{< ref "../../../_index.md#definitions-of-terms-used-in-aprl">}})
+
+{{< /alert >}}
+
+## Recommendations Details
+
+### AVS-1 - Configure Azure Service Health notifications and alerts for Azure VMware Solution
+
+**Category: Monitoring**
+
+**Impact: High**
+
+**Recommendation/Guidance**
+
+Ensure Azure Service Health notifications and alerts are configured for the Azure VMware Solution service in the subscriptions and regions where Azure VMware Solution is deployed.
+
+Azure Service Health is the mechanism used to inform customers of any service or security issues affecting their private cloud deployment. Additionally, Azure Service Health is used to inform customers of maintenance activities in their Azure VMware Solution environments including host replacements, upgrades, and any service updates which could potentially impact customer operations. Proper configuration of Azure Service Health notifications and alerts ensures that customers receive relevant notifications and can reduce service request submissions due to Azure VMware Solution maintenance.
+
+**Resources**
+
+- [Learn More](https://learn.microsoft.com/en-us/azure/cloud-adoption-framework/scenarios/azure-vmware/eslz-management-and-monitoring#design-recommendations)
+
+**Resource Graph Query**
+
+{{< collapse title="Show/Hide Query/Script" >}}
+
+{{< code lang="sql" file="code/avs-1/avs-1.kql" >}} {{< /code >}}
+
+{{< /collapse >}}
+
+
+
+### AVS-2 - Configure Syslog in Diagnostic Settings for Azure VMware Solution
+
+**Category: Monitoring**
+
+**Impact: Medium**
+
+**Recommendation/Guidance**
+
+Ensure Diagnostic Settings are configured for each private cloud to send the syslogs to one or more external sources for analysis and/or archiving.
+
+Azure VMware Solution Syslogs have useful data for troubleshooting and performance that can help with quicker issue resolution and can also enable early detection of some kinds of issues. Configure Diagnostic Settings on the private cloud to send the Syslogs to one or more external sources for querying and/or archiving in case of an audit.
+
+**Resources**
+
+- [Learn More](https://learn.microsoft.com/en-us/azure/well-architected/azure-vmware/monitoring#manage-logs-and-archives)
+
+**Resource Graph Query**
+
+{{< collapse title="Show/Hide Query/Script" >}}
+
+{{< code lang="sql" file="code/avs-2/avs-2.kql" >}} {{< /code >}}
+
+{{< /collapse >}}
+
+
+
+### AVS-3 - Configure Azure Monitor Alert warning thresholds for vSAN datastore utilization
+
+**Category: Monitoring**
+
+**Impact: High**
+
+**Recommendation/Guidance**
+
+Ensure storage utilization is monitored and alerts are configured so that VMware vSAN datastore slack space is maintained at the level the service-level agreement (SLA) mandates.
+
+For service-level agreement (SLA) purposes, Azure VMware Solution requires 25% slack space available on vSAN. vSAN storage utilization should be regularly monitored, and alerts should be configured at 70% utilization (30% slack space available on vSAN) and 75% utilization (25% slack space available on vSAN) to provide enough time for capacity planning.
+
+To expand the vSAN datastore, additional hosts can be added, up to the maximum supported cluster size (16 hosts). Note, you may need to request host quota. In addition, external storage can be added (e.g. Azure Elastic SAN, Azure NetApp Files, Pure Cloud Block Storage) if the CPU and RAM requirements are being met by the Azure VMware Solution cluster.
+
+**Resources**
+
+- [Learn More](https://learn.microsoft.com/en-us/azure/azure-vmware/configure-alerts-for-azure-vmware-solution#supported-metrics-and-activities)
+
+**Resource Graph Query**
+
+{{< collapse title="Show/Hide Query/Script" >}}
+
+{{< code lang="sql" file="code/avs-3/avs-3.kql" >}} {{< /code >}}
+
+{{< /collapse >}}
+
+
+
+### AVS-4 - Enable Stretched Clusters for Multi-AZ Availability of the vSAN Datastore
+
+**Category: Availability**
+
+**Impact: Low**
+
+**Recommendation/Guidance**
+
+Consider Azure VMware Solution Stretched Clusters if you require a Multi-AZ deployment of Azure VMware Solution, a financially backed SLA of 99.99%, or synchronous storage replication between AZs (RPO=0). If you are in a region that supports stretched clusters, consider enabling this feature to spread the VMware vSAN datastore across two availability zones. Note: An Azure VMware Solution private cloud can only be configured as a stretched cluster during initial implementation, and doing so requires twice the quota because the cluster is extended to the second availability zone.
+
+**Resources**
+
+- [Learn More](https://learn.microsoft.com/en-us/azure/well-architected/azure-vmware/infrastructure#implement-high-availability)
+- [Stretched Clusters](https://learn.microsoft.com/en-us/azure/azure-vmware/deploy-vsan-stretched-clusters)
+
+**Resource Graph Query**
+
+{{< collapse title="Show/Hide Query/Script" >}}
+
+{{< code lang="sql" file="code/avs-4/avs-4.kql" >}} {{< /code >}}
+
+{{< /collapse >}}
+
+
+
+### AVS-5 - Monitor CPU Utilization to ensure sufficient resources for workloads
+
+**Category: Monitoring**
+
+**Impact: Medium**
+
+**Recommendation/Guidance**
+
+Ensure there are enough compute resources to avoid host resource exhaustion. Azure VMware Solution uses vSphere DRS and vSphere HA to manage workload resources dynamically. However, sustained host CPU utilization of over 95% can contribute to high CPU Ready times, which will impact running workloads.
+
+**Resources**
+
+- [Learn More](https://learn.microsoft.com/en-us/azure/well-architected/azure-vmware/monitoring#configure-and-streamline-alerts)
+
+**Resource Graph Query**
+
+{{< collapse title="Show/Hide Query/Script" >}}
+
+{{< code lang="sql" file="code/avs-5/avs-5.kql" >}} {{< /code >}}
+
+{{< /collapse >}}
+
+
+
+### AVS-6 - Monitor Memory Utilization to ensure sufficient resources for workloads
+
+**Category: Monitoring**
+
+**Impact: Medium**
+
+**Recommendation/Guidance**
+
+Ensure there are enough memory resources to avoid host resource exhaustion. Azure VMware Solution uses vSphere DRS and vSphere HA to manage workload resources dynamically. However, sustained host memory utilization of over 95% can contribute to host memory swapping to disk, which will impact running workloads.
+
+**Resources**
+
+- [Learn More](https://learn.microsoft.com/en-us/azure/well-architected/azure-vmware/monitoring#configure-and-streamline-alerts)
+
+**Resource Graph Query**
+
+{{< collapse title="Show/Hide Query/Script" >}}
+
+{{< code lang="sql" file="code/avs-6/avs-6.kql" >}} {{< /code >}}
+
+{{< /collapse >}}
+
+
+
+### AVS-7 - Monitor when Azure VMware Solution Cluster Size is approaching the host limit
+
+**Category: Monitoring**
+
+**Impact: Medium**
+
+**Recommendation/Guidance**
+
+Alert when the cluster size of 14 hosts is reached. Additionally, periodic alerts should be set up to indicate when growth, especially driven by storage requirements, necessitates planning for a new cluster or the addition of extra datastores. Furthermore, beyond the threshold of 14 hosts, alerts should be triggered each time a new host is added to the cluster, allowing proactive monitoring and management of resource utilization.
+
+**Resources**
+
+- [Learn More](https://learn.microsoft.com/en-us/azure/well-architected/azure-vmware/monitoring#configure-and-streamline-alerts)
+
+**Resource Graph Query**
+
+{{< collapse title="Show/Hide Query/Script" >}}
+
+{{< code lang="sql" file="code/avs-7/avs-7.kql" >}} {{< /code >}}
+
+{{< /collapse >}}
+
+
+
+### AVS-8 - Monitor when Azure VMware Solution Private Cloud is reaching the capacity limit
+
+**Category: Monitoring**
+
+**Impact: Medium**
+
+**Recommendation/Guidance**
+
+Alert when the total node count is greater than or equal to 90 hosts so that it's clear when to start planning for a new private cloud.
+
+**Resources**
+
+- [Configure and streamline alerts](https://learn.microsoft.com/en-us/azure/well-architected/azure-vmware/monitoring#configure-and-streamline-alerts)
+
+**Resource Graph Query**
+
+{{< collapse title="Show/Hide Query/Script" >}}
+
+{{< code lang="sql" file="code/avs-8/avs-8.kql" >}} {{< /code >}}
+
+{{< /collapse >}}
+
+
+
+### AVS-9 - Apply Resource delete lock on the resource group hosting the private cloud
+
+**Category: Governance**
+
+**Impact: High**
+
+**Recommendation/Guidance**
+
+Anyone with contributor access to the resource group hosting the Azure VMware Solution Private Cloud can delete it. Apply a resource delete lock to the Azure VMware Solution Private Cloud resource group to prevent deletion of the private cloud.
+
+**Resources**
+
+- [Lock your resources to protect your infrastructure](https://learn.microsoft.com/en-us/azure/azure-resource-manager/management/lock-resources)
+
+**Resource Graph Query**
+
+{{< collapse title="Show/Hide Query/Script" >}}
+
+{{< code lang="sql" file="code/avs-9/avs-9.kql" >}} {{< /code >}}
+
+{{< /collapse >}}
+
+
+
+### AVS-10 - Align ExpressRoute configuration with best practices for circuit resilience
+
+**Category: Networking**
+
+**Impact: High**
+
+**Recommendation/Guidance**
+
+For critical workloads, Microsoft recommends deploying two (or more) ExpressRoute circuits in different ExpressRoute peering locations. Use Global Reach to connect multiple ExpressRoute circuits and your Azure VMware Solution private clouds. Review the APRL recommendations for ExpressRoute circuits in the Resources section below.
+
+**Resources**
+
+- [APRL guidance for ExpressRoute circuits](https://azure.github.io/Azure-Proactive-Resiliency-Library/services/networking/expressroute-circuits)
+- [Create a new ExpressRoute circuit](https://learn.microsoft.com/azure/expressroute/expressroute-howto-circuit-portal-resource-manager?pivots=expressroute-preview#create-a-new-expressroute-circuit-preview)
+
+**Resource Graph Query/Scripts**
+
+{{< collapse title="Show/Hide Query/Script" >}}
+
+{{< code lang="sql" file="code/avs-10/avs-10.kql" >}} {{< /code >}}
+
+{{< /collapse >}}
+
+
+### AVS-11 - Deploy two or more circuits in different peering locations when using stretched clusters
+
+**Category: Networking**
+
+**Impact: High**
+
+**Recommendation/Guidance**
+
+Azure VMware Solution vSAN stretched clusters span two Availability Zones (AZs) in the region where they are deployed (plus a third AZ for the witness node). When using ExpressRoute to connect to the vSAN stretched clusters from on-premises, align the ExpressRoute implementation's resilience to the clusters’ resilience by deploying two circuits in different peering locations (i.e., different sites/DC facilities). When using Global Reach, implement a mesh topology by connecting the on-premises circuits to the managed circuits provided by the Azure VMware Solution private cloud.
+
+**Resources**
+
+- [Deploy vSAN stretched cluster](https://learn.microsoft.com/en-us/azure/azure-vmware/deploy-vsan-stretched-clusters#deploy-a-stretched-cluster-private-cloud)
+
+**Resource Graph Query/Scripts**
+
+{{< collapse title="Show/Hide Query/Script" >}}
+
+{{< code lang="sql" file="code/avs-11/avs-11.kql" >}} {{< /code >}}
+
+{{< /collapse >}}
+
+
+### AVS-12 - Deploy two Azure VMware Solution private clouds in different regions for geographical disaster recovery
+
+**Category: Disaster Recovery**
+
+**Impact: High**
+
+**Recommendation/Guidance**
+
+Two Azure VMware Solution private clouds can be deployed in different regions for business continuity. Implement a mesh network topology based on ExpressRoute Gateway Connections and Global Reach Connections.
+
+**Resources**
+
+- [Private Clouds in two regions](https://learn.microsoft.com/en-us/azure/azure-vmware/move-azure-vmware-solution-across-regions)
+- [Dual Region Network Topology](https://learn.microsoft.com/en-us/azure/cloud-adoption-framework/scenarios/azure-vmware/eslz-dual-region-network-topology)
+
+**Resource Graph Query/Scripts**
+
+{{< collapse title="Show/Hide Query/Script" >}}
+
+{{< code lang="sql" file="code/avs-12/avs-12.kql" >}} {{< /code >}}
+
+{{< /collapse >}}
+
+
+### AVS-13 - Use the AVS Interconnect feature to connect private clouds in different availability zones
+
+**Category: Availability**
+
+**Impact: High**
+
+**Recommendation/Guidance**
+
+Use the AVS Interconnect feature for direct communication between private clouds in different availability zones, enabling connectivity between the private clouds' management and workload networks. The IP address space of each private cloud should be unique to avoid overlap, as AVS Interconnect does not check for this.
+
+**Resources**
+
+- [Connect Private Clouds in the same region](https://learn.microsoft.com/en-us/azure/azure-vmware/connect-multiple-private-clouds-same-region)
+
+**Resource Graph Query/Scripts**
+
+{{< collapse title="Show/Hide Query/Script" >}}
+
+{{< code lang="sql" file="code/avs-13/avs-13.kql" >}} {{< /code >}}
+
+{{< /collapse >}}
+
+
+
+### AVS-14 - Use key autorotation for vSAN datastore customer-managed keys
+
+**Category: Storage**
+
+**Impact: High**
+
+**Recommendation/Guidance**
+
+When using customer-managed keys to encrypt the vSAN datastore(s), use Azure Key Vault for centralized management and access them using a managed identity mapped to the private cloud. Key expiration can result in the vSAN datastore and its workloads becoming unavailable. Configure key autorotation to avoid unplanned outages due to key rotation not occurring before expiration.
+
+**Resources**
+
+- [Configure Customer Managed Keys](https://learn.microsoft.com/en-us/azure/azure-vmware/configure-customer-managed-keys?tabs=azure-portal)
+
+**Resource Graph Query/Scripts**
+
+{{< collapse title="Show/Hide Query/Script" >}}
+
+{{< code lang="sql" file="code/avs-14/avs-14.kql" >}} {{< /code >}}
+
+{{< /collapse >}}
+
+
+
+### AVS-15 - Configure LDAPS Identity integration with two sources for NSX and vCenter Server management consoles
+
+**Category: Access and Security**
+
+**Impact: High**
+
+**Recommendation/Guidance**
+
+Ensure that two external identity sources are configured for NSX and vCenter Server. The VMware vCenter Server and NSX Manager use identity sources to enable authentication using external identities. These sources can be temporarily unavailable during maintenance times. Having two sources ensures that administrators can continue to log in to the control surfaces when one source becomes unavailable.
+
+**Resources**
+
+- [Set an external identity source for vCenter](https://learn.microsoft.com/en-us/azure/azure-vmware/configure-identity-source-vcenter)
+- [Set an external identity for NSX-T](https://learn.microsoft.com/en-us/azure/azure-vmware/configure-external-identity-source-nsx-t)
+
+**Resource Graph Query/Scripts**
+
+{{< collapse title="Show/Hide Query/Script" >}}
+
+{{< code lang="sql" file="code/avs-15/avs-15.kql" >}} {{< /code >}}
+
+{{< /collapse >}}
+
+
+
+### AVS-16 - Use HCX Network Extension High Availability
+
+**Category: Availability**
+
+**Impact: High**
+
+**Recommendation/Guidance**
+
+Enable Network Extension High Availability to provide appliance failure tolerance for the HCX Network Extension service. When Network Extension High Availability is enabled for a selected appliance, HCX pairs it with an eligible appliance and enables an Active/Standby resiliency configuration. This allows the service to remain available during an unplanned appliance-level failure. When either of the HA Active appliances fails, both standby appliances take over. Network Extension High Availability is designed to recover within a few seconds after a single appliance has failed.
+
+**Resources**
+
+- [HCX Network extension high availability](https://learn.microsoft.com/en-us/azure/azure-vmware/configure-hcx-network-extension-high-availability)
+- [Understanding Network Extension High Availability](https://docs.vmware.com/en/VMware-HCX/4.8/hcx-user-guide/GUID-E1353511-697A-44B0-82A0-852DB55F97D7.html)
+
+**Resource Graph Query/Scripts**
+
+{{< collapse title="Show/Hide Query/Script" >}}
+
+{{< code lang="sql" file="code/avs-16/avs-16.kql" >}} {{< /code >}}
+
+{{< /collapse >}}
+
+
+
+### AVS-17 - Verify Management Networks are not extended with HCX Network Extension
+
+**Category: Networking**
+
+**Impact: High**
+
+**Recommendation/Guidance**
+
+Do not extend the network on which the HCX Management devices are deployed.
+
+**Resources**
+
+- [Requirements for Network Extension](https://docs.vmware.com/en/VMware-HCX/4.8/hcx-user-guide/GUID-0C746416-850E-46F7-85DD-4D4326A23785.html)
+
+**Resource Graph Query/Scripts**
+
+{{< collapse title="Show/Hide Query/Script" >}}
+
+{{< code lang="sql" file="code/avs-17/avs-17.kql" >}} {{< /code >}}
+
+{{< /collapse >}}
+
+
+
+### AVS-18 - Use multiple DNS servers per private FQDN zone
+
+**Category: Networking**
+
+**Impact: High**
+
+**Recommendation/Guidance**
+
+Azure VMware Solution private clouds support up to three DNS servers for a single FQDN. Using a single DNS server for DNS resolution creates a single point of failure. Ensure that multiple DNS servers are used for any on-premises FQDN resolution from each Azure VMware Solution private cloud.
+
+**Resources**
+
+- [Configure DNS forwarder](https://learn.microsoft.com/en-us/azure/azure-vmware/configure-dns-azure-vmware-solution#configure-dns-forwarder)
+
+**Resource Graph Query/Scripts**
+
+{{< collapse title="Show/Hide Query/Script" >}}
+
+{{< code lang="sql" file="code/avs-18/avs-18.kql" >}} {{< /code >}}
+
+{{< /collapse >}}
+
+
+
+### AVS-19 - Verify vSAN FTT configuration aligns with the cluster size
+
+**Category: Application Resilience**
+
+**Impact: High**
+
+**Recommendation/Guidance**
+
+The Azure VMware Solution service SLA also depends on the vSAN storage policies configured, which vary with the cluster size. In clusters with more than 6 hosts, the vSAN storage policy should be configured with an FTT-2 policy (RAID-1 or RAID-6). FTT stands for **failures to tolerate**, which in this case refers to how many hosts in a cluster can fail before there is potential data or VM impact.
+
+The default storage policy is set to RAID-1 FTT-1, with Object Space Reservation set to Thin provisioning. Unless you adjust the storage policy or apply a new policy, the cluster grows with this configuration. Please note that the storage policy is not automatically updated based on cluster size. Similarly, changing the default does not automatically update the running VM policies.
+
+**Resources**
+
+- [Use fault domains](https://learn.microsoft.com/en-us/azure/well-architected/azure-vmware/application-platform#use-fault-domains)
+- [Configure storage policy](https://learn.microsoft.com/en-us/azure/azure-vmware/configure-storage-policy)
+
+**Resource Graph Query/Scripts**
+
+{{< collapse title="Show/Hide Query/Script" >}}
+
+{{< code lang="sql" file="code/avs-19/avs-19.kql" >}} {{< /code >}}
+
+{{< /collapse >}}
+
+
diff --git a/docs/content/services/specialized-workloads/azure-vmware-solution/code/avs-1/avs-1.kql b/docs/content/services/specialized-workloads/azure-vmware-solution/code/avs-1/avs-1.kql
new file mode 100644
index 000000000..145a54b03
--- /dev/null
+++ b/docs/content/services/specialized-workloads/azure-vmware-solution/code/avs-1/avs-1.kql
@@ -0,0 +1,39 @@
+// Azure Resource Graph Query
+// Provides a list of Azure VMware Solution resources that don't have one or more service health alerts covering AVS private clouds in the deployed subscription and region pairs.
+//full list of private clouds
+(resources
+| where ['type'] == "microsoft.avs/privateclouds"
+| extend locale = tolower(location)
+| extend subscriptionId = tolower(subscriptionId)
+| project id, name, tags, subscriptionId, locale)
+| join kind=leftouter
+//Alert IDs that include all incident types filtered by AVS Service Health alerts
+((resources
+| where type == "microsoft.insights/activitylogalerts"
+| extend alertproperties = todynamic(properties)
+| where alertproperties.condition.allOf[0].field == "category" and alertproperties.condition.allOf[0].equals == "ServiceHealth"
+| where alertproperties.condition.allOf[1].field == "properties.impactedServices[*].ServiceName" and set_has_element(alertproperties.condition.allOf[1].containsAny, "Azure VMware Solution")
+| extend locale = strcat_array(split(tolower(alertproperties.condition.allOf[2].containsAny),' '), '')
+| mv-expand todynamic(locale)
+| where locale != "global"
+| project subscriptionId, tostring(locale) )
+| union
+//Alert IDs that include only some of the incident types after filtering by service health alerts covering AVS private clouds.
+(resources
+| where type == "microsoft.insights/activitylogalerts"
+| extend subscriptionId = tolower(subscriptionId)
+| extend alertproperties = todynamic(properties)
+| where alertproperties.condition.allOf[0].field == "category" and alertproperties.condition.allOf[0].equals == "ServiceHealth"
+| where alertproperties.condition.allOf[2].field == "properties.impactedServices[*].ServiceName" and set_has_element(alertproperties.condition.allOf[2].containsAny, "Azure VMware Solution")
+| extend locale = strcat_array(split(tolower(alertproperties.condition.allOf[3].containsAny),' '), '')
+| mv-expand todynamic(locale)
+| mv-expand alertproperties.condition.allOf[1].anyOf
+| extend incidentType = alertproperties_condition_allOf_1_anyOf.equals
+| where locale != "global"
+| project id, subscriptionId, locale, incidentType
+| distinct subscriptionId, tostring(locale), tostring(incidentType)
+| summarize incidentTypes=count() by subscriptionId, locale
+| where incidentTypes == 5 //only include this subscription, region pair if it includes all the incident types.
+| project subscriptionId, locale)) on subscriptionId, locale
+| where subscriptionId1 == "" or locale1 == "" or isnull(subscriptionId1) or isnull(locale1)
+| project recommendationId = "avs-1", name, id, tags, param1 = "avsServiceHealthAlertsAllIncidentTypesConfigured: False"
diff --git a/docs/content/services/specialized-workloads/azure-vmware-solution/code/avs-10/avs-10.kql b/docs/content/services/specialized-workloads/azure-vmware-solution/code/avs-10/avs-10.kql
new file mode 100644
index 000000000..b5bc4080a
--- /dev/null
+++ b/docs/content/services/specialized-workloads/azure-vmware-solution/code/avs-10/avs-10.kql
@@ -0,0 +1 @@
+// cannot be validated with ARG
diff --git a/docs/content/services/specialized-workloads/azure-vmware-solution/code/avs-11/avs-11.kql b/docs/content/services/specialized-workloads/azure-vmware-solution/code/avs-11/avs-11.kql
new file mode 100644
index 000000000..b5bc4080a
--- /dev/null
+++ b/docs/content/services/specialized-workloads/azure-vmware-solution/code/avs-11/avs-11.kql
@@ -0,0 +1 @@
+// cannot be validated with ARG
diff --git a/docs/content/services/specialized-workloads/azure-vmware-solution/code/avs-12/avs-12.kql b/docs/content/services/specialized-workloads/azure-vmware-solution/code/avs-12/avs-12.kql
new file mode 100644
index 000000000..b5bc4080a
--- /dev/null
+++ b/docs/content/services/specialized-workloads/azure-vmware-solution/code/avs-12/avs-12.kql
@@ -0,0 +1 @@
+// cannot be validated with ARG
diff --git a/docs/content/services/specialized-workloads/azure-vmware-solution/code/avs-13/avs-13.kql b/docs/content/services/specialized-workloads/azure-vmware-solution/code/avs-13/avs-13.kql
new file mode 100644
index 000000000..b5bc4080a
--- /dev/null
+++ b/docs/content/services/specialized-workloads/azure-vmware-solution/code/avs-13/avs-13.kql
@@ -0,0 +1 @@
+// cannot be validated with ARG
diff --git a/docs/content/services/specialized-workloads/azure-vmware-solution/code/avs-14/avs-14.kql b/docs/content/services/specialized-workloads/azure-vmware-solution/code/avs-14/avs-14.kql
new file mode 100644
index 000000000..b5bc4080a
--- /dev/null
+++ b/docs/content/services/specialized-workloads/azure-vmware-solution/code/avs-14/avs-14.kql
@@ -0,0 +1 @@
+// cannot be validated with ARG
diff --git a/docs/content/services/specialized-workloads/azure-vmware-solution/code/avs-15/avs-15.kql b/docs/content/services/specialized-workloads/azure-vmware-solution/code/avs-15/avs-15.kql
new file mode 100644
index 000000000..b5bc4080a
--- /dev/null
+++ b/docs/content/services/specialized-workloads/azure-vmware-solution/code/avs-15/avs-15.kql
@@ -0,0 +1 @@
+// cannot be validated with ARG
diff --git a/docs/content/services/specialized-workloads/azure-vmware-solution/code/avs-16/avs-16.kql b/docs/content/services/specialized-workloads/azure-vmware-solution/code/avs-16/avs-16.kql
new file mode 100644
index 000000000..b5bc4080a
--- /dev/null
+++ b/docs/content/services/specialized-workloads/azure-vmware-solution/code/avs-16/avs-16.kql
@@ -0,0 +1 @@
+// cannot be validated with ARG
diff --git a/docs/content/services/specialized-workloads/azure-vmware-solution/code/avs-17/avs-17.kql b/docs/content/services/specialized-workloads/azure-vmware-solution/code/avs-17/avs-17.kql
new file mode 100644
index 000000000..b5bc4080a
--- /dev/null
+++ b/docs/content/services/specialized-workloads/azure-vmware-solution/code/avs-17/avs-17.kql
@@ -0,0 +1 @@
+// cannot be validated with ARG
diff --git a/docs/content/services/specialized-workloads/azure-vmware-solution/code/avs-18/avs-18.kql b/docs/content/services/specialized-workloads/azure-vmware-solution/code/avs-18/avs-18.kql
new file mode 100644
index 000000000..b5bc4080a
--- /dev/null
+++ b/docs/content/services/specialized-workloads/azure-vmware-solution/code/avs-18/avs-18.kql
@@ -0,0 +1 @@
+// cannot be validated with ARG
diff --git a/docs/content/services/specialized-workloads/azure-vmware-solution/code/avs-19/avs-19.kql b/docs/content/services/specialized-workloads/azure-vmware-solution/code/avs-19/avs-19.kql
new file mode 100644
index 000000000..b5bc4080a
--- /dev/null
+++ b/docs/content/services/specialized-workloads/azure-vmware-solution/code/avs-19/avs-19.kql
@@ -0,0 +1 @@
+// cannot be validated with ARG
diff --git a/docs/content/services/specialized-workloads/azure-vmware-solution/code/avs-2/avs-2.kql b/docs/content/services/specialized-workloads/azure-vmware-solution/code/avs-2/avs-2.kql
new file mode 100644
index 000000000..b5bc4080a
--- /dev/null
+++ b/docs/content/services/specialized-workloads/azure-vmware-solution/code/avs-2/avs-2.kql
@@ -0,0 +1 @@
+// cannot be validated with ARG
diff --git a/docs/content/services/specialized-workloads/azure-vmware-solution/code/avs-2/avs-2.ps1 b/docs/content/services/specialized-workloads/azure-vmware-solution/code/avs-2/avs-2.ps1
new file mode 100644
index 000000000..ff7990a45
--- /dev/null
+++ b/docs/content/services/specialized-workloads/azure-vmware-solution/code/avs-2/avs-2.ps1
@@ -0,0 +1,36 @@
+# Azure PowerShell script
+# Provides a list of private clouds that don't have a diagnostic setting configured to export the logs
+$output = @()
+$privateClouds = Get-AzVMwarePrivateCloud
+foreach ($privateCloud in $privateClouds) {
+ $ds = Get-AzDiagnosticSetting -ResourceId $privateCloud.id
+
+ if (!$ds) {
+ $output += [PSCustomObject] @{ #no diagnostic settings values
+ recommendationId = 'avs-2'
+ name = $privateCloud.Name
+ id = $privateCloud.Id
+ tags = if ($privateCloud.tag) { $privateCloud.tag } else { $null }
+ param1 = 'exportSyslogWithDiagnosticSetting:false'
+ }
+ }
+
+ if ($ds) {
+ $fixFlag = $false
+ foreach ($log in $ds.log) { #diagnostic settings exist but not for logs
+ if (($log.CategoryGroup -eq "allLogs") -and ($log.Enabled -eq $false)){ $fixFlag = $true}
+ if (($log.Category -eq "vmwaresyslog") -and ($log.Enabled -eq $false)){ $fixFlag = $true}
+ }
+ if ($fixFlag){
+ $output += [PSCustomObject] @{
+ recommendationId = 'avs-2'
+ name = $privateCloud.Name
+ id = $privateCloud.Id
+ tags = if ($privateCloud.tag) { $privateCloud.tag } else { $null }
+ param1 = 'exportSyslogWithDiagnosticSetting:false'
+ }
+ }
+ }
+}
+
+$output
diff --git a/docs/content/services/specialized-workloads/azure-vmware-solution/code/avs-3/avs-3.kql b/docs/content/services/specialized-workloads/azure-vmware-solution/code/avs-3/avs-3.kql
new file mode 100644
index 000000000..ec5f31cef
--- /dev/null
+++ b/docs/content/services/specialized-workloads/azure-vmware-solution/code/avs-3/avs-3.kql
@@ -0,0 +1,44 @@
+// Azure Resource Graph Query
+// Provides a list of Azure VMware Solution resources that don't have a vSAN capacity critical alert with a threshold of 75% or a warning capacity of 70%.
+(
+resources
+| where ['type'] == "microsoft.avs/privateclouds"
+| extend scopeId = tolower(tostring(id))
+| project ['scopeId'], name, id, tags
+| join kind=leftouter (
+resources
+| where type == "microsoft.insights/metricalerts"
+| extend alertProperties = todynamic(properties)
+| mv-expand alertProperties.scopes
+| mv-expand alertProperties.criteria.allOf
+| extend scopeId = tolower(tostring(alertProperties_scopes))
+| extend metric = alertProperties_criteria_allOf.metricName
+| extend threshold = alertProperties_criteria_allOf.threshold
+| project scopeId, tostring(metric), toint(['threshold'])
+| where metric == "DiskUsedPercentage"
+| where threshold == 75
+) on scopeId
+| where isnull(['threshold'])
+| project recommendationId = "avs-3", name, id, tags, param1 = "vsanCapacityCriticalAlert: isNull or threshold != 75"
+)
+| union (
+resources
+| where ['type'] == "microsoft.avs/privateclouds"
+| extend scopeId = tolower(tostring(id))
+| project ['scopeId'], name, id, tags
+| join kind=leftouter (
+resources
+| where type == "microsoft.insights/metricalerts"
+| extend alertProperties = todynamic(properties)
+| mv-expand alertProperties.scopes
+| mv-expand alertProperties.criteria.allOf
+| extend scopeId = tolower(tostring(alertProperties_scopes))
+| extend metric = alertProperties_criteria_allOf.metricName
+| extend threshold = alertProperties_criteria_allOf.threshold
+| project scopeId, tostring(metric), toint(['threshold'])
+| where metric == "DiskUsedPercentage"
+| where threshold == 70
+) on scopeId
+| where isnull(['threshold'])
+| project recommendationId = "avs-3", name, id, tags, param1 = "vsanCapacityWarningAlert: isNull or threshold != 70"
+)
diff --git a/docs/content/services/specialized-workloads/azure-vmware-solution/code/avs-4/avs-4.kql b/docs/content/services/specialized-workloads/azure-vmware-solution/code/avs-4/avs-4.kql
new file mode 100644
index 000000000..a90940399
--- /dev/null
+++ b/docs/content/services/specialized-workloads/azure-vmware-solution/code/avs-4/avs-4.kql
@@ -0,0 +1,8 @@
+// Azure Resource Graph Query
+// Provides a list of Azure VMware Solution resources that are deployed in regions supporting stretched clusters but aren't configured as stretched clusters.
+resources
+| where ['type'] == "microsoft.avs/privateclouds"
+| extend avsproperties = todynamic(properties)
+| where avsproperties.availability.strategy != "DualZone"
+| where location in ("uksouth", "westeurope", "germanywestcentral", "australiaeast")
+| project recommendationId = "avs-4", name, id, tags, param1 = "stretchClusters: Disabled"
diff --git a/docs/content/services/specialized-workloads/azure-vmware-solution/code/avs-5/avs-5.kql b/docs/content/services/specialized-workloads/azure-vmware-solution/code/avs-5/avs-5.kql
new file mode 100644
index 000000000..2ba8ac2ab
--- /dev/null
+++ b/docs/content/services/specialized-workloads/azure-vmware-solution/code/avs-5/avs-5.kql
@@ -0,0 +1,21 @@
+// Azure Resource Graph Query
+// Provides a list of Azure VMware Solution resources that don't have a Cluster CPU capacity critical alert with a threshold of 95%.
+resources
+| where ['type'] == "microsoft.avs/privateclouds"
+| extend scopeId = tolower(tostring(id))
+| project ['scopeId'], name, id, tags
+| join kind=leftouter (
+resources
+| where type == "microsoft.insights/metricalerts"
+| extend alertProperties = todynamic(properties)
+| mv-expand alertProperties.scopes
+| mv-expand alertProperties.criteria.allOf
+| extend scopeId = tolower(tostring(alertProperties_scopes))
+| extend metric = alertProperties_criteria_allOf.metricName
+| extend threshold = alertProperties_criteria_allOf.threshold
+| project scopeId, tostring(metric), toint(['threshold'])
+| where metric == "EffectiveCpuAverage"
+| where threshold == 95
+) on scopeId
+| where isnull(['threshold'])
+| project recommendationId = "avs-5", name, id, tags, param1 = "hostCpuCriticalAlert: isNull or threshold != 95"
diff --git a/docs/content/services/specialized-workloads/azure-vmware-solution/code/avs-6/avs-6.kql b/docs/content/services/specialized-workloads/azure-vmware-solution/code/avs-6/avs-6.kql
new file mode 100644
index 000000000..f7b49f0d4
--- /dev/null
+++ b/docs/content/services/specialized-workloads/azure-vmware-solution/code/avs-6/avs-6.kql
@@ -0,0 +1,21 @@
+// Azure Resource Graph Query
+// Provides a list of Azure VMware Solution resources that don't have a cluster host memory critical alert with a threshold of 95%.
+resources
+| where ['type'] == "microsoft.avs/privateclouds"
+| extend scopeId = tolower(tostring(id))
+| project ['scopeId'], name, id, tags
+| join kind=leftouter (
+resources
+| where type == "microsoft.insights/metricalerts"
+| extend alertProperties = todynamic(properties)
+| mv-expand alertProperties.scopes
+| mv-expand alertProperties.criteria.allOf
+| extend scopeId = tolower(tostring(alertProperties_scopes))
+| extend metric = alertProperties_criteria_allOf.metricName
+| extend threshold = alertProperties_criteria_allOf.threshold
+| project scopeId, tostring(metric), toint(['threshold'])
+| where metric == "UsageAverage"
+| where threshold == 95
+) on scopeId
+| where isnull(['threshold'])
+| project recommendationId = "avs-6", name, id, tags, param1 = "hostMemoryCriticalAlert: isNull or threshold != 95"
diff --git a/docs/content/services/specialized-workloads/azure-vmware-solution/code/avs-7/avs-7.kql b/docs/content/services/specialized-workloads/azure-vmware-solution/code/avs-7/avs-7.kql
new file mode 100644
index 000000000..b5bc4080a
--- /dev/null
+++ b/docs/content/services/specialized-workloads/azure-vmware-solution/code/avs-7/avs-7.kql
@@ -0,0 +1 @@
+// cannot be validated with ARG
diff --git a/docs/content/services/specialized-workloads/azure-vmware-solution/code/avs-8/avs-8.kql b/docs/content/services/specialized-workloads/azure-vmware-solution/code/avs-8/avs-8.kql
new file mode 100644
index 000000000..b5bc4080a
--- /dev/null
+++ b/docs/content/services/specialized-workloads/azure-vmware-solution/code/avs-8/avs-8.kql
@@ -0,0 +1 @@
+// cannot be validated with ARG
diff --git a/docs/content/services/specialized-workloads/azure-vmware-solution/code/avs-9/avs-9.kql b/docs/content/services/specialized-workloads/azure-vmware-solution/code/avs-9/avs-9.kql
new file mode 100644
index 000000000..b5bc4080a
--- /dev/null
+++ b/docs/content/services/specialized-workloads/azure-vmware-solution/code/avs-9/avs-9.kql
@@ -0,0 +1 @@
+// cannot be validated with ARG
diff --git a/docs/content/services/specialized-workloads/sap-on-azure/_index.md b/docs/content/services/specialized-workloads/sap-on-azure/_index.md
new file mode 100644
index 000000000..92c4315c5
--- /dev/null
+++ b/docs/content/services/specialized-workloads/sap-on-azure/_index.md
@@ -0,0 +1,551 @@
++++
+title = "SAP on Azure"
+description = "Best practices and resiliency recommendations for Azure SAP Solution and associated resources and settings."
+date = "2/13/24"
+author = "humblejay"
+msAuthor = "kupole"
+draft = false
++++
+
+The resiliency recommendations in this guidance cover Azure SAP Solution and associated resources and settings.
+
+Refer to:
+- Azure Center for SAP Solutions
+- OpenSource Quality Checks
+- OpenSource Inventory Checks
+
+## Summary of Recommendations
+
+{{< table style="table-striped" >}}
+| Recommendation | Category | Impact | State | ARG Query Available |
+|:--------------------------------------------------|:-----------------------------------------------------------------------:|:---------------:|:----------------:|:-------------------:|
+| [SAP-1 - Ensure that each SAP production system is designed for high availability across availability zones.](#sap-1---ensure-that-each-sap-production-system-is-designed-for-high-availability-across-availability-zones) | Availability | High | Verified | No |
+| [SAP-2 - Run SAP application servers on two or more VMs using VMSS Flex.](#sap-2---run-sap-application-servers-on-two-or-more-vms-using-vmss-flex) | Availability | High | Verified | Yes |
+| [SAP-9 - If using single-instance VMs, all OS and data disks must be Premium SSD or Ultra Disk.](#sap-9---if-using-single-instance-vms-all-os-and-data-disks-must-be-premium-ssd-or-ultra-disk) | Availability | High | Verified | Yes |
+| [SAP-14 - Ensure that each database replicates changes synchronously (SYNC mode) to a stand-by node.](#sap-14---ensure-that-the-data-is-replicated-synchronously-sync-mode-between-the-primary-and-secondary-database-hosting-vm-nodes) | Availability | High | Verified | No |
+| [SAP-15 - Ensure that SAP shared file systems are designed for high availability and when possible using availability zones.](#sap-15---ensure-that-sap-shared-file-systems-are-designed-for-high-availability-and-when-possible-using-availability-zones) | Availability | High | Verified | No |
+| [SAP-16 - Test high availability solutions thoroughly to ensure fail overs work as expected.](#sap-16---test-high-availability-solutions-thoroughly-to-ensure-fail-overs-work-as-expected) | Availability | High | Verified | No |
+| [SAP-18 - Remove unwanted location constraints from Linux Pacemaker clusters.](#sap-18---remove-unwanted-location-constraints-from-linux-pacemaker-clusters) | Availability | High | Verified | No |
+| [SAP-26 - Secure compute resource capacity for critical VM roles in DR region.](#sap-26---secure-compute-resource-capacity-for-critical-vm-roles-in-dr-region) | Disaster Recovery | Medium | Verified | No |
+| [SAP-27 - Ensure that the production databases are replicated (ASYNC) to DR location using the database vendor's replication technology.](#sap-27---ensure-that-the-production-databases-are-replicated-async-to-dr-location-using-the-database-vendors-replication-technology) | Disaster Recovery | High | Verified | No |
+| [SAP-28 - SAP components are backed up to DR location using an appropriate backup tool or ASR.](#sap-28---sap-components-are-backed-up-to-dr-location-using-an-appropriate-backup-tool-or-asr) | Disaster Recovery | High | Verified | No |
+| [SAP-29 - SAP shared files systems are replicated or backed up to DR location.](#sap-29---sap-shared-files-systems-are-replicated-or-backed-up-to-dr-location) | Disaster Recovery | High | Verified | No |
+| [SAP-32 - Automate DR infrastructure build or pre-deploy DR resources.](#sap-32---automate-dr-infrastructure-build-or-pre-deploy-dr-resources) | Disaster Recovery | Medium | Verified | No |
+| [SAP-33 - Document and test DR procedure ensure it meets RPO and RTO targets.](#sap-33---document-and-test-dr-procedure-ensure-it-meets-rpo-and-rto-targets) | Disaster Recovery | Medium | Verified | No |
+| [SAP-34 - Ensure there is a robust monitoring and alerting solution in place for the entire DR solution.](#sap-34---ensure-there-is-a-robust-monitoring-and-alerting-solution-in-place-for-the-entire-dr-solution) | Disaster Recovery | Medium | Verified | No |
+| [SAP-36 - Configure scheduled events notification](#sap-36---configure-scheduled-events-notification) | Monitor | High | Verified | No |
+| [SAP-42 - ASCS-Pacemaker (Central Server Instance) Ensure Pacemaker cluster has been setup for SAP ASCS high availability.](#sap-42---ascs-pacemaker-central-server-instance-ensure-pacemaker-cluster-has-been-setup-for-sap-ascs-high-availability) | Availability | High | Verified | No |
+| [SAP-45 - ASCS-LB (Central Server Instance) Ensure the load balancer is configured correctly for SAP ASCS High availability.](#sap-45---ascs-lb-central-server-instance-ensure-the-load-balancer-is-configured-correctly-for-sap-ascs-high-availability) | Availability | High | Verified | No |
+| [SAP-46 - DBHANA-Pacemaker (Database Instance) Ensure the Pacemaker cluster has been setup for SAP HANA DB high availability.](#sap-46---dbhana-pacemaker-database-instance-ensure-the-pacemaker-cluster-has-been-setup-for-sap-hana-db-high-availability) | Availability | High | Verified | No |
+| [SAP-49 - DBHANA-LB (Database Instance) Ensure the load balancer is configured correctly for SAP HANA DB High availability.](#sap-49---dbhana-lb-database-instance-ensure-the-load-balancer-is-configured-correctly-for-sap-hana-db-high-availability) | Availability | High | Verified | No |
+
+
+{{< /table >}}
+
+{{< alert style="info" >}}
+
+Definitions of states can be found [here]({{< ref "../../../_index.md#definitions-of-terms-used-in-aprl">}})
+
+{{< /alert >}}
+
+## Recommendations Details
+
+### SAP-1 - Ensure that each SAP production system is designed for high availability across availability zones
+
+**Category: Availability**
+
+**Impact: High**
+
+**Guidance**
+
+Azure Availability Zones are physically separate locations within each Azure region that are tolerant to local failures. Use availability zones to protect your applications and data against unlikely data center failures. Ensure each single point of failure of each SAP production system is protected with high availability using multiple availability zones. If you cannot deploy across different zones in a region, refer to the Microsoft guidance on high availability deployment options for SAP workloads.
+
+**Resources**
+
+- [SAP ACSS Quality Insights](https://learn.microsoft.com/en-us/azure/sap/center-sap-solutions/get-quality-checks-insights)
+- [OpenSource Inventory Checks](https://aka.ms/ACESInventoryCheckSAP)
+- [OpenSource Quality Checks](https://github.com/Azure/SAP-on-Azure-Scripts-and-Utilities/tree/main/QualityCheck)
+- [Move Regional SAP HA to Zonal](https://github.com/Azure/SAP-on-Azure-Scripts-and-Utilities/tree/main/Move-VM-from-AvSet-to-AvZone/Move-Regional-SAP-HA-To-Zonal-SAP-HA-WhitePaper)
+- [High Availability Deployment Options for SAP](https://learn.microsoft.com/en-us/azure/sap/workloads/sap-high-availability-architecture-scenarios#high-availability-deployment-options-for-sap-workload)
+
+
+**Resource Graph Query**
+
+{{< collapse title="Show/Hide Query/Script" >}}
+
+{{< code lang="sql" file="code/sap-1/sap-1.kql">}} {{< /code >}}
+
+{{< /collapse >}}
+
+
+
+### SAP-2 - Run SAP application servers on two or more VMs using VMSS Flex
+
+**Category: Availability**
+
+**Impact: High**
+
+**Guidance**
+
+Use Virtual Machine Scale Sets (VMSS) with flexible orchestration to distribute the virtual machines across the specified zones and, within each zone, across different fault domains on a best-effort basis. Configure VMSS Flex according to the Microsoft recommendations for SAP workloads, using the right mode and correct settings. If you aren't currently using VMSS Flex for SAP application servers, and aren't using Availability Sets with fault domain and update domain distribution either, consider moving to a VMSS Flex architecture to improve the resiliency posture of your SAP deployment. The blog post linked below outlines the process of migrating existing SAP workloads that are deployed in an availability set or availability zone to a flexible scale set with the FD=1 deployment option.
+
+
+**Resources**
+
+- [OpenSource Inventory Checks](https://aka.ms/ACESInventoryCheckSAP)
+- [Virtual machine Scale Set SAP Deployment Guide](https://learn.microsoft.com/en-us/azure/sap/workloads/virtual-machine-scale-set-sap-deployment-guide)
+- [Considerations for Flexible VM Scale Sets for SAP](https://learn.microsoft.com/en-us/azure/sap/workloads/virtual-machine-scale-set-sap-deployment-guide?tabs=scaleset-cli#important-consideration-of-flexible-virtual-machine-scale-sets-for-sap-workload)
+- [Migrate existing SAP system VMs to VMSS Flex](https://techcommunity.microsoft.com/t5/running-sap-applications-on-the/how-to-easily-migrate-an-existing-sap-system-vms-to-flexible/ba-p/3833548)
+
+**Resource Graph Query**
+
+{{< collapse title="Show/Hide Query/Script" >}}
+
+{{< code lang="sql" file="../../compute/virtual-machines/code/vm-1/vm-1.kql" >}} {{< /code >}}
+
+{{< /collapse >}}
+
+
+
+### SAP-9 - If using single-instance VMs all OS and data disks must be Premium SSD or Ultra Disk
+
+**Category: Availability**
+
+**Impact: High**
+
+**Guidance**
+
+For single-instance VMs, both OS and data disks must be either Premium SSD or Ultra Disk to achieve the single-instance SLA of 99.9% availability.
+
+**Resources**
+
+- [SAP ACSS Insights](https://learn.microsoft.com/en-us/azure/sap/center-sap-solutions/get-quality-checks-insights)
+- [OpenSource Inventory Checks](https://aka.ms/ACESInventoryCheckSAP)
+- [OpenSource Quality Checks](https://github.com/Azure/SAP-on-Azure-Scripts-and-Utilities/tree/main/QualityCheck)
+- [VM SLA](https://www.azure.cn/en-us/support/sla/virtual-machines/)
+- [SAP Storage Planning Guide](https://learn.microsoft.com/en-us/azure/sap/workloads/planning-guide-storage)
+
+**Resource Graph Query**
+
+{{< collapse title="Show/Hide Query/Script" >}}
+
+{{< code lang="sql" file="../../compute/virtual-machines/code/vm-24/vm-24.kql" >}} {{< /code >}}
+
+{{< /collapse >}}
+
+
+
+### SAP-14 - Ensure that the data is replicated synchronously (SYNC mode) between the primary and secondary database hosting VM nodes
+
+**Category: Availability**
+
+**Impact: High**
+
+**Guidance**
+
+High availability for databases should be implemented using database-native replication technologies, and the data should be replicated synchronously (SYNC mode) from the primary database to a stand-by node.
+
+**Resources**
+
+- [SAP ACSS Insights](https://learn.microsoft.com/en-us/azure/sap/center-sap-solutions/get-quality-checks-insights)
+- [OpenSource Quality Checks](https://github.com/Azure/SAP-on-Azure-Scripts-and-Utilities/tree/main/QualityCheck)
+
+**Resource Graph Query**
+
+{{< collapse title="Show/Hide Query/Script" >}}
+
+{{< code lang="sql" file="code/sap-14/sap-14.kql" >}} {{< /code >}}
+
+{{< /collapse >}}
+
+
+
+### SAP-15 - Ensure that SAP shared file systems are designed for high availability and when possible using availability zones
+
+**Category: Availability**
+
+**Impact: High**
+
+**Guidance**
+
+SAP shared file systems such as /sapmnt, /usr/sap/trans, and interfaces should be made highly available.
+
+For Azure File Shares, we recommend that you use ZRS (Zone-redundant storage).
+For Azure NetApp Files, we recommend that you use zonal replication for your volumes.
+
+Review the results of the individual checks on other Azure services to ensure SAP shared file systems are designed to protect against zonal failure: ST-1, ANF-1, ANF-6.
+
+**Resources**
+
+- [OpenSource Inventory Checks](https://aka.ms/ACESInventoryCheckSAP)
+
+**Resource Graph Query**
+
+{{< collapse title="Show/Hide Query/Script" >}}
+
+{{< code lang="sql" file="code/sap-15/sap-15.kql" >}} {{< /code >}}
+
+{{< /collapse >}}
+
+
+
+### SAP-16 - Test high availability solutions thoroughly to ensure fail overs work as expected
+
+**Category: Availability**
+
+**Impact: High**
+
+**Guidance**
+
+Test all high availability solutions thoroughly, including kernel panic scenarios on Linux VMs and failback. Include zonal failure scenarios in your testing. The testing should confirm that each layer of your SAP solution (database, central services, application servers, and shared file systems) is configured correctly for zone redundancy, that the solution meets RPO = 0, and that the application fails over automatically within your RTO.
+Failback can be either automatic or manual.
+
+**Resources**
+
+- [Test Cases](https://learn.microsoft.com/en-us/azure/sap/workloads/sap-hana-high-availability?tabs=lb-portal#test-the-cluster-setup)
+
+**Resource Graph Query**
+
+{{< collapse title="Show/Hide Query/Script" >}}
+
+{{< code lang="sql" file="code/sap-16/sap-16.kql" >}} {{< /code >}}
+
+{{< /collapse >}}
+
+
+
+### SAP-18 - Remove unwanted location constraints from Linux Pacemaker clusters
+
+**Category: Availability**
+
+**Impact: High**
+
+**Guidance**
+
+When executing a migrate command in a Linux Pacemaker cluster, the system generates a temporary "prefer" location constraint, aiming to move a resource to a specified node. This constraint prioritizes the target node for the resource temporarily without permanently altering the cluster’s configuration.
+
+During planned maintenance and failover testing, you can use the migrate command to relocate resources temporarily with minimal disruption. The resulting constraint is intended for short-term adjustments only, but it stays in the cluster configuration until it expires (if a lifetime was specified) or is removed, and while it exists it keeps influencing where the resource runs.
+
+Once the planned task necessitating the resource migration is complete, manually remove the temporary constraint to revert to the cluster's original resource management policies.
+This approach allows for controlled resource movement within the cluster, facilitating maintenance while preserving the integrity and efficiency of the cluster's configuration.
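
Reviewer sketch (not part of the PR): leftover constraints from earlier migrate operations can be spotted by scanning the constraints section of the cluster configuration, for example the XML printed by `cibadmin --query --scope constraints`. The example below assumes that constraints created by move, migrate, or ban commands carry ids starting with `cli-prefer-` or `cli-ban-`; that naming convention, the function name, and the sample resource and node names are illustrative assumptions and should be checked against your cluster.

```python
import xml.etree.ElementTree as ET

# Id prefixes assumed for constraints created by move/migrate/ban commands.
# This is an assumption of the sketch; verify it against your cluster tooling.
CLI_PREFIXES = ("cli-prefer-", "cli-ban-")


def find_cli_constraints(constraints_xml):
    """Return location constraints whose id looks like a leftover from a manual move."""
    root = ET.fromstring(constraints_xml.strip())
    leftovers = []
    # iter() visits the root and all descendants, so a full CIB dump works as well.
    for loc in root.iter("rsc_location"):
        constraint_id = loc.get("id", "")
        if constraint_id.startswith(CLI_PREFIXES):
            leftovers.append(
                {
                    "id": constraint_id,
                    "resource": loc.get("rsc"),
                    "node": loc.get("node"),
                    "score": loc.get("score"),
                }
            )
    return leftovers


# Hypothetical constraints section used so the sketch runs without a cluster.
SAMPLE = """
<constraints>
  <rsc_location id="cli-prefer-rsc_SAPHana_HN1" rsc="rsc_SAPHana_HN1" node="hn1-db-1" score="INFINITY"/>
  <rsc_location id="loc_azure_events" rsc="rsc_azure-events" node="hn1-db-0" score="100"/>
</constraints>
"""

if __name__ == "__main__":
    for item in find_cli_constraints(SAMPLE):
        print(f"Leftover constraint {item['id']} places {item['resource']} on {item['node']}")
```

Any constraint reported this way is a candidate for removal with the cluster shell's clear command (for example `crm resource clear <resource>` or `pcs resource clear <resource>`) once the maintenance task is finished.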
+
+**Resources**
+
+
+**Resource Graph Query**
+
+{{< collapse title="Show/Hide Query/Script" >}}
+
+{{< code lang="sql" file="code/sap-18/sap-18.kql" >}} {{< /code >}}
+
+{{< /collapse >}}
+
+
+
+### SAP-26 - Secure compute resource capacity for critical VM roles in DR region
+
+**Category: Disaster Recovery**
+
+**Impact: Medium**
+
+**Guidance**
+
+To ensure the availability of compute resources for critical VM roles in a DR region, consider securing capacity either through a warm standby approach or by utilizing Azure's On-demand Capacity Reservation.
+
+Warm standby involves keeping VMs in the DR region running. On-demand Capacity Reservation, on the other hand, reserves compute capacity without having to run the VMs, allowing you to start them when needed. When DR VMs are not needed, the reserved capacity may safely be used to run other workloads without the risk of losing the capacity to other customers. This strategy guarantees resource availability for your critical workloads in the event of a disaster, balancing cost and readiness.
+
+**Resources**
+
+- [Capacity Reservation](https://learn.microsoft.com/en-us/azure/virtual-machines/capacity-reservation-overview)
+
+**Resource Graph Query**
+
+{{< collapse title="Show/Hide Query/Script" >}}
+
+{{< code lang="sql" file="code/sap-26/sap-26.kql" >}} {{< /code >}}
+
+{{< /collapse >}}
+
+
+
+### SAP-27 - Ensure that the production databases are replicated (ASYNC) to DR location using the database vendor's replication technology
+
+**Category: Disaster Recovery**
+
+**Impact: High**
+
+**Guidance**
+
+The replication of production databases to a DR location using the database vendor's asynchronous replication technology is a key strategy in ensuring data availability and business continuity.
+
+**Resources**
+
+- [SAP Disaster Recovery Guide](https://learn.microsoft.com/en-us/azure/sap/workloads/disaster-recovery-sap-guide?tabs=windows)
+
+
+**Resource Graph Query**
+
+{{< collapse title="Show/Hide Query/Script" >}}
+
+{{< code lang="sql" file="code/sap-27/sap-27.kql" >}} {{< /code >}}
+
+{{< /collapse >}}
+
+
+
+### SAP-28 - SAP components are backed up to DR location using an appropriate backup tool or ASR
+
+**Category: Disaster Recovery**
+
+**Impact: High**
+
+**Guidance**
+
+Ensure that SAP components such as (A)SCS, application servers, and Web Dispatchers are backed up to the DR location using an appropriate backup tool or Azure Site Recovery (ASR).
+
+**Resources**
+
+- [SAP ACSS Insights](https://learn.microsoft.com/en-us/azure/sap/center-sap-solutions/get-quality-checks-insights)
+- [OpenSource Inventory Checks](https://aka.ms/ACESInventoryCheckSAP)
+
+**Resource Graph Query**
+
+{{< collapse title="Show/Hide Query/Script" >}}
+
+{{< code lang="sql" file="code/sap-28/sap-28.kql" >}} {{< /code >}}
+
+{{< /collapse >}}
+
+
+
+### SAP-29 - SAP shared files systems are replicated or backed up to DR location
+
+**Category: Disaster Recovery**
+
+**Impact: High**
+
+**Guidance**
+
+Ensure that critical SAP shared file systems, such as /sapmnt, /usr/sap/trans, and /interfaces, are either replicated or backed up for disaster recovery purposes.
+
+**Resources**
+
+- [DR Guidance](https://learn.microsoft.com/en-us/azure/sap/workloads/disaster-recovery-sap-guide?tabs=windows)
+
+**Resource Graph Query**
+
+{{< collapse title="Show/Hide Query/Script" >}}
+
+{{< code lang="sql" file="code/sap-29/sap-29.kql" >}} {{< /code >}}
+
+{{< /collapse >}}
+
+
+
+### SAP-32 - Automate DR infrastructure build or pre-deploy DR resources
+
+**Category: Disaster Recovery**
+
+**Impact: Medium**
+
+**Guidance**
+
+Automate DR infrastructure build (or have pre-deployed DR resources) and SAP service recovery as much as possible.
+
+**Resources**
+
+
+**Resource Graph Query**
+
+{{< collapse title="Show/Hide Query/Script" >}}
+
+{{< code lang="sql" file="code/sap-32/sap-32.kql" >}} {{< /code >}}
+
+{{< /collapse >}}
+
+
+
+### SAP-33 - Document and test DR procedure ensure it meets RPO and RTO targets
+
+**Category: Disaster Recovery**
+
+**Impact: Medium**
+
+**Guidance**
+
+Create detailed documentation of your DR procedures for each layer of the SAP architecture—database, central services, application servers, and shared file systems. This documentation should include configuration details, failover mechanisms, and step-by-step recovery procedures.
+
+Test a wide range of failure scenarios, including regional outages. Testing should confirm that your DR strategy is robust, meets your RPO and RTO targets, and provides seamless failover across all layers of the SAP architecture.
+
+This will ensure a comprehensive and resilient DR strategy capable of withstanding regional failures and ensuring business continuity.
+
+**Resources**
+
+
+**Resource Graph Query**
+
+{{< collapse title="Show/Hide Query/Script" >}}
+
+{{< code lang="sql" file="code/sap-33/sap-33.kql" >}} {{< /code >}}
+
+{{< /collapse >}}
+
+
+
+### SAP-34 - Ensure there is a robust monitoring and alerting solution in place for the entire DR solution
+
+**Category: Disaster Recovery**
+
+**Impact: Medium**
+
+**Guidance**
+
+For an SAP solution hosted on Azure, it's imperative to implement a robust monitoring and alerting solution that comprehensively covers DR of each layer of the SAP architecture. Given the complexity of SAP systems, which span multiple layers using diverse technologies and Azure resources, each with potentially distinct DR replication mechanisms, an appropriate monitoring strategy is crucial. The different layers include database, central services, application, and shared file systems.
+
+**Resources**
+
+
+**Resource Graph Query**
+
+{{< collapse title="Show/Hide Query/Script" >}}
+
+{{< code lang="sql" file="code/sap-34/sap-34.kql" >}} {{< /code >}}
+
+{{< /collapse >}}
+
+
+
+### SAP-36 - Configure scheduled events notification
+
+**Category: Monitor**
+
+**Impact: High**
+
+**Guidance**
+
+Scheduled Events is an Azure Metadata Service that provides proactive notifications about upcoming maintenance events (for example, a reboot) so that your application can prepare for them and limit disruption. You should configure Scheduled Events for all your critical Azure VMs.
+The azure-events-az resource agent can also integrate Scheduled Events with Pacemaker clusters.
+
+To ensure high availability and service continuity in your Azure VMs, you should configure the azure-events-az resource agent within your Pacemaker clusters. This agent monitors for scheduled Azure maintenance events and can proactively relocate resources for a graceful node shutdown. Configure the agent to monitor specific event types such as Reboot and Redeploy, and enable verbose logging for detailed diagnostics.
+
+In addition, define a procedure for how to react to scheduled events.
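
Reviewer sketch (not part of the PR): a reaction procedure usually starts by reading the Scheduled Events document from the Azure Instance Metadata Service and picking out the event types you care about. The example below is a minimal illustration under stated assumptions: the function names, the watched event types, and the sample payload are hypothetical, and the endpoint only answers from inside an Azure VM, so the sample document is used to keep the sketch runnable anywhere.

```python
import json
import urllib.request

# Instance Metadata Service endpoint for Scheduled Events (reachable only from inside a VM).
IMDS_URL = "http://169.254.169.254/metadata/scheduledevents?api-version=2020-07-01"


def fetch_scheduled_events(timeout=5.0):
    """Query the metadata service; the 'Metadata: true' header is required."""
    request = urllib.request.Request(IMDS_URL, headers={"Metadata": "true"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def events_to_handle(document, watched_types=("Reboot", "Redeploy")):
    """Return (event id, type, not-before time, resources) for the watched event types."""
    selected = []
    for event in document.get("Events", []):
        if event.get("EventType") in watched_types:
            selected.append(
                (
                    event.get("EventId"),
                    event.get("EventType"),
                    event.get("NotBefore"),
                    event.get("Resources", []),
                )
            )
    return selected


# Hypothetical payload in the shape of a Scheduled Events document.
SAMPLE_DOCUMENT = {
    "DocumentIncarnation": 2,
    "Events": [
        {
            "EventId": "A1",
            "EventType": "Reboot",
            "ResourceType": "VirtualMachine",
            "Resources": ["sap-ascs-vm-0"],
            "EventStatus": "Scheduled",
            "NotBefore": "Mon, 15 Apr 2024 18:29:47 GMT",
        },
        {
            "EventId": "B2",
            "EventType": "Freeze",
            "ResourceType": "VirtualMachine",
            "Resources": ["sap-app-vm-1"],
            "EventStatus": "Scheduled",
            "NotBefore": "Mon, 15 Apr 2024 19:00:00 GMT",
        },
    ],
}

if __name__ == "__main__":
    for event_id, event_type, not_before, resources in events_to_handle(SAMPLE_DOCUMENT):
        print(f"{event_type} ({event_id}) not before {not_before} affects {', '.join(resources)}")
```

On a real VM, `fetch_scheduled_events()` would supply the document instead of the sample. In a Pacemaker cluster the azure-events-az resource agent performs this polling itself, so a script like this is mainly useful for standalone VMs or for alerting.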
+
+**Resources**
+
+- [VM Scheduled Events](https://learn.microsoft.com/en-us/azure/virtual-machines/linux/scheduled-events)
+- [Configure Pacemaker for Azure Scheduled Events](https://learn.microsoft.com/en-us/azure/sap/workloads/high-availability-guide-suse-pacemaker?tabs=msi#configure-pacemaker-for-azure-scheduled-events)
+
+
+**Resource Graph Query**
+
+{{< collapse title="Show/Hide Query/Script" >}}
+
+{{< code lang="sql" file="code/sap-36/sap-36.kql" >}} {{< /code >}}
+
+{{< /collapse >}}
+
+
+
+### SAP-42 - ASCS-Pacemaker (Central Server Instance) Ensure Pacemaker cluster has been setup for SAP ASCS high availability
+
+**Category: Availability**
+
+**Impact: High**
+
+**Guidance**
+
+For the ASCS-Pacemaker (Central Server Instance), ensure that the Pacemaker cluster configuration parameters are correctly set up for SAP ASCS high availability.
+
+**Resources**
+
+- [SAP ACSS Insights](https://learn.microsoft.com/en-us/azure/sap/center-sap-solutions/get-quality-checks-insights)
+- [OpenSource Quality Checks](https://github.com/Azure/SAP-on-Azure-Scripts-and-Utilities/tree/main/QualityCheck)
+- [ASCS-Pacemaker - Central Server Instance](https://docs.microsoft.com/en-us/azure/advisor/advisor-reference-reliability-recommendations)
+
+**Resource Graph Query**
+
+{{< collapse title="Show/Hide Query/Script" >}}
+
+{{< code lang="sql" file="code/sap-42/sap-42.kql" >}} {{< /code >}}
+
+{{< /collapse >}}
+
+
+
+### SAP-45 - ASCS-LB (Central Server Instance) Ensure the load balancer is configured correctly for SAP ASCS High availability
+
+**Category: Availability**
+
+**Impact: High**
+
+**Guidance**
+
+For the ASCS-LB (Central Server Instance), ensure that the load balancer is configured correctly for SAP ASCS high availability.
+
+**Resources**
+
+- [SAP ACSS Insights](https://learn.microsoft.com/en-us/azure/sap/center-sap-solutions/get-quality-checks-insights)
+- [OpenSource Quality Checks](https://github.com/Azure/SAP-on-Azure-Scripts-and-Utilities/tree/main/QualityCheck)
+- [ASCS-LB - Central Server Instance](https://docs.microsoft.com/en-us/azure/advisor/advisor-reference-reliability-recommendations)
+
+**Resource Graph Query**
+
+{{< collapse title="Show/Hide Query/Script" >}}
+
+{{< code lang="sql" file="code/sap-45/sap-45.kql" >}} {{< /code >}}
+
+{{< /collapse >}}
+
+
+
+### SAP-46 - DBHANA-Pacemaker (Database Instance) Ensure the Pacemaker cluster has been setup for SAP HANA DB high availability
+
+**Category: Availability**
+
+**Impact: High**
+
+**Guidance**
+
+For the DBHANA-Pacemaker (Database Instance), ensure that the Pacemaker cluster configuration parameters are correctly set up for SAP HANA DB high availability.
+
+**Resources**
+
+- [SAP ACSS Insights](https://learn.microsoft.com/en-us/azure/sap/center-sap-solutions/get-quality-checks-insights)
+- [OpenSource Quality Checks](https://github.com/Azure/SAP-on-Azure-Scripts-and-Utilities/tree/main/QualityCheck)
+- [DBHANA-Pacemaker - Database Instance](https://docs.microsoft.com/en-us/azure/advisor/advisor-reference-reliability-recommendations)
+
+**Resource Graph Query**
+
+{{< collapse title="Show/Hide Query/Script" >}}
+
+{{< code lang="sql" file="code/sap-46/sap-46.kql" >}} {{< /code >}}
+
+{{< /collapse >}}
+
+
+
+### SAP-49 - DBHANA-LB (Database Instance) Ensure the load balancer is configured correctly for SAP HANA DB High availability
+
+**Category: Availability**
+
+**Impact: High**
+
+**Guidance**
+
+For the DBHANA-LB (Database Instance), make sure the load balancer is configured correctly for SAP HANA DB high availability.
+
+**Resources**
+
+- [SAP ACSS Insights](https://learn.microsoft.com/en-us/azure/sap/center-sap-solutions/get-quality-checks-insights)
+- [OpenSource Quality Checks](https://github.com/Azure/SAP-on-Azure-Scripts-and-Utilities/tree/main/QualityCheck)
+- [DBHANA-LB - Database Instance](https://docs.microsoft.com/en-us/azure/advisor/advisor-reference-reliability-recommendations)
+
+**Resource Graph Query**
+
+{{< collapse title="Show/Hide Query/Script" >}}
+
+{{< code lang="sql" file="code/sap-49/sap-49.kql" >}} {{< /code >}}
+
+{{< /collapse >}}
+
+
diff --git a/docs/content/services/specialized-workloads/sap-on-azure/code/sap-1/sap-1.kql b/docs/content/services/specialized-workloads/sap-on-azure/code/sap-1/sap-1.kql
new file mode 100644
index 000000000..614a7f9ca
--- /dev/null
+++ b/docs/content/services/specialized-workloads/sap-on-azure/code/sap-1/sap-1.kql
@@ -0,0 +1 @@
+// under-development
diff --git a/docs/content/services/specialized-workloads/sap-on-azure/code/sap-14/sap-14.kql b/docs/content/services/specialized-workloads/sap-on-azure/code/sap-14/sap-14.kql
new file mode 100644
index 000000000..614a7f9ca
--- /dev/null
+++ b/docs/content/services/specialized-workloads/sap-on-azure/code/sap-14/sap-14.kql
@@ -0,0 +1 @@
+// under-development
diff --git a/docs/content/services/specialized-workloads/sap-on-azure/code/sap-15/sap-15.kql b/docs/content/services/specialized-workloads/sap-on-azure/code/sap-15/sap-15.kql
new file mode 100644
index 000000000..614a7f9ca
--- /dev/null
+++ b/docs/content/services/specialized-workloads/sap-on-azure/code/sap-15/sap-15.kql
@@ -0,0 +1 @@
+// under-development
diff --git a/docs/content/services/specialized-workloads/sap-on-azure/code/sap-16/sap-16.kql b/docs/content/services/specialized-workloads/sap-on-azure/code/sap-16/sap-16.kql
new file mode 100644
index 000000000..614a7f9ca
--- /dev/null
+++ b/docs/content/services/specialized-workloads/sap-on-azure/code/sap-16/sap-16.kql
@@ -0,0 +1 @@
+// under-development
diff --git a/docs/content/services/specialized-workloads/sap-on-azure/code/sap-17/sap-17.kql b/docs/content/services/specialized-workloads/sap-on-azure/code/sap-17/sap-17.kql
new file mode 100644
index 000000000..614a7f9ca
--- /dev/null
+++ b/docs/content/services/specialized-workloads/sap-on-azure/code/sap-17/sap-17.kql
@@ -0,0 +1 @@
+// under-development
diff --git a/docs/content/services/specialized-workloads/sap-on-azure/code/sap-18/sap-18.kql b/docs/content/services/specialized-workloads/sap-on-azure/code/sap-18/sap-18.kql
new file mode 100644
index 000000000..614a7f9ca
--- /dev/null
+++ b/docs/content/services/specialized-workloads/sap-on-azure/code/sap-18/sap-18.kql
@@ -0,0 +1 @@
+// under-development
diff --git a/docs/content/services/specialized-workloads/sap-on-azure/code/sap-2/sap-2.kql b/docs/content/services/specialized-workloads/sap-on-azure/code/sap-2/sap-2.kql
new file mode 100644
index 000000000..614a7f9ca
--- /dev/null
+++ b/docs/content/services/specialized-workloads/sap-on-azure/code/sap-2/sap-2.kql
@@ -0,0 +1 @@
+// under-development
diff --git a/docs/content/services/specialized-workloads/sap-on-azure/code/sap-26/sap-26.kql b/docs/content/services/specialized-workloads/sap-on-azure/code/sap-26/sap-26.kql
new file mode 100644
index 000000000..825659376
--- /dev/null
+++ b/docs/content/services/specialized-workloads/sap-on-azure/code/sap-26/sap-26.kql
@@ -0,0 +1,2 @@
+// under-development
+
diff --git a/docs/content/services/specialized-workloads/sap-on-azure/code/sap-27/sap-27.kql b/docs/content/services/specialized-workloads/sap-on-azure/code/sap-27/sap-27.kql
new file mode 100644
index 000000000..614a7f9ca
--- /dev/null
+++ b/docs/content/services/specialized-workloads/sap-on-azure/code/sap-27/sap-27.kql
@@ -0,0 +1 @@
+// under-development
diff --git a/docs/content/services/specialized-workloads/sap-on-azure/code/sap-28/sap-28.kql b/docs/content/services/specialized-workloads/sap-on-azure/code/sap-28/sap-28.kql
new file mode 100644
index 000000000..614a7f9ca
--- /dev/null
+++ b/docs/content/services/specialized-workloads/sap-on-azure/code/sap-28/sap-28.kql
@@ -0,0 +1 @@
+// under-development
diff --git a/docs/content/services/specialized-workloads/sap-on-azure/code/sap-29/sap-29.kql b/docs/content/services/specialized-workloads/sap-on-azure/code/sap-29/sap-29.kql
new file mode 100644
index 000000000..614a7f9ca
--- /dev/null
+++ b/docs/content/services/specialized-workloads/sap-on-azure/code/sap-29/sap-29.kql
@@ -0,0 +1 @@
+// under-development
diff --git a/docs/content/services/specialized-workloads/sap-on-azure/code/sap-32/sap-32.kql b/docs/content/services/specialized-workloads/sap-on-azure/code/sap-32/sap-32.kql
new file mode 100644
index 000000000..614a7f9ca
--- /dev/null
+++ b/docs/content/services/specialized-workloads/sap-on-azure/code/sap-32/sap-32.kql
@@ -0,0 +1 @@
+// under-development
diff --git a/docs/content/services/specialized-workloads/sap-on-azure/code/sap-33/sap-33.kql b/docs/content/services/specialized-workloads/sap-on-azure/code/sap-33/sap-33.kql
new file mode 100644
index 000000000..614a7f9ca
--- /dev/null
+++ b/docs/content/services/specialized-workloads/sap-on-azure/code/sap-33/sap-33.kql
@@ -0,0 +1 @@
+// under-development
diff --git a/docs/content/services/specialized-workloads/sap-on-azure/code/sap-34/sap-34.kql b/docs/content/services/specialized-workloads/sap-on-azure/code/sap-34/sap-34.kql
new file mode 100644
index 000000000..614a7f9ca
--- /dev/null
+++ b/docs/content/services/specialized-workloads/sap-on-azure/code/sap-34/sap-34.kql
@@ -0,0 +1 @@
+// under-development
diff --git a/docs/content/services/specialized-workloads/sap-on-azure/code/sap-36/sap-36.kql b/docs/content/services/specialized-workloads/sap-on-azure/code/sap-36/sap-36.kql
new file mode 100644
index 000000000..614a7f9ca
--- /dev/null
+++ b/docs/content/services/specialized-workloads/sap-on-azure/code/sap-36/sap-36.kql
@@ -0,0 +1 @@
+// under-development
diff --git a/docs/content/services/specialized-workloads/sap-on-azure/code/sap-42/sap-42.kql b/docs/content/services/specialized-workloads/sap-on-azure/code/sap-42/sap-42.kql
new file mode 100644
index 000000000..614a7f9ca
--- /dev/null
+++ b/docs/content/services/specialized-workloads/sap-on-azure/code/sap-42/sap-42.kql
@@ -0,0 +1 @@
+// under-development
diff --git a/docs/content/services/specialized-workloads/sap-on-azure/code/sap-45/sap-45.kql b/docs/content/services/specialized-workloads/sap-on-azure/code/sap-45/sap-45.kql
new file mode 100644
index 000000000..614a7f9ca
--- /dev/null
+++ b/docs/content/services/specialized-workloads/sap-on-azure/code/sap-45/sap-45.kql
@@ -0,0 +1 @@
+// under-development
diff --git a/docs/content/services/specialized-workloads/sap-on-azure/code/sap-46/sap-46.kql b/docs/content/services/specialized-workloads/sap-on-azure/code/sap-46/sap-46.kql
new file mode 100644
index 000000000..614a7f9ca
--- /dev/null
+++ b/docs/content/services/specialized-workloads/sap-on-azure/code/sap-46/sap-46.kql
@@ -0,0 +1 @@
+// under-development
diff --git a/docs/content/services/specialized-workloads/sap-on-azure/code/sap-49/sap-49.kql b/docs/content/services/specialized-workloads/sap-on-azure/code/sap-49/sap-49.kql
new file mode 100644
index 000000000..614a7f9ca
--- /dev/null
+++ b/docs/content/services/specialized-workloads/sap-on-azure/code/sap-49/sap-49.kql
@@ -0,0 +1 @@
+// under-development
diff --git a/docs/content/services/specialized-workloads/sap-on-azure/code/sap-9/sap-9.kql b/docs/content/services/specialized-workloads/sap-on-azure/code/sap-9/sap-9.kql
new file mode 100644
index 000000000..614a7f9ca
--- /dev/null
+++ b/docs/content/services/specialized-workloads/sap-on-azure/code/sap-9/sap-9.kql
@@ -0,0 +1 @@
+// under-development
diff --git a/docs/content/services/storage/azure-netapp-files/_index.md b/docs/content/services/storage/azure-netapp-files/_index.md
index 64123f79c..0037cd98e 100644
--- a/docs/content/services/storage/azure-netapp-files/_index.md
+++ b/docs/content/services/storage/azure-netapp-files/_index.md
@@ -1,27 +1,31 @@
+++
-title = "Azure Netapp Files"
-description = "Best practices and resiliency recommendations for Azure Netapp Files and associated resources and settings."
-date = "8/30/23"
-author = "maheshbenke"
-msAuthor = "maheshbenke"
+title = "Azure NetApp Files"
+description = "Best practices and resiliency recommendations for Azure NetApp Files and associated resources and settings."
+date = "3/26/24"
+author = "seanluce"
+msAuthor = "b-sluce"
draft = false
+++
-The presented resiliency recommendations in this guidance include Azure Netapp Files and associated resources and settings.
+The presented resiliency recommendations in this guidance include Azure NetApp Files and associated resources and settings.
## Summary of Recommendations
{{< table style="table-striped" >}}
| Recommendation | Category | Impact | State | ARG Query Available |
| :------------------------------------------------ | :---------------------------------------------------------------------: | :------: | :------: | :-----------------: |
-| [ANF-1 - Use the correct service level and volume quota size for the expected performance level](#anf-1---use-the-correct-service-level-and-volume-quota-size-for-the-expected-performance-level) | System Efficiency | High | Preview | No |
-| [ANF-2 - Use standard network feature for Production in Azure NetApp Files](#anf-2---use-standard-network-feature-for-production-in-azure-netapp-files) | Networking | High | Preview | Yes |
-| [ANF-3 - Use availability zones for high availability in Azure NetApp Files](#anf-3---use-availability-zones-for-high-availability-in-azure-netapp-files) | High Availability | High | Preview | Yes |
-| [ANF-4 - Use snapshot and backup for in-region data protection in Azure NetApp Files](#anf-4---use-snapshot-and-backup-for-in-region-data-protection-in-azure-netapp-files) | High Availability | High | Preview | No |
-| [ANF-5 - Enable Cross-region replication of Azure NetApp Files volumes](#anf-5---enable-cross-region-replication-of-azure-netapp-files-volumes) | Disaster Recovery/High Availability | High | Preview | Yes |
-| [ANF-6 - Enable Cross-zone replication of Azure NetApp Files volumes](#anf-6---enable-cross-zone-replication-of-azure-netapp-files-volumes) | Disaster Recovery/High Availability | High | Preview | Yes |
-| [ANF-7 - Monitor Azure Netapp Files metrics to better understand usage pattern and performance](#anf-7---monitor-azure-netapp-files-metrics-to-better-understand-usage-pattern-and-performance) | Monitoring | High | Preview | No |
-| [ANF-8 - Use Azure policy to enforce organizational standards and to assess compliance at-scale in Azure NetApp Files](#anf-8---use-azure-policy-to-enforce-organizational-standards-and-to-assess-compliance-at-scale-in-azure-netapp-files) | Governance | High | Preview | No |
+| [ANF-1 - Use the correct service level and volume quota size for the expected performance level](#anf-1---use-the-correct-service-level-and-volume-quota-size-for-the-expected-performance-level) | System Efficiency | Medium | Verified | No |
+| [ANF-2 - Use standard network features for production in Azure NetApp Files](#anf-2---use-standard-network-features-for-production-in-azure-netapp-files) | Networking | High | Verified | Yes |
+| [ANF-3 - Use availability zones for high availability in Azure NetApp Files](#anf-3---use-availability-zones-for-high-availability-in-azure-netapp-files) | Availability | High | Verified | Yes |
+| [ANF-4 - Use snapshots for data protection in Azure NetApp Files](#anf-4---use-snapshots-for-data-protection-in-azure-netapp-files) | Availability | High | Verified | Yes |
+| [ANF-5 - Enable backup for data protection in Azure NetApp Files](#anf-5---enable-backup-for-data-protection-in-azure-netapp-files) | Disaster Recovery | High | Verified | Yes |
+| [ANF-6 - Enable Cross-region replication of Azure NetApp Files volumes](#anf-6---enable-cross-region-replication-of-azure-netapp-files-volumes) | Disaster Recovery | High | Verified | Yes |
+| [ANF-7 - Enable Cross-zone replication of Azure NetApp Files volumes](#anf-7---enable-cross-zone-replication-of-azure-netapp-files-volumes) | Availability | High | Verified | Yes |
+| [ANF-8 - Monitor Azure NetApp Files metrics to better understand usage pattern and performance](#anf-8---monitor-azure-netapp-files-metrics-to-better-understand-usage-pattern-and-performance) | Monitoring | Medium | Verified | No |
+| [ANF-9 - Use Azure policy to enforce organizational standards and to assess compliance at-scale in Azure NetApp Files](#anf-9---use-azure-policy-to-enforce-organizational-standards-and-to-assess-compliance-at-scale-in-azure-netapp-files) | Governance | Medium | Verified | No |
+| [ANF-10 - Restrict default access to Azure NetApp Files volumes](#anf-10---restrict-default-access-to-azure-netapp-files-volumes) | Access & Security | Medium | Verified | No |
+| [ANF-11 - Make use of SMB continuous availability for supported applications](#anf-11---make-use-of-smb-continuous-availability-for-supported-applications) | Application Resilience | Medium | Verified | No |
+| [ANF-12 - Ensure application resilience for service maintenance events](#anf-12---ensure-application-resilience-for-service-maintenance-events) | Application Resilience | Medium | Verified | No |
{{< /table >}}
{{< alert style="info" >}}
@@ -36,7 +40,7 @@ Definitions of states can be found [here]({{< ref "../../../_index.md#definition
**Category: System Efficiency**
-**Impact: High**
+**Impact: Medium**
**Guidance**
@@ -48,9 +52,9 @@ Service levels are an attribute of a capacity pool. Service levels are defined a
**Resources**
-- [Service levels for Azure NetApp Files | Microsoft Learn](https://learn.microsoft.com/en-us/azure/azure-netapp-files/azure-netapp-files-service-levels)
+- [Service levels for Azure NetApp Files | Microsoft Learn](https://learn.microsoft.com/azure/azure-netapp-files/azure-netapp-files-service-levels)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -60,7 +64,7 @@ Service levels are an attribute of a capacity pool. Service levels are defined a
-### ANF-2 - Use standard network feature for Production in Azure NetApp Files
+### ANF-2 - Use standard network features for production in Azure NetApp Files
**Category: Networking**
@@ -69,13 +73,12 @@ Service levels are an attribute of a capacity pool. Service levels are defined a
**Guidance**
Standard network feature enables higher IP limits and standard VNet features such as network security groups and user-defined routes on delegated subnets, and additional connectivity patterns.
-Please check the supported regions for standard network feature [here](https://docs.microsoft.com/en-us/azure/azure-netapp-files/azure-netapp-files-network-topologies#supported-regions-for-standard-network-feature)
**Resources**
-- [Guidelines for Azure NetApp Files network planning | Microsoft Learn](https://learn.microsoft.com/en-us/azure/azure-netapp-files/azure-netapp-files-network-topologies)
+- [Guidelines for Azure NetApp Files network planning | Microsoft Learn](https://learn.microsoft.com/azure/azure-netapp-files/azure-netapp-files-network-topologies)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -87,7 +90,7 @@ Please check the supported regions for standard network feature [here](https://d
### ANF-3 - Use availability zones for high availability in Azure NetApp Files
-**Category: High Availability**
+**Category: Availability**
**Impact: High**
@@ -97,9 +100,9 @@ Azure availability zones are physically separate locations within each suppo
**Resources**
-- [Use availability zones for high availability in Azure NetApp Files | Microsoft Learn](https://learn.microsoft.com/en-us/azure/azure-netapp-files/use-availability-zones)
+- [Use availability zones for high availability in Azure NetApp Files | Microsoft Learn](https://learn.microsoft.com/azure/azure-netapp-files/use-availability-zones)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -109,23 +112,21 @@ Azure availability zones are physically separate locations within each suppo
-### ANF-4 - Use snapshot and backup for in-region data protection in Azure NetApp Files
+### ANF-4 - Use snapshots for data protection in Azure NetApp Files
-**Category: High Availability**
+**Category: Availability**
**Impact: High**
**Guidance**
-Azure NetApp Files snapshot technology delivers stability, scalability, and swift recoverability without impacting performance.
-Azure NetApp Files supports a fully managed backup solution for long-term recovery, archive, and compliance. Backups can be restored to new volumes in the same region as the backup. Backups created by Azure NetApp Files are stored in Azure storage, independent of volume snapshots that are available for near-term recovery or cloning.
+Azure NetApp Files snapshot technology delivers stability, scalability, and swift recoverability without impacting performance. Use snapshot policies to automatically create snapshots of your Azure NetApp Files data.
**Resources**
-- [Snapshots](https://learn.microsoft.com/en-us/azure/azure-netapp-files/data-protection-disaster-recovery-options#snapshots)
-- [Backup](https://learn.microsoft.com/en-us/azure/azure-netapp-files/data-protection-disaster-recovery-options#backups)
+- [How Azure NetApp Files snapshots work | Microsoft Learn](https://learn.microsoft.com/azure/azure-netapp-files/snapshots-introduction)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -135,21 +136,21 @@ Azure NetApp Files supports a fully managed backup solution for long-term recove
-### ANF-5 - Enable Cross-region replication of Azure NetApp Files volumes
+### ANF-5 - Enable backup for data protection in Azure NetApp Files
-**Category: Disaster Recovery/High Availability**
+**Category: Availability**
**Impact: High**
**Guidance**
-The Azure NetApp Files replication functionality provides data protection through cross-region volume replication. You can asynchronously replicate data from an Azure NetApp Files volume (source) in one region to another Azure NetApp Files volume (destination) in another region. This capability enables you to fail over your critical application if a region-wide outage or disaster happens.
+Azure NetApp Files supports a fully managed backup solution for long-term recovery, archive, and compliance. Backups can be restored to new volumes in the same region as the backup. Backups created by Azure NetApp Files are stored in Azure storage, independent of volume snapshots that are available for near-term recovery or cloning. Use backup policies to create backups of your Azure NetApp Files data automatically.
**Resources**
-- [Cross-zone replication of Azure NetApp Files volumes | Microsoft Learn](https://learn.microsoft.com/en-us/azure/azure-netapp-files/cross-region-replication-introduction)
+- [Understand Azure NetApp Files backup | Microsoft Learn](https://learn.microsoft.com/azure/azure-netapp-files/backup-introduction)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -159,21 +160,23 @@ The Azure NetApp Files replication functionality provides data protection throug
-### ANF-6 - Enable Cross-zone replication of Azure NetApp Files volumes
+### ANF-6 - Enable Cross-region replication of Azure NetApp Files volumes
-**Category: Disaster Recovery/High Availability**
+**Category: Disaster Recovery**
**Impact: High**
**Guidance**
-The cross-zone replication (CZR) capability provides data protection between volumes in different availability zones. You can asynchronously replicate data from an Azure NetApp Files volume (source) in one availability zone to another Azure NetApp Files volume (destination) in another availability. This capability enables you to fail over your critical application if a zone-wide outage or disaster happens.
+The Azure NetApp Files replication functionality provides data protection through cross-region volume replication. You can asynchronously replicate data from an Azure NetApp Files volume (source) in one region to another Azure NetApp Files volume (destination) in another region. This capability enables you to fail over your critical application if a region-wide outage or disaster happens.
+
+Note: A volume can be replicated via cross-zone replication (CZR) or cross-region replication (CRR) but not both concurrently.
**Resources**
-- [Cross-zone replication of Azure NetApp Files volumes | Microsoft Learn](https://learn.microsoft.com/en-us/azure/azure-netapp-files/cross-zone-replication-introduction)
+- [Cross-region replication of Azure NetApp Files volumes | Microsoft Learn](https://learn.microsoft.com/azure/azure-netapp-files/cross-region-replication-introduction)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -183,21 +186,23 @@ The cross-zone replication (CZR) capability provides data protection between vol
-### ANF-7 - Monitor Azure Netapp Files metrics to better understand usage pattern and performance
+### ANF-7 - Enable Cross-zone replication of Azure NetApp Files volumes
-**Category: Monitoring**
+**Category: Availability**
**Impact: High**
**Guidance**
-Azure NetApp Files provides metrics on allocated storage, actual storage usage, volume IOPS, and latency. With these metrics, you can gain a better understanding on the usage pattern and volume performance of your NetApp accounts.
+The cross-zone replication (CZR) capability provides data protection between volumes in different availability zones. You can asynchronously replicate data from an Azure NetApp Files volume (source) in one availability zone to another Azure NetApp Files volume (destination) in another availability zone. This capability enables you to fail over your critical application if a zone-wide outage or disaster happens.
+
+Note: A volume can be replicated via cross-zone replication (CZR) or cross-region replication (CRR) but not both concurrently.
**Resources**
-- [Ways to monitor Azure NetApp Files | Microsoft Learn](https://learn.microsoft.com/en-us/azure/azure-netapp-files/monitor-azure-netapp-files)
+- [Cross-zone replication of Azure NetApp Files volumes | Microsoft Learn](https://learn.microsoft.com/azure/azure-netapp-files/cross-zone-replication-introduction)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -207,25 +212,137 @@ Azure NetApp Files provides metrics on allocated storage, actual storage usage,
-### ANF-8 - Use Azure policy to enforce organizational standards and to assess compliance at-scale in Azure NetApp Files
+### ANF-8 - Monitor Azure NetApp Files metrics to better understand usage pattern and performance
+
+**Category: Monitoring**
+
+**Impact: Medium**
+
+**Guidance**
+
+Azure NetApp Files provides metrics on allocated storage, actual storage usage, volume IOPS, and latency. With these metrics, you can gain a better understanding of the usage patterns and volume performance of your NetApp accounts.
+
+**Resources**
+
+- [Ways to monitor Azure NetApp Files | Microsoft Learn](https://learn.microsoft.com/azure/azure-netapp-files/monitor-azure-netapp-files)
+
+**Resource Graph Query**
+
+{{< collapse title="Show/Hide Query/Script" >}}
+
+{{< code lang="sql" file="code/anf-8/anf-8.kql" >}} {{< /code >}}
+
+{{< /collapse >}}
+
+
+
+### ANF-9 - Use Azure policy to enforce organizational standards and to assess compliance at-scale in Azure NetApp Files
**Category: Governance**
-**Impact: High**
+**Impact: Medium**
**Guidance**
-Azure NetApp Files supports Azure Policy. You can integrate Azure NetApp Files with Azure Policy through [creating custom policy definitions](https://learn.microsoft.com/en-us/azure/governance/policy/tutorials/create-custom-policy-definition). You can find examples in [Enforce Snapshot Policies with Azure Policy](https://anfcommunity.com/2021/08/30/enforce-snapshot-policies-with-azure-policy/) and [Azure Policy now available for Azure NetApp Files](https://anfcommunity.com/2021/04/19/azure-policy-now-available-for-azure-netapp-files/).
+Azure NetApp Files supports Azure Policy. You can integrate Azure NetApp Files with Azure Policy by using built-in policy definitions or by creating custom policy definitions.
**Resources**
-- [Azure Policy definitions for Azure NetApp Files | Microsoft Learn](https://learn.microsoft.com/en-us/azure/azure-netapp-files/azure-policy-definitions)
+- [Azure Policy definitions for Azure NetApp Files | Microsoft Learn](https://learn.microsoft.com/azure/azure-netapp-files/azure-policy-definitions)
+- [Creating custom policy definitions | Microsoft Learn](https://learn.microsoft.com/azure/governance/policy/tutorials/create-custom-policy-definition)
**Resource Graph Query/Scripts**
{{< collapse title="Show/Hide Query/Script" >}}
-{{< code lang="sql" file="code/anf-8/anf-8.kql" >}} {{< /code >}}
+{{< code lang="sql" file="code/anf-9/anf-9.kql" >}} {{< /code >}}
+
+{{< /collapse >}}
+
+
+
+### ANF-10 - Restrict default access to Azure NetApp Files volumes
+
+**Category: Access & Security**
+
+**Impact: Medium**
+
+**Guidance**
+
+- Access to the delegated subnet should be granted to specific Azure Virtual Networks only whenever possible.
+- Share permissions on SMB-enabled volumes should be restricted from the default 'Everyone – Full control'.
+- Access to NFS-enabled volumes should be restricted by using export policies and/or NFSv4.1 ACLs.
+- Mount path change permissions should be further restricted.
+
+
+**Resources**
+
+- [Configure network features for an Azure NetApp Files volume](https://learn.microsoft.com/azure/azure-netapp-files/configure-network-features)
+- [Manage SMB share ACLs in Azure NetApp Files](https://learn.microsoft.com/azure/azure-netapp-files/manage-smb-share-access-control-lists)
+- [Configure export policy for NFS or dual-protocol volumes](https://learn.microsoft.com/azure/azure-netapp-files/azure-netapp-files-configure-export-policy)
+- [Configure access control lists on NFSv4.1 volumes for Azure NetApp Files](https://learn.microsoft.com/azure/azure-netapp-files/configure-access-control-lists)
+- [Configure Unix permissions and change ownership mode for NFS and dual-protocol volumes](https://learn.microsoft.com/azure/azure-netapp-files/configure-unix-permissions-change-ownership-mode)
+
+**Resource Graph Query**
+
+{{< collapse title="Show/Hide Query/Script" >}}
+
+{{< code lang="sql" file="code/anf-10/anf-10.kql" >}} {{< /code >}}
+
+{{< /collapse >}}
+
+
+
+### ANF-11 - Make use of SMB continuous availability for supported applications
+
+**Category: Application Resilience**
+
+**Impact: Medium**
+
+**Guidance**
+
+Certain SMB-based applications require SMB Transparent Failover. SMB Transparent Failover enables maintenance operations on the Azure NetApp Files service without interrupting connectivity to server applications storing and accessing data on SMB volumes. To support SMB Transparent Failover for specific applications, Azure NetApp Files supports the SMB Continuous Availability shares option.
+
+Consider using the Continuous Availability option for the following SMB-based applications:
+- Citrix App Layering
+- FSLogix user profile containers
+- FSLogix ODFC containers
+- Microsoft SQL Server
+- MSIX app attach
+
+**Resources**
+
+- [Do I need to take special precautions for SMB-based applications? | Microsoft Learn](https://learn.microsoft.com/azure/azure-netapp-files/faq-application-resilience#do-i-need-to-take-special-precautions-for-smb-based-applications)
+
+**Resource Graph Query**
+
+{{< collapse title="Show/Hide Query/Script" >}}
+
+{{< code lang="sql" file="code/anf-11/anf-11.kql" >}} {{< /code >}}
+
+{{< /collapse >}}
+
+
+
+### ANF-12 - Ensure application resilience for service maintenance events
+
+**Category: Application Resilience**
+
+**Impact: Medium**
+
+**Guidance**
+
+Azure NetApp Files might undergo occasional planned maintenance (for example, platform updates, service or software upgrades). As such, ensure that you're aware of the application’s resiliency settings to cope with the storage service maintenance events.
+
+**Resources**
+
+- [What do you recommend for handling potential application disruptions due to storage service maintenance events? | Microsoft Learn](https://learn.microsoft.com/azure/azure-netapp-files/faq-application-resilience#what-do-you-recommend-for-handling-potential-application-disruptions-due-to-storage-service-maintenance-events)
+
+**Resource Graph Query**
+
+{{< collapse title="Show/Hide Query/Script" >}}
+
+{{< code lang="sql" file="code/anf-12/anf-12.kql" >}} {{< /code >}}
{{< /collapse >}}
diff --git a/docs/content/services/storage/azure-netapp-files/code/anf-10/anf-10.kql b/docs/content/services/storage/azure-netapp-files/code/anf-10/anf-10.kql
new file mode 100644
index 000000000..fa5cad258
--- /dev/null
+++ b/docs/content/services/storage/azure-netapp-files/code/anf-10/anf-10.kql
@@ -0,0 +1 @@
+// cannot-be-validated-with-arg
diff --git a/docs/content/services/storage/azure-netapp-files/code/anf-11/anf-11.kql b/docs/content/services/storage/azure-netapp-files/code/anf-11/anf-11.kql
new file mode 100644
index 000000000..fa5cad258
--- /dev/null
+++ b/docs/content/services/storage/azure-netapp-files/code/anf-11/anf-11.kql
@@ -0,0 +1 @@
+// cannot-be-validated-with-arg
diff --git a/docs/content/services/storage/azure-netapp-files/code/anf-12/anf-12.kql b/docs/content/services/storage/azure-netapp-files/code/anf-12/anf-12.kql
new file mode 100644
index 000000000..fa5cad258
--- /dev/null
+++ b/docs/content/services/storage/azure-netapp-files/code/anf-12/anf-12.kql
@@ -0,0 +1 @@
+// cannot-be-validated-with-arg
diff --git a/docs/content/services/storage/azure-netapp-files/code/anf-2/anf-2.kql b/docs/content/services/storage/azure-netapp-files/code/anf-2/anf-2.kql
index 47e9bd63d..906b9091b 100644
--- a/docs/content/services/storage/azure-netapp-files/code/anf-2/anf-2.kql
+++ b/docs/content/services/storage/azure-netapp-files/code/anf-2/anf-2.kql
@@ -1,4 +1,4 @@
-// This Resource Graph query will return all NetApp Volumes without Network Feature Standard.
+// This Resource Graph query will return all Azure NetApp Files volumes without standard network features.
resources
| where type =~ "microsoft.netapp/netappaccounts/capacitypools/volumes"
| where properties.networkFeatures != "Standard"
diff --git a/docs/content/services/storage/azure-netapp-files/code/anf-3/anf-3.kql b/docs/content/services/storage/azure-netapp-files/code/anf-3/anf-3.kql
index 059fb715e..465dc0bff 100644
--- a/docs/content/services/storage/azure-netapp-files/code/anf-3/anf-3.kql
+++ b/docs/content/services/storage/azure-netapp-files/code/anf-3/anf-3.kql
@@ -1,5 +1,6 @@
-// This Resource Graph query will return all NetApp Volumes without AVzone defined.
-resources
-| where type =~ "microsoft.netapp/netappaccounts/capacitypools/volumes"
-| where zones == "[]"
-| project recommendationId = "ANF-3", name, id, tags
+// Azure Resource Graph Query
+// This Resource Graph query will return all Azure NetApp Files volumes without an availability zone defined.
+Resources
+| where type =~ "Microsoft.NetApp/netAppAccounts/capacityPools/volumes"
+| where array_length(zones) == 0 or isnull(zones)
+| project recommendationId = "ANF-3", name, id, tags
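Review note (not part of the patch): the new anf-3 predicate, `array_length(zones) == 0 or isnull(zones)`, covers two distinct cases, an empty `zones` array and a missing or null `zones` field. The `isnull` branch matters if `array_length` yields null for a non-array input, as the Kusto documentation describes. The following is a minimal Python model of that predicate, not the query itself; the helper name `lacks_availability_zone` is hypothetical, and `None` stands in for a null `zones` value.

```python
from typing import Optional, Sequence


def lacks_availability_zone(zones: Optional[Sequence[str]]) -> bool:
    """Return True when a volume has no availability zone assigned.

    Models `array_length(zones) == 0 or isnull(zones)`:
    None stands in for a null/missing field, [] for an empty array.
    """
    return zones is None or len(zones) == 0
```

Under this model, a volume with `zones = ["1"]` is not reported, while a volume with `zones = []` or with no `zones` field is.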
diff --git a/docs/content/services/storage/azure-netapp-files/code/anf-4/anf-4.kql b/docs/content/services/storage/azure-netapp-files/code/anf-4/anf-4.kql
index 614a7f9ca..ec6b55ec6 100644
--- a/docs/content/services/storage/azure-netapp-files/code/anf-4/anf-4.kql
+++ b/docs/content/services/storage/azure-netapp-files/code/anf-4/anf-4.kql
@@ -1 +1,5 @@
-// under-development
+// This Resource Graph query will return all Azure NetApp Files volumes without a snapshot policy defined.
+resources
+| where type == "microsoft.netapp/netappaccounts/capacitypools/volumes"
+| where properties.dataProtection.snapshot.snapshotPolicyId == ""
+| project recommendationId = "ANF-4", name, id, tags
diff --git a/docs/content/services/storage/azure-netapp-files/code/anf-5/anf-5.kql b/docs/content/services/storage/azure-netapp-files/code/anf-5/anf-5.kql
index e570c5507..536374bd0 100644
--- a/docs/content/services/storage/azure-netapp-files/code/anf-5/anf-5.kql
+++ b/docs/content/services/storage/azure-netapp-files/code/anf-5/anf-5.kql
@@ -1,10 +1,5 @@
-// This Resource Graph query will return all NetApp Volumes without Cross-Region Replication.
+// This Resource Graph query will return all Azure NetApp Files volumes without a backup policy defined.
resources
-| where type =~ "microsoft.netapp/netappaccounts/capacitypools/volumes"
-| extend NetAC0 = tostring(split(name,'/')[0])
-| join kind=leftouter (resources
- | where type =~ "microsoft.netapp/netappaccounts/capacitypools/volumes"
- | extend NetAC1 = tostring(split(name,'/')[0])
- | project id,NetAC1,remid=tostring(properties.dataProtection.replication.remoteVolumeResourceId)) on $left.id == $right.remid
-| where properties.volumeType != 'DataProtection' and NetAC0 == NetAC1
+| where type == "microsoft.netapp/netappaccounts/capacitypools/volumes"
+| where properties.dataProtection.backup.backupPolicyId == ""
| project recommendationId = "ANF-5", name, id, tags
diff --git a/docs/content/services/storage/azure-netapp-files/code/anf-6/anf-6.kql b/docs/content/services/storage/azure-netapp-files/code/anf-6/anf-6.kql
index 1b470bde6..02fc86b4c 100644
--- a/docs/content/services/storage/azure-netapp-files/code/anf-6/anf-6.kql
+++ b/docs/content/services/storage/azure-netapp-files/code/anf-6/anf-6.kql
@@ -1,10 +1,8 @@
-// This Resource Graph query will return all NetApp Volumes without Cross-Zone Replication.
+// This Resource Graph query will return all Azure NetApp Files volumes without cross-region replication.
resources
-| where type =~ "microsoft.netapp/netappaccounts/capacitypools/volumes"
-| extend NetAC0 = tostring(split(name,'/')[0])
-| join kind=leftouter (resources
- | where type =~ "microsoft.netapp/netappaccounts/capacitypools/volumes"
- | extend NetAC1 = tostring(split(name,'/')[0])
- | project id,NetAC1,remid=tostring(properties.dataProtection.replication.remoteVolumeResourceId)) on $left.id == $right.remid
-| where properties.volumeType != 'DataProtection' and NetAC0 != NetAC1
+| where type == "microsoft.netapp/netappaccounts/capacitypools/volumes"
+| extend remoteVolumeRegion = properties.dataProtection.replication.remoteVolumeRegion
+| extend volumeType = properties.volumeType
+| extend replicationType = iff((remoteVolumeRegion == location), "CZR", iff((remoteVolumeRegion == ""),"n/a","CRR"))
+| where replicationType != "CRR" and volumeType != "DataProtection"
| project recommendationId = "ANF-6", name, id, tags
diff --git a/docs/content/services/storage/azure-netapp-files/code/anf-7/anf-7.kql b/docs/content/services/storage/azure-netapp-files/code/anf-7/anf-7.kql
index 8f0edb91a..d49eae313 100644
--- a/docs/content/services/storage/azure-netapp-files/code/anf-7/anf-7.kql
+++ b/docs/content/services/storage/azure-netapp-files/code/anf-7/anf-7.kql
@@ -1 +1,8 @@
-// cannot-be-validated-with-arg. The validation for this recommendation cannot be achieved with an Azure Resource Graph query.
+// This Resource Graph query will return all Azure NetApp Files volumes without cross-zone replication.
+resources
+| where type == "microsoft.netapp/netappaccounts/capacitypools/volumes"
+| extend remoteVolumeRegion = properties.dataProtection.replication.remoteVolumeRegion
+| extend volumeType = properties.volumeType
+| extend replicationType = iff((remoteVolumeRegion == location), "CZR", iff((remoteVolumeRegion == ""),"n/a","CRR"))
+| where replicationType != "CZR" and volumeType != "DataProtection"
+| project recommendationId = "ANF-7", name, id, tags
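Review note (not part of the patch): anf-6.kql and anf-7.kql share the same nested `iff()` to classify a volume's replication as CZR, CRR, or none. The sketch below models that classification in Python so the two filters can be compared side by side. It is an illustration under stated assumptions, not the Resource Graph behavior itself: the function names are hypothetical, a missing `remoteVolumeRegion` is assumed to behave like an empty string, and a regular source volume is assumed to have an empty `volumeType`.

```python
from typing import Optional


def replication_type(location: str, remote_volume_region: Optional[str]) -> str:
    """Model of the nested iff() used in anf-6.kql and anf-7.kql."""
    # Assumption: a missing remoteVolumeRegion is treated like the empty string.
    remote = remote_volume_region or ""
    if remote == location:
        return "CZR"  # replica is in the same region, so it is cross-zone
    if remote == "":
        return "n/a"  # no replication configured
    return "CRR"      # replica is in a different region


def flagged_by_anf6(location: str, remote_volume_region: Optional[str], volume_type: str) -> bool:
    """ANF-6 reports source volumes that are not cross-region replicated."""
    return (
        replication_type(location, remote_volume_region) != "CRR"
        and volume_type != "DataProtection"
    )


def flagged_by_anf7(location: str, remote_volume_region: Optional[str], volume_type: str) -> bool:
    """ANF-7 reports source volumes that are not cross-zone replicated."""
    return (
        replication_type(location, remote_volume_region) != "CZR"
        and volume_type != "DataProtection"
    )
```

One consequence of this model is worth keeping in mind when reading the results: because a volume can use CZR or CRR but not both, every source volume is reported by at least one of the two queries. A volume replicated cross-region still appears under ANF-7, and a volume replicated cross-zone still appears under ANF-6.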
diff --git a/docs/content/services/storage/azure-netapp-files/code/anf-8/anf-8.kql b/docs/content/services/storage/azure-netapp-files/code/anf-8/anf-8.kql
index 614a7f9ca..fa5cad258 100644
--- a/docs/content/services/storage/azure-netapp-files/code/anf-8/anf-8.kql
+++ b/docs/content/services/storage/azure-netapp-files/code/anf-8/anf-8.kql
@@ -1 +1 @@
-// under-development
+// cannot-be-validated-with-arg
diff --git a/docs/content/services/storage/azure-netapp-files/code/anf-9/anf-9.kql b/docs/content/services/storage/azure-netapp-files/code/anf-9/anf-9.kql
new file mode 100644
index 000000000..fa5cad258
--- /dev/null
+++ b/docs/content/services/storage/azure-netapp-files/code/anf-9/anf-9.kql
@@ -0,0 +1 @@
+// cannot-be-validated-with-arg
diff --git a/docs/content/services/storage/storage-Account/_index.md b/docs/content/services/storage/storage-Account/_index.md
index 40bc48776..312d08c23 100644
--- a/docs/content/services/storage/storage-Account/_index.md
+++ b/docs/content/services/storage/storage-Account/_index.md
@@ -1,5 +1,5 @@
+++
-title = "Storage Account"
+title = "Storage Accounts (Blob/Azure Data Lake Storage Gen2)"
description = "Best practices and resiliency recommendations for Storage Account and associated resources."
date = "4/13/23"
author = "dost"
@@ -14,16 +14,16 @@ The presented resiliency recommendations in this guidance include Storage Accoun
The below table shows the list of resiliency recommendations for Storage Account and associated resources.
{{< table style="table-striped" >}}
-| Recommendation | Impact | State | ARG/Script Available|
-| :---------------------------------------------------------------------------------------------------------------------------------------------------- | :-----: | :-----: | :-----------------: |
-|[ST-1 - Ensure that storage account is redundant](#st-1---ensure-that-storage-account-is-redundant) | High | Preview | No |
-|[ST-2 - Do not use classic storage account](#st-2---do-not-use-classic-storage-account) | High | Preview | Yes |
-|[ST-3 - Ensure Performance tier is set as per workload](#st-3---ensure-performance-tier-is-set-as-per-workload) | Medium | Preview | Yes |
-|[ST-4 - Choose right storage account kind for workload](#st-4---choose-right-storage-account-kind-for-workload) | Medium | Preview | No |
-|[ST-5 - Enable soft delete for recovery of data](#st-5---enable-soft-delete-for-recovery-of-data) | Medium | Preview | No |
-|[ST-6 - Enable version for accidental modification and keep the number of versions below 1000](#st-6---enable-version-for-accidental-modification-and-keep-the-number-of-versions-below-1000) | Medium | Preview | No |
-|[ST-7 - Enable point and time restore for containers for recovery](#st-7---enable-point-and-time-restore-for-containers-for-recovery) | Low | Preview | No |
-|[ST-8 - Configure Diagnostic Settings for all storage accounts](#st-8---configure-diagnostic-settings-for-all-storage-accounts) | Low | Preview | No |
+| Recommendation | Category | Impact | State | ARG Query Available |
+|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------:|:------:|:-------:|:--------------------:|
+| [ST-1 - Ensure that storage accounts are zone or region redundant](#st-1---ensure-that-storage-accounts-are-zone-or-region-redundant) | Availability | High | Verified | Yes |
+| [ST-2 - Do not use classic storage accounts](#st-2---do-not-use-classic-storage-accounts) | Governance | High | Verified | Yes |
+| [ST-3 - Ensure performance tier is set as per workload](#st-3---ensure-performance-tier-is-set-as-per-workload) | System Efficiency | Medium | Verified | No |
+| [ST-5 - Enable soft delete for recovery of data](#st-5---enable-soft-delete-for-recovery-of-data) | Disaster Recovery | Medium | Verified | No |
+| [ST-6 - Enable versioning for accidental modification and keep the number of versions below 1000](#st-6---enable-versioning-for-accidental-modification-and-keep-the-number-of-versions-below-1000) | Disaster Recovery | Low | Verified | No |
+| [ST-7 - Consider enabling point-in-time restore for standard general purpose v2 accounts with flat namespace](#st-7---consider-enabling-point-in-time-restore-for-standard-general-purpose-v2-accounts-with-flat-namespace) | Disaster Recovery | Low | Verified | No |
+| [ST-8 - Monitor all blob storage accounts](#st-8---monitor-all-blob-storage-accounts) | Monitoring | Low | Verified | No |
+| [ST-9 - Consider upgrading legacy storage accounts to v2 storage accounts](#st-9---consider-upgrading-legacy-storage-accounts-to-v2-storage-accounts) | System Efficiency | Low | Verified | Yes |
{{< /table >}}
@@ -35,7 +35,7 @@ Definitions of states can be found [here]({{< ref "../../../_index.md#definition
## Recommendations Details
-### ST-1 - Ensure that Storage Account is redundant
+### ST-1 - Ensure that storage accounts are zone or region redundant
**Category: Availability**
@@ -43,18 +43,16 @@ Definitions of states can be found [here]({{< ref "../../../_index.md#definition
**Guidance**
-Data in an Azure Storage account is always replicated three times in the primary region. Azure Storage offers other options for how your data is replicated in the primary or paired region:
+Redundancy ensures that your storage account meets its availability and durability targets even in the face of failures. When deciding which redundancy option is best for your scenario, consider the tradeoffs between lower costs and higher availability.
+Locally redundant storage (LRS) is the lowest-cost redundancy option and offers the least durability compared to other options. Microsoft recommends using zone-redundant storage (ZRS), geo-redundant storage (GRS), or geo-zone-redundant storage (GZRS) to ensure your storage accounts are available if an availability zone or region becomes unavailable.
-- LRS synchronously replicates data 3 times in single physical location. It is least expensive replication but not recommended for apps with high availability and durability. LRS provides eleven 9 durability.
-- ZRS copies data synchronously across 3 availability zone in primary region. ZRS is recommended for apps requiring high availability across zones. ZRS provides twelve 9s durability.
-- GRS replicate additional 3 copies to secondary region and provides sixteen 9s availability.
-- GZRS provides both high availability and redundancy across geo replication. It provides sixteen 9s durability over a given year.
**Resources**
- [Azure Storage redundancy](https://learn.microsoft.com/azure/storage/common/storage-redundancy)
+- [Change the redundancy configuration for a storage account](https://learn.microsoft.com/azure/storage/common/redundancy-migration)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -64,7 +62,7 @@ Data in an Azure Storage account is always replicated three times in the primary
-### ST-2 - Do not use classic Storage Account
+### ST-2 - Do not use classic storage accounts
**Category: Governance**
@@ -72,13 +70,14 @@ Data in an Azure Storage account is always replicated three times in the primary
**Guidance**
-Azure classic Storage Account will retire 31 august 2024. So migrate all workload from classic storage to v2.
+Classic storage accounts will be fully retired on August 31, 2024. If you have classic storage accounts, start planning your migration now.
**Resources**
-- [storage account retirement announcement](https://azure.microsoft.com/updates/classic-azure-storage-accounts-will-be-retired-on-31-august-2024/)
+- [Azure classic storage accounts retirement announcement](https://azure.microsoft.com/updates/classic-azure-storage-accounts-will-be-retired-on-31-august-2024/)
+- [Migrate your classic storage accounts to Azure Resource Manager](https://learn.microsoft.com/azure/storage/common/classic-account-migration-overview)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -96,13 +95,19 @@ Azure classic Storage Account will retire 31 august 2024. So migrate all workloa
**Guidance**
-Consider using appropriate storage performance tier for standard storage / block blob / append blob / file-share and page blob. Each workload scenario requires appropriate Performance tier and its important that based on the type of transaction and blob type/file type appropriate performance tier is selected. Failing to do so will create performance bottleneck.
+Consider using the appropriate storage performance tier for each workload scenario. Different workloads have different performance requirements, so it's important to select the performance tier based on how the storage is used.
**Resources**
-- [Performance Tier](https://learn.microsoft.com/azure/storage/common/storage-account-overview#performance-tiers )
+- [Types of storage accounts](https://learn.microsoft.com/azure/storage/common/storage-account-overview#types-of-storage-accounts)
+- [Scalability and performance targets for standard storage accounts](https://learn.microsoft.com/azure/storage/common/scalability-targets-standard-account)
+- [Performance and scalability checklist for Blob storage](https://learn.microsoft.com/azure/storage/blobs/storage-performance-checklist)
+- [Scalability and performance targets for Blob storage](https://learn.microsoft.com/azure/storage/blobs/scalability-targets)
+- [Premium block blob storage accounts](https://learn.microsoft.com/azure/storage/blobs/storage-blob-block-blob-premium)
+- [Scalability targets for premium block blob storage accounts](https://learn.microsoft.com/azure/storage/blobs/scalability-targets-premium-block-blobs)
+- [Scalability and performance targets for premium page blob storage accounts](https://learn.microsoft.com/azure/storage/blobs/scalability-targets-premium-page-blobs)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -112,123 +117,129 @@ Consider using appropriate storage performance tier for standard storage / block
-### ST-4 - Choose right storage account kind for workload
+### ST-5 - Enable soft delete for recovery of data
-**Category: System Efficiency**
+**Category: Disaster Recovery**
**Impact: Medium**
**Guidance**
-Block blobs are optimized for uploading large amounts of data efficiently. Block blobs are composed of blocks, each of which is identified by a block ID. A block blob can include up to 50,000 blocks
+The soft delete option allows you to recover data that is deleted by mistake. In addition, a resource lock helps prevent accidental deletion of the storage account.
**Resources**
-- [Storage Account Kind docs](https://learn.microsoft.com/rest/api/storageservices/understanding-block-blobs--append-blobs--and-page-blobs )
+- [Soft delete detail docs](https://learn.microsoft.com/azure/storage/blobs/soft-delete-blob-enable?tabs=azure-portal)
-**Resource Graph Query/Scripts**
+**Script**
{{< collapse title="Show/Hide Query/Script" >}}
-{{< code lang="sql" file="code/st-4/st-4.kql" >}} {{< /code >}}
+{{< code lang="sql" file="code/st-5/st-5.kql" >}} {{< /code >}}
{{< /collapse >}}
-### ST-5 - Enable soft delete for recovery of data
+### ST-6 - Enable versioning for accidental modification and keep the number of versions below 1000
**Category: Disaster Recovery**
-**Impact: Medium**
+**Impact: Low**
**Guidance**
-Soft delete option allow for recovering data if its deleted by mistaken. Moreover Lock will prevent accidentally deleting storage account.
+Consider enabling versioning to recover data from accidental modification or deletion.
+Having a large number of versions per blob can increase the latency for blob listing operations. Microsoft recommends maintaining fewer than 1000 versions per blob. You can use lifecycle management to automatically delete old versions.
**Resources**
-- [Soft delete detail docs](https://learn.microsoft.com//azure/storage/blobs/soft-delete-blob-enable?tabs=azure-portal )
+- [Blob versioning](https://learn.microsoft.com/azure/storage/blobs/versioning-overview)
**Script**
{{< collapse title="Show/Hide Query/Script" >}}
-{{< code lang="powershell" file="/code/st-5/st-5.ps1" >}} {{< /code >}}
+{{< code lang="sql" file="code/st-6/st-6.kql" >}} {{< /code >}}
{{< /collapse >}}
-### ST-6 - Enable version for accidental modification and keep the number of versions below 1000
+### ST-7 - Consider enabling point-in-time restore for standard general purpose v2 accounts with flat namespace
**Category: Disaster Recovery**
-**Impact: Medium**
+**Impact: Low**
**Guidance**
-To recover data from accidental modification or deletion enable versioning.
-Having a large number of versions per blob can increase the latency for blob listing operations. Microsoft recommends maintaining fewer than 1000 versions per blob. You can use lifecycle management to automatically delete old versions.
+Consider enabling point-in-time restore for standard general purpose v2 accounts with flat namespace. Point-in-time restore provides protection against accidental deletion or corruption by enabling you to restore block blob data to an earlier state.
**Resources**
-- [Blob versioning](https://learn.microsoft.com/azure/storage/blobs/versioning-overview )
+- [Point-in-time restore for block blobs](https://learn.microsoft.com/azure/storage/blobs/point-in-time-restore-overview)
+- [Perform a point-in-time restore on block blob data](https://learn.microsoft.com/azure/storage/blobs/point-in-time-restore-manage?tabs=portal)
**Script**
{{< collapse title="Show/Hide Query/Script" >}}
-{{< code lang="powershell" file="/code/st-6/st-6.ps1" >}} {{< /code >}}
+{{< code lang="sql" file="code/st-7/st-7.kql" >}} {{< /code >}}
{{< /collapse >}}
-### ST-7 - Enable point and time restore for containers for recovery
+### ST-8 - Monitor all blob storage accounts
-**Category: Disaster Recovery**
+**Category: Monitoring**
**Impact: Low**
**Guidance**
-You can use point-in-time restore to restore one or more sets of block blobs to a previous state
-Point and time restore support general purpose v2 account in standard performance tier. Its a mechanism to protect data
+When you have critical applications and business processes that rely on Azure resources, you need to monitor and get alerts for your system.
+Resource logs aren't collected and stored until you create a diagnostic setting and route the logs to one or more locations. When you create a diagnostic setting, you specify which categories of logs to collect.
**Resources**
-- [Restore overview](https://learn.microsoft.com/azure/storage/blobs/point-in-time-restore-manage?tabs=portal)
+- [Monitor Azure Blob Storage](https://learn.microsoft.com/azure/storage/blobs/monitor-blob-storage)
+- [Best practices for monitoring Azure Blob Storage](https://learn.microsoft.com/azure/storage/blobs/blob-storage-monitoring-scenarios)
**Script**
{{< collapse title="Show/Hide Query/Script" >}}
-{{< code lang="powershell" file="/code/st-7/st-7.ps1" >}} {{< /code >}}
+{{< code lang="sql" file="code/st-8/st-8.kql" >}} {{< /code >}}
{{< /collapse >}}
-### ST-8 - Configure Diagnostic Settings for all storage accounts
+### ST-9 - Consider upgrading legacy storage accounts to v2 storage accounts
-**Category: Monitoring**
+**Category: System Efficiency**
**Impact: Low**
**Guidance**
-Enabling diagnostic settings allow you to capture and view diagnostic information so that you can troubleshoot any failures.
+General-purpose v2 accounts are recommended for most storage scenarios because they support the latest Azure Storage features and offer the lowest per-gigabyte pricing. Legacy account types (Standard general-purpose v1 and Blob Storage) aren’t recommended by Microsoft, but may be used in certain scenarios.
+Review the scenarios listed in the documentation (classic compatibility, transaction-intensive workloads, etc.) in which a legacy account may still be appropriate, and upgrade legacy storage accounts to v2 storage accounts when applicable.
+
+Upgrading to a general-purpose v2 storage account from your general-purpose v1 or Blob storage accounts is straightforward. There's no downtime or risk of data loss associated with upgrading to a general-purpose v2 storage account. Upgrading a general-purpose v1 or Blob storage account to general-purpose v2 is permanent and cannot be undone.
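+
+For illustration only, the upgrade described above might look like this with the Azure CLI. The resource group and account names are placeholders, and the `Hot` access tier is an assumed choice. Because the upgrade cannot be undone, treat this as a sketch to adapt rather than a script to run as-is.
+
+```shell
+# Placeholder names (assumptions): replace with your own resource group and storage account.
+# --access-tier sets the default tier of the upgraded account; Hot is an assumption here.
+az storage account update \
+  --resource-group "<resource-group>" \
+  --name "<storage-account>" \
+  --set kind=StorageV2 \
+  --access-tier Hot
+```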
**Resources**
-- [Diagnostic Setting for Storage Account](https://learn.microsoft.com/en-us/azure/storage/blobs/monitor-blob-storage)
+- [Legacy storage account types](https://learn.microsoft.com/azure/storage/common/storage-account-overview#legacy-storage-account-types)
+- [Upgrade to a general-purpose v2 storage account](https://learn.microsoft.com/azure/storage/common/storage-account-upgrade)
**Script**
{{< collapse title="Show/Hide Query/Script" >}}
-{{< code lang="powershell" file="/code/st-8/st-8.ps1" >}} {{< /code >}}
+{{< code lang="sql" file="code/st-9/st-9.kql" >}} {{< /code >}}
{{< /collapse >}}
diff --git a/docs/content/services/storage/storage-Account/code/st-1/st-1.kql b/docs/content/services/storage/storage-Account/code/st-1/st-1.kql
index 614a7f9ca..b41c22efb 100644
--- a/docs/content/services/storage/storage-Account/code/st-1/st-1.kql
+++ b/docs/content/services/storage/storage-Account/code/st-1/st-1.kql
@@ -1 +1,6 @@
-// under-development
+// Azure Resource Graph Query
+// This query will return all storage accounts that are not using Zone or Region replication
+Resources
+| where type =~ "Microsoft.Storage/storageAccounts"
+| where sku.name in~ ("Standard_LRS", "Premium_LRS")
+| project recommendationId = "st-1", name, id, tags, param1 = strcat("sku: ", sku.name)
diff --git a/docs/content/services/storage/storage-Account/code/st-1/st-1.kql.fix b/docs/content/services/storage/storage-Account/code/st-1/st-1.kql.fix
deleted file mode 100644
index 4219615bf..000000000
--- a/docs/content/services/storage/storage-Account/code/st-1/st-1.kql.fix
+++ /dev/null
@@ -1,5 +0,0 @@
-// Azure Resource Graph Query
-// This query will return all storage accounts that are not using at least Zone replication
-Resources | where type =~'microsoft.storage/storageaccounts'
-| where sku.name =~'Standard_LRS' or sku.name =~ 'Standard_ZRS'
-| project recommendationId = 'st-1', name, id, param1=sku.name
diff --git a/docs/content/services/storage/storage-Account/code/st-2/st-2.kql b/docs/content/services/storage/storage-Account/code/st-2/st-2.kql
index 27bf1b7c7..64f8339eb 100644
--- a/docs/content/services/storage/storage-Account/code/st-2/st-2.kql
+++ b/docs/content/services/storage/storage-Account/code/st-2/st-2.kql
@@ -2,4 +2,4 @@
// Find all Azure classic Storage Account
resources
| where type =~ 'microsoft.classicstorage/storageaccounts'
-| project recommendationId = 'st-2', name, id, param1=type
+| project recommendationId = 'st-2', name, id, tags, param1=type
diff --git a/docs/content/services/storage/storage-Account/code/st-3/st-3.kql b/docs/content/services/storage/storage-Account/code/st-3/st-3.kql
index 9c4aac273..fa5cad258 100644
--- a/docs/content/services/storage/storage-Account/code/st-3/st-3.kql
+++ b/docs/content/services/storage/storage-Account/code/st-3/st-3.kql
@@ -1,6 +1 @@
-// Azure Resource Graph Query
-// Find all Azure Storage Accounts, that do not have an access tier set
-resources
-| where type =~'microsoft.storage/storageaccounts'
-| where isnull(properties.accessTier)
-| project recommendationId = 'st-3', name, id, param1="not defined - GeneralPurpose V1"
+// cannot-be-validated-with-arg
diff --git a/docs/content/services/storage/storage-Account/code/st-4/st-4.kql b/docs/content/services/storage/storage-Account/code/st-4/st-4.kql
index 614a7f9ca..fa5cad258 100644
--- a/docs/content/services/storage/storage-Account/code/st-4/st-4.kql
+++ b/docs/content/services/storage/storage-Account/code/st-4/st-4.kql
@@ -1 +1 @@
-// under-development
+// cannot-be-validated-with-arg
diff --git a/docs/content/services/storage/storage-Account/code/st-5/st-5.kql b/docs/content/services/storage/storage-Account/code/st-5/st-5.kql
new file mode 100644
index 000000000..614a7f9ca
--- /dev/null
+++ b/docs/content/services/storage/storage-Account/code/st-5/st-5.kql
@@ -0,0 +1 @@
+// under-development
diff --git a/docs/content/services/storage/storage-Account/code/st-6/st-6.kql b/docs/content/services/storage/storage-Account/code/st-6/st-6.kql
new file mode 100644
index 000000000..614a7f9ca
--- /dev/null
+++ b/docs/content/services/storage/storage-Account/code/st-6/st-6.kql
@@ -0,0 +1 @@
+// under-development
diff --git a/docs/content/services/storage/storage-Account/code/st-7/st-7.kql b/docs/content/services/storage/storage-Account/code/st-7/st-7.kql
new file mode 100644
index 000000000..614a7f9ca
--- /dev/null
+++ b/docs/content/services/storage/storage-Account/code/st-7/st-7.kql
@@ -0,0 +1 @@
+// under-development
diff --git a/docs/content/services/storage/storage-Account/code/st-8/st-8.kql b/docs/content/services/storage/storage-Account/code/st-8/st-8.kql
new file mode 100644
index 000000000..614a7f9ca
--- /dev/null
+++ b/docs/content/services/storage/storage-Account/code/st-8/st-8.kql
@@ -0,0 +1 @@
+// under-development
diff --git a/docs/content/services/storage/storage-Account/code/st-9/st-9.kql b/docs/content/services/storage/storage-Account/code/st-9/st-9.kql
new file mode 100644
index 000000000..04bc1856e
--- /dev/null
+++ b/docs/content/services/storage/storage-Account/code/st-9/st-9.kql
@@ -0,0 +1,9 @@
+// Azure Resource Graph Query
+// Find all Azure Storage Accounts that can be upgraded to general-purpose v2.
+Resources
+| where type =~ "Microsoft.Storage/storageAccounts" and kind in~ ("Storage", "BlobStorage")
+| extend
+ param1 = strcat("AccountKind: ", case(kind =~ "Storage", "Storage (general purpose v1)", kind =~ "BlobStorage", "BlobStorage", kind)),
+ param2 = strcat("Performance: ", sku.tier),
+ param3 = strcat("Replication: ", sku.name)
+| project recommendationId = "st-9", name, id, tags, param1, param2, param3
diff --git a/docs/content/services/web/app-service-plan/_index.md b/docs/content/services/web/app-service-plan/_index.md
index 59846855c..8efd7c000 100644
--- a/docs/content/services/web/app-service-plan/_index.md
+++ b/docs/content/services/web/app-service-plan/_index.md
@@ -12,13 +12,13 @@ The presented resiliency recommendations in this guidance include App Service Pl
## Summary of Recommendations
{{< table style="table-striped" >}}
-| Recommendation | Impact | State | ARG Query Available |
-| :------------------------------------------------ | :------: | :------: | :-----------------: |
-| [ASP-1 - Migrate App Service to availability Zone Support](#asp-1---migrate-app-service-to-availability-zone-support) | High | Preview | Yes |
-| [ASP-2 - Use Standard or Premium tier](#asp-2---use-standard-or-premium-tier) | High | Preview | Yes |
-| [ASP-3 - Avoid scaling up or down](#asp-3---avoid-scaling-up-or-down) | Medium | Preview | Yes |
-| [ASP-4 - Create separate App Service plans for production and test](#asp-4---create-separate-app-service-plans-for-production-and-test) | High | Preview | No |
-| [ASP-5 - Enable Autoscale/Automatic scaling to ensure adequate resources are available to service requests](#asp-5---enable-autoscaleautomatic-scaling-to-ensure-adequate-resources-are-available-to-service-requests) | Medium | Preview | Yes |
+| Recommendation | Category | Impact | State | ARG Query Available |
+|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------:|:------:|:-------:|:-------------------:|
+| [ASP-1 - Migrate App Service to availability Zone Support](#asp-1---migrate-app-service-to-availability-zone-support) | Availability | High | Preview | Yes |
+| [ASP-2 - Use Standard or Premium tier](#asp-2---use-standard-or-premium-tier) | Availability | High | Preview | Yes |
+| [ASP-3 - Avoid scaling up or down](#asp-3---avoid-scaling-up-or-down) | System Efficiency | Medium | Preview | Yes |
+| [ASP-4 - Create separate App Service plans for production and test](#asp-4---create-separate-app-service-plans-for-production-and-test) | Governance | High | Preview | No |
+| [ASP-5 - Enable Autoscale/Automatic scaling to ensure adequate resources are available to service requests](#asp-5---enable-autoscaleautomatic-scaling-to-ensure-adequate-resources-are-available-to-service-requests) | System Efficiency | Medium | Preview | Yes |
{{< /table >}}
{{< alert style="info" >}}
@@ -30,19 +30,20 @@ Definitions of states can be found [here]({{< ref "../../../_index.md#definition
### ASP-1 - Migrate App Service to availability Zone Support
+**Category: Availability**
+
**Impact: High**
**Guidance**
Deploying your App Service plans and App Service Environments across availability zones (AZ) is a feature provided by Azure to enhance the resiliency and reliability of your business-critical workloads. By distributing your applications across multiple availability zones, you can ensure their continued operation even in the event of a datacenter-level failure. This approach offers excellent redundancy without the need for deploying your applications in different Azure regions. Availability zones provide a higher level of fault tolerance, helping to safeguard your applications and minimize downtime. This enables your business to maintain continuity and deliver uninterrupted services to your customers.
-
**Resources**
- [Migrate App Service to availability zone support](https://learn.microsoft.com/en-us/azure/reliability/migrate-app-service)
- [High availability enterprise deployment using App Service Environment](https://learn.microsoft.com/en-us/azure/architecture/reference-architectures/enterprise-integration/ase-high-availability-deployment)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -54,6 +55,8 @@ Deploying your App Service plans and App Service Environments across availabilit
### ASP-2 - Use Standard or Premium tier
+**Category: Availability**
+
**Impact: High**
**Guidance**
@@ -64,7 +67,7 @@ The use of the Standard or Premium tier for Azure App Service Plan is crucial fo
- [Resiliency checklist for specific Azure services](https://learn.microsoft.com/en-us/azure/architecture/checklist/resiliency-per-service#app-service)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -76,6 +79,8 @@ The use of the Standard or Premium tier for Azure App Service Plan is crucial fo
### ASP-3 - Avoid scaling up or down
+**Category: System Efficiency**
+
**Impact: Medium**
**Guidance**
@@ -86,7 +91,7 @@ It is recommended to avoid scaling up or down your Azure App Service instances f
- [Resiliency checklist for specific Azure services](https://learn.microsoft.com/en-us/azure/architecture/checklist/resiliency-per-service#app-service)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -98,6 +103,7 @@ It is recommended to avoid scaling up or down your Azure App Service instances f
### ASP-4 - Create separate App Service plans for production and test
+**Category: Governance**
**Impact: High**
@@ -109,7 +115,7 @@ It is strongly recommended to create separate App Service plans for production a
- [Resiliency checklist for specific Azure services](https://learn.microsoft.com/en-us/azure/architecture/checklist/resiliency-per-service#app-service)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -121,6 +127,8 @@ It is strongly recommended to create separate App Service plans for production a
### ASP-5 - Enable Autoscale/Automatic scaling to ensure adequate resources are available to service requests
+**Category: System Efficiency**
+
**Impact: Medium**
**Guidance**
@@ -132,7 +140,7 @@ It is highly recommended to enable Autoscale/Automatic Scaling for your Azure Ap
- [Automatic scaling in Azure App Service](https://learn.microsoft.com/en-us/azure/app-service/manage-automatic-scaling?tabs=azure-portal)
- [Auto Scale Web Apps](https://learn.microsoft.com/en-us/azure/azure-monitor/autoscale/autoscale-get-started)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
diff --git a/docs/content/services/web/app-service-plan/code/asp-1/asp-1.kql b/docs/content/services/web/app-service-plan/code/asp-1/asp-1.kql
index 2cc30b237..fc1913087 100644
--- a/docs/content/services/web/app-service-plan/code/asp-1/asp-1.kql
+++ b/docs/content/services/web/app-service-plan/code/asp-1/asp-1.kql
@@ -7,4 +7,4 @@ resources
| extend zoneRedundant = tobool(properties.zoneRedundant)
| extend sku_tier = tostring(sku.tier)
| where (tolower(sku_tier) contains "isolated" or tolower(sku_tier) contains "premium") and zoneRedundant == false
-| project recommendationid="asp-1", name, id, sku_tier, zoneRedundant
+| project recommendationid="asp-1", name, id, tags, param1=sku_tier, param2="Not Zone Redundant"
diff --git a/docs/content/services/web/app-service-plan/code/asp-2/asp-2.kql b/docs/content/services/web/app-service-plan/code/asp-2/asp-2.kql
index 90ba52e30..2adc39741 100644
--- a/docs/content/services/web/app-service-plan/code/asp-2/asp-2.kql
+++ b/docs/content/services/web/app-service-plan/code/asp-2/asp-2.kql
@@ -1,5 +1,5 @@
// Azure Resource Graph Query
-// Provides a list of Azure App Service Plans that are not in the “Standard”, “Premium”, or “IsolatedV2” SKU tiers.
+// Provides a list of Azure App Service Plans that are not in the "Standard", "Premium", or "IsolatedV2" SKU tiers.
resources
| where type =~ 'microsoft.web/serverfarms'
@@ -7,5 +7,4 @@ resources
| where tolower(sku_tier) !contains "standard" and
tolower(sku_tier) !contains "premium" and
tolower(sku_tier) !contains "isolatedv2"
-| project recommendationid="asp-2", name, id, sku_tier
-
+| project recommendationid="asp-2", name, id, tags, param1= strcat("SKU=",sku_tier)
diff --git a/docs/content/services/web/app-service-plan/code/asp-3/asp-3.kql b/docs/content/services/web/app-service-plan/code/asp-3/asp-3.kql
index 6e22649fa..c228c1848 100644
--- a/docs/content/services/web/app-service-plan/code/asp-3/asp-3.kql
+++ b/docs/content/services/web/app-service-plan/code/asp-3/asp-3.kql
@@ -10,4 +10,4 @@ changedProperties = properties.changes, changeCount = properties.changeAttribute
| where resources_Type contains "microsoft.web/serverfarms"
| where changedProperties['sku.name'].propertyChangeType == 'Update' or changedProperties['sku.tier'].propertyChangeType == 'Update'
| summarize count() by targetResourceId, resources_Name ,tostring(changedProperties['sku.name'].previousValue), tostring(changedProperties['sku.tier'].newValue)
-| project recommendationid="asp-3", ["id"]=targetResourceId, resources_Name, ['changedProperties_sku.name_previousValue'], ['changedProperties_sku.tier_newValue'], count_
+| project recommendationid="asp-3", name=resources_Name, id=targetResourceId, tags="", param1=['changedProperties_sku.name_previousValue'], param2=['changedProperties_sku.tier_newValue'], param3=count_
diff --git a/docs/content/services/web/app-service-plan/code/asp-5/asp-5.kql b/docs/content/services/web/app-service-plan/code/asp-5/asp-5.kql
index d13802ae9..614a7f9ca 100644
--- a/docs/content/services/web/app-service-plan/code/asp-5/asp-5.kql
+++ b/docs/content/services/web/app-service-plan/code/asp-5/asp-5.kql
@@ -1,16 +1 @@
-// Azure Resource Graph Query
-//// Provides a list of Azure App Service Plans that are in the “PremiumV2”, “PremiumV3”, “Premium0V3”, “PremiumMV3”, or “Standard” tier, and checks if they have Elastic Scale or Autoscale enabled.
-
- resources
- | where type =~ 'microsoft.web/serverfarms'
- | extend tier = sku.tier, elasticScaleEnabled = coalesce(properties.elasticScaleEnabled, false)
- | where tier in ('PremiumV2', 'PremiumV3', 'Premium0V3', 'PremiumMV3', 'Standard')
- | extend id = tostring(id)
- | project id, name, tier, ['Elastic Scale'] = iff(elasticScaleEnabled, 'Enabled', 'Disabled')
- | join kind=leftouter (
- resources
- | where type =~ 'microsoft.insights/autoscalesettings'
- | extend autoscaleEnabled = coalesce(properties.enabled, false), metricResourceUri = tostring(properties.profiles[0].rules[0].metricTrigger.metricResourceUri)
- | project autoscaleEnabled, metricResourceUri
- ) on $left.id == $right.metricResourceUri
- | project recommendationid="asp-5",name, id, ['Tier'] = tier, ['Elastic Scale'] = ['Elastic Scale'], ['Autoscale'] = iff(isnull(autoscaleEnabled), 'Disabled', iff(autoscaleEnabled, 'Enabled', 'Disabled'))
+// under-development
diff --git a/docs/content/services/web/app-service-plan/code/asp-5/asp-5.kql.fix b/docs/content/services/web/app-service-plan/code/asp-5/asp-5.kql.fix
new file mode 100644
index 000000000..d13802ae9
--- /dev/null
+++ b/docs/content/services/web/app-service-plan/code/asp-5/asp-5.kql.fix
@@ -0,0 +1,16 @@
+// Azure Resource Graph Query
+// Provides a list of Azure App Service Plans that are in the "PremiumV2", "PremiumV3", "Premium0V3", "PremiumMV3", or "Standard" tier, and checks if they have Elastic Scale or Autoscale enabled.
+
+ resources
+ | where type =~ 'microsoft.web/serverfarms'
+ | extend tier = sku.tier, elasticScaleEnabled = coalesce(properties.elasticScaleEnabled, false)
+ | where tier in ('PremiumV2', 'PremiumV3', 'Premium0V3', 'PremiumMV3', 'Standard')
+ | extend id = tostring(id)
+ | project id, name, tier, ['Elastic Scale'] = iff(elasticScaleEnabled, 'Enabled', 'Disabled')
+ | join kind=leftouter (
+ resources
+ | where type =~ 'microsoft.insights/autoscalesettings'
+ | extend autoscaleEnabled = coalesce(properties.enabled, false), metricResourceUri = tostring(properties.profiles[0].rules[0].metricTrigger.metricResourceUri)
+ | project autoscaleEnabled, metricResourceUri
+ ) on $left.id == $right.metricResourceUri
+ | project recommendationid="asp-5",name, id, ['Tier'] = tier, ['Elastic Scale'] = ['Elastic Scale'], ['Autoscale'] = iff(isnull(autoscaleEnabled), 'Disabled', iff(autoscaleEnabled, 'Enabled', 'Disabled'))
diff --git a/docs/content/services/web/signalr/_index.md b/docs/content/services/web/signalr/_index.md
index b63b654f6..7cbd0ecb8 100644
--- a/docs/content/services/web/signalr/_index.md
+++ b/docs/content/services/web/signalr/_index.md
@@ -39,7 +39,7 @@ Use SignalR with zone redundancy for production workloads. Zone redundancy is a
- [Availability zones support in Azure SignalR Service](https://learn.microsoft.com/azure/azure-signalr/availability-zones)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
diff --git a/docs/content/services/web/signalr/code/sigr-1/sigr-1.kql b/docs/content/services/web/signalr/code/sigr-1/sigr-1.kql
index 5e283c108..1dd3b38bc 100644
--- a/docs/content/services/web/signalr/code/sigr-1/sigr-1.kql
+++ b/docs/content/services/web/signalr/code/sigr-1/sigr-1.kql
@@ -3,5 +3,5 @@
resources
| where type == "microsoft.signalrservice/signalr"
| where sku.tier != "Premium"
-| project recommendationId = "sigr-1", name, id, param1 = "AvailabilityZones: Single Zone"
+| project recommendationId = "sigr-1", name, id, tags, param1 = "AvailabilityZones: Single Zone"
| order by id asc
diff --git a/docs/content/services/web/web-app/_index.md b/docs/content/services/web/web-app/_index.md
index 41e6807dc..87ee0c5a0 100644
--- a/docs/content/services/web/web-app/_index.md
+++ b/docs/content/services/web/web-app/_index.md
@@ -12,14 +12,14 @@ The presented resiliency recommendations in this guidance include Web App and as
## Summary of Recommendations
{{< table style="table-striped" >}}
-| Recommendation | Impact | State | ARG Query Available |
-| :------------------------------------------------ | :------: | :------: | :-----------------: |
-| [APP-1 - Enable diagnostics logging](#app-1---enable-diagnostics-logging) | High | Preview | Yes |
-| [APP-2 - Monitor performance](#app-2---monitor-performance) | Medium | Preview | Yes |
-| [APP-3 - Separate web apps from web APIs](#app-3---separate-web-apps-from-web-apis) | Low | Preview | No |
-| [APP-4 - Create a separate storage account for logs](#app-4---create-a-separate-storage-account-for-logs) | Medium | Preview | No |
-| [APP-5 - Deploy to a staging slot](#app-5---deploy-to-a-staging-slot) | Medium | Preview | Yes |
-| [APP-6 - Store configuration as app settings](#app-6---store-configuration-as-app-settings) | Medium | Preview | Yes |
+| Recommendation |Category| Impact | State | ARG Query Available |
+|:----------------------------------------------------------------------------------------------------------|:-:|:------:|:-------:|:-------------------:|
+| [APP-1 - Enable diagnostics logging](#app-1---enable-diagnostics-logging) |Monitoring| Low | Preview | Yes |
+| [APP-2 - Monitor performance](#app-2---monitor-performance) |Monitoring| Medium | Preview | Yes |
+| [APP-3 - Separate web apps from web APIs](#app-3---separate-web-apps-from-web-apis) |System Efficiency| Low | Preview | No |
+| [APP-4 - Create a separate storage account for logs](#app-4---create-a-separate-storage-account-for-logs) |System Efficiency| Medium | Preview | No |
+| [APP-5 - Deploy to a staging slot](#app-5---deploy-to-a-staging-slot) |Governance| Medium | Preview | Yes |
+| [APP-6 - Store configuration as app settings](#app-6---store-configuration-as-app-settings) |Application Resilience| Medium | Preview | Yes |
{{< /table >}}
@@ -33,7 +33,9 @@ Definitions of states can be found [here]({{< ref "../../../_index.md#definition
### App-1 - Enable diagnostics logging
-**Impact: High**
+**Category: Monitoring**
+
+**Impact: Low**
**Guidance**
@@ -53,6 +55,8 @@ Enabling diagnostics logging for your Azure App Service is important for monitor
### App-2 - Monitor Performance
+**Category: Monitoring**
+
**Impact: Medium**
**Guidance**
@@ -66,7 +70,7 @@ Enable monitoring on your web applications based on ASP.NET, ASP.NET Core, Java,
- [Application Insights](https://learn.microsoft.com/azure/application-insights/app-insights-overview)
- [Application monitoring for Azure App Service](https://learn.microsoft.com/azure/azure-monitor/app/azure-web-apps)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -78,6 +82,8 @@ Enable monitoring on your web applications based on ASP.NET, ASP.NET Core, Java,
### App-3 - Separate web apps from web APIs
+**Category: System Efficiency**
+
**Impact: Low**
**Guidance**
@@ -98,6 +104,8 @@ If your solution has both a web front end and a web API, consider decomposing th
### App-4 - Create a separate storage account for logs
+**Category: System Efficiency**
+
**Impact: Medium**
**Guidance**
@@ -118,6 +126,8 @@ Create a separate storage account for logs. Don't use the same storage account f
### App-5 - Deploy to a staging slot
+**Category: Governance**
+
**Impact: Medium**
**Guidance**
@@ -129,7 +139,7 @@ Consider creating a deployment slot to hold the last-known-good (LKG) deployment
- [Set up staging environments in Azure App Service](https://learn.microsoft.com/azure/app-service-web/web-sites-staged-publishing)
-**Resource Graph Query/Scripts**
+**Resource Graph Query**
{{< collapse title="Show/Hide Query/Script" >}}
@@ -141,6 +151,8 @@ Consider creating a deployment slot to hold the last-known-good (LKG) deployment
### App-6 - Store configuration as app settings
+**Category: Application Resilience**
+
**Impact: Medium**
**Guidance**
diff --git a/docs/content/services/web/web-app/code/app-1/app-1.kql b/docs/content/services/web/web-app/code/app-1/app-1.kql
index 3d816e97b..7b5bb5473 100644
--- a/docs/content/services/web/web-app/code/app-1/app-1.kql
+++ b/docs/content/services/web/web-app/code/app-1/app-1.kql
@@ -1,17 +1 @@
-// Azure Resource Graph Query
-// Provides a list of Azure App Service resources and checks if they have Diagnostic logs Categories turned on or now, list settings with on or off indicator
-
-appserviceresources
-| where ['type'] == "microsoft.web/sites/config"
-| where not(name contains "/slots/")
-| project id, properties.AzureMonitorLogCategories, name
-| extend AzureMonitorLogCategories = iif(isempty(properties_AzureMonitorLogCategories), false, true)
-, AppServiceHTTPLogs = iif(set_has_element(properties_AzureMonitorLogCategories, "AppServiceHTTPLogs"), 1, 0)
-, AppServiceConsoleLogs = iif(set_has_element(properties_AzureMonitorLogCategories, "AppServiceConsoleLogs"), 1, 0)
-, AppServiceAppLogs = iif(set_has_element(properties_AzureMonitorLogCategories, "AppServiceAppLogs"), 1, 0)
-, AppServiceFileAuditLogs = iif(set_has_element(properties_AzureMonitorLogCategories, "AppServiceFileAuditLogs"), 1, 0)
-, AppServiceAuditLogs = iif(set_has_element(properties_AzureMonitorLogCategories, "AppServiceAuditLogs"), 1, 0)
-, AppServiceIPSecAuditLogs = iif(set_has_element(properties_AzureMonitorLogCategories, "AppServiceIPSecAuditLogs"), 1, 0)
-, AppServicePlatformLogs = iif(set_has_element(properties_AzureMonitorLogCategories, "AppServicePlatformLogs"), 1, 0)
-, AppServiceAntivirusScanAuditLogs = iif(set_has_element(properties_AzureMonitorLogCategories, "AppServiceAntivirusScanAuditLogs"), 1, 0)
-| project recommendationId="app-1" , ['id'], ["Azure Monitor Logs"]= AzureMonitorLogCategories , AppServiceAntivirusScanAuditLogs, AppServiceAppLogs, AppServiceAuditLogs, AppServiceConsoleLogs, AppServiceFileAuditLogs, AppServiceHTTPLogs, AppServiceIPSecAuditLogs, AppServicePlatformLogs
+// under-development
diff --git a/docs/content/services/web/web-app/code/app-1/app-1.kql.fix b/docs/content/services/web/web-app/code/app-1/app-1.kql.fix
new file mode 100644
index 000000000..3d816e97b
--- /dev/null
+++ b/docs/content/services/web/web-app/code/app-1/app-1.kql.fix
@@ -0,0 +1,17 @@
+// Azure Resource Graph Query
+// Provides a list of Azure App Service resources and checks whether their diagnostic log categories are turned on, listing each setting with an on/off indicator
+
+appserviceresources
+| where ['type'] == "microsoft.web/sites/config"
+| where not(name contains "/slots/")
+| project id, properties.AzureMonitorLogCategories, name
+| extend AzureMonitorLogCategories = iif(isempty(properties_AzureMonitorLogCategories), false, true)
+, AppServiceHTTPLogs = iif(set_has_element(properties_AzureMonitorLogCategories, "AppServiceHTTPLogs"), 1, 0)
+, AppServiceConsoleLogs = iif(set_has_element(properties_AzureMonitorLogCategories, "AppServiceConsoleLogs"), 1, 0)
+, AppServiceAppLogs = iif(set_has_element(properties_AzureMonitorLogCategories, "AppServiceAppLogs"), 1, 0)
+, AppServiceFileAuditLogs = iif(set_has_element(properties_AzureMonitorLogCategories, "AppServiceFileAuditLogs"), 1, 0)
+, AppServiceAuditLogs = iif(set_has_element(properties_AzureMonitorLogCategories, "AppServiceAuditLogs"), 1, 0)
+, AppServiceIPSecAuditLogs = iif(set_has_element(properties_AzureMonitorLogCategories, "AppServiceIPSecAuditLogs"), 1, 0)
+, AppServicePlatformLogs = iif(set_has_element(properties_AzureMonitorLogCategories, "AppServicePlatformLogs"), 1, 0)
+, AppServiceAntivirusScanAuditLogs = iif(set_has_element(properties_AzureMonitorLogCategories, "AppServiceAntivirusScanAuditLogs"), 1, 0)
+| project recommendationId="app-1" , ['id'], ["Azure Monitor Logs"]= AzureMonitorLogCategories , AppServiceAntivirusScanAuditLogs, AppServiceAppLogs, AppServiceAuditLogs, AppServiceConsoleLogs, AppServiceFileAuditLogs, AppServiceHTTPLogs, AppServiceIPSecAuditLogs, AppServicePlatformLogs
diff --git a/docs/content/services/web/web-app/code/app-2/app-2.kql b/docs/content/services/web/web-app/code/app-2/app-2.kql
index 88e5b21f3..614a7f9ca 100644
--- a/docs/content/services/web/web-app/code/app-2/app-2.kql
+++ b/docs/content/services/web/web-app/code/app-2/app-2.kql
@@ -1,13 +1 @@
-
-// Azure Resource Graph Query
-// Provides a list of Azure App Service resources and checks if they have Application Insights enabled by looking for the “APPINSIGHTS_INSTRUMENTATIONKEY” or “APPLICATIONINSIGHTS_CONNECTION_STRING” settings in their configuration.
-
-appserviceresources
-| where type == "microsoft.web/sites/config"
-| extend appSettings = properties.AppSettings
-| mv-expand appSettings
-| extend settingName = tostring(appSettings.Name)
-| extend isAppInsightsInstrumentationKey = iif(settingName == "APPINSIGHTS_INSTRUMENTATIONKEY", true, false)
-| extend isApplicationInsightsConnectionString = iif(settingName == "APPLICATIONINSIGHTS_CONNECTION_STRING", true, false)
-| extend isAppInsightsEnabled = iif(isAppInsightsInstrumentationKey or isApplicationInsightsConnectionString, true, false)
-| project recommendationId="app-2", name, id, settingName, isAppInsightsInstrumentationKey, isApplicationInsightsConnectionString, isAppInsightsEnabled
+// under-development
diff --git a/docs/content/services/web/web-app/code/app-2/app-2.kql.fix b/docs/content/services/web/web-app/code/app-2/app-2.kql.fix
new file mode 100644
index 000000000..88e5b21f3
--- /dev/null
+++ b/docs/content/services/web/web-app/code/app-2/app-2.kql.fix
@@ -0,0 +1,13 @@
+
+// Azure Resource Graph Query
+// Provides a list of Azure App Service resources and checks if they have Application Insights enabled by looking for the "APPINSIGHTS_INSTRUMENTATIONKEY" or "APPLICATIONINSIGHTS_CONNECTION_STRING" settings in their configuration.
+
+appserviceresources
+| where type == "microsoft.web/sites/config"
+| extend appSettings = properties.AppSettings
+| mv-expand appSettings
+| extend settingName = tostring(appSettings.Name)
+| extend isAppInsightsInstrumentationKey = iif(settingName == "APPINSIGHTS_INSTRUMENTATIONKEY", true, false)
+| extend isApplicationInsightsConnectionString = iif(settingName == "APPLICATIONINSIGHTS_CONNECTION_STRING", true, false)
+| extend isAppInsightsEnabled = iif(isAppInsightsInstrumentationKey or isApplicationInsightsConnectionString, true, false)
+| project recommendationId="app-2", name, id, settingName, isAppInsightsInstrumentationKey, isApplicationInsightsConnectionString, isAppInsightsEnabled
diff --git a/docs/content/services/web/web-app/code/app-5/app-5.kql b/docs/content/services/web/web-app/code/app-5/app-5.kql
index 617c9a363..1ef3d2c64 100644
--- a/docs/content/services/web/web-app/code/app-5/app-5.kql
+++ b/docs/content/services/web/web-app/code/app-5/app-5.kql
@@ -9,6 +9,6 @@ resources
| where tolower(Sku) contains "standard" or tolower(Sku) contains "premium" or tolower(Sku) contains "isolatedv2"
| project id, name, AspName, isSlot, Sku
| summarize Slots = countif(isSlot == 1) by id, name, AspName, Sku
-| extend recommendationid = "asp-5"
| extend DeploymentSlotEnabled = iff(Slots > 1, true, false)
-| project recommendationid, name, id, Sku, Slots, DeploymentSlotEnabled
+| where DeploymentSlotEnabled == false
+| project recommendationId="app-5", name, id, tags="", param1=Sku, param2=Slots, param3="DeploymentSlotEnabled=false"
diff --git a/docs/content/services/web/web-app/code/app-6/app-6.kql b/docs/content/services/web/web-app/code/app-6/app-6.kql
index f55d7c14a..68801f9e2 100644
--- a/docs/content/services/web/web-app/code/app-6/app-6.kql
+++ b/docs/content/services/web/web-app/code/app-6/app-6.kql
@@ -1,7 +1,8 @@
// Azure Resource Graph Query
-//Provides a list of Azure App Service resources and checks if they is any App Setting configured under this App
+// Provides a list of Azure App Service resources that don't have App Settings configured
appserviceresources
-| where ['type'] == "microsoft.web/sites/config"
-| project recommendationId="app-5", id, name, AppSettings = iif(isempty(properties.AppSettings), 0, 1) , properties
-
+| where type == "microsoft.web/sites/config"
+| extend AppSettingsMissing = iif(isempty(properties.AppSettings), true, false)
+| where AppSettingsMissing == true
+| project recommendationId="app-6", id, name, tags="", param1="AppSettings is not configured"
diff --git a/docs/content/well-architected/1-define/code/cm-1/cm-1.azcli b/docs/content/well-architected/1-define/code/cm-1/cm-1.azcli
deleted file mode 100644
index 53d6ce9b0..000000000
--- a/docs/content/well-architected/1-define/code/cm-1/cm-1.azcli
+++ /dev/null
@@ -1 +0,0 @@
-az resource list --resource-type "Micosoft.Example/changeMe" | jq .
diff --git a/docs/content/well-architected/1-define/code/cm-1/cm-1.kql b/docs/content/well-architected/1-define/code/cm-1/cm-1.kql
index 8fa0b5a6f..fa5cad258 100644
--- a/docs/content/well-architected/1-define/code/cm-1/cm-1.kql
+++ b/docs/content/well-architected/1-define/code/cm-1/cm-1.kql
@@ -1,6 +1 @@
-// Azure Resource Graph Query
-// Brief description of the intent of the query (focus on returning resources NOT following your recommendation, and usually name and ResourceId are enough for the report)
-Resources
-| where type =~ "Microsoft.Example/changeMe"
-| project recommendationId = "cm-1", name, id
-| order by id asc
+// cannot-be-validated-with-arg
diff --git a/docs/content/well-architected/1-define/code/cm-1/cm-1.ps1 b/docs/content/well-architected/1-define/code/cm-1/cm-1.ps1
deleted file mode 100644
index d9007ae40..000000000
--- a/docs/content/well-architected/1-define/code/cm-1/cm-1.ps1
+++ /dev/null
@@ -1 +0,0 @@
-Get-AzResource -ResourceType "Micrsoft.Example/changeMe" | Select-Object name, location, resourceGroup, properties
diff --git a/docs/content/well-architected/1-define/code/cm-2/cm-2.azcli b/docs/content/well-architected/1-define/code/cm-2/cm-2.azcli
deleted file mode 100644
index 53d6ce9b0..000000000
--- a/docs/content/well-architected/1-define/code/cm-2/cm-2.azcli
+++ /dev/null
@@ -1 +0,0 @@
-az resource list --resource-type "Micosoft.Example/changeMe" | jq .
diff --git a/docs/content/well-architected/1-define/code/cm-2/cm-2.kql b/docs/content/well-architected/1-define/code/cm-2/cm-2.kql
index c86d926a9..fa5cad258 100644
--- a/docs/content/well-architected/1-define/code/cm-2/cm-2.kql
+++ b/docs/content/well-architected/1-define/code/cm-2/cm-2.kql
@@ -1,6 +1 @@
-// Azure Resource Graph Query
-// Brief description of the intent of the query (focus on returning resources NOT following your recommendation, and usually name and ResourceId are enough for the report)
-Resources
-| where type =~ "Microsoft.Example/changeMe2"
-| project recommendationId = "cm-2", name, id
-| order by id asc
+// cannot-be-validated-with-arg
diff --git a/docs/content/well-architected/1-define/code/cm-2/cm-2.ps1 b/docs/content/well-architected/1-define/code/cm-2/cm-2.ps1
deleted file mode 100644
index d9007ae40..000000000
--- a/docs/content/well-architected/1-define/code/cm-2/cm-2.ps1
+++ /dev/null
@@ -1 +0,0 @@
-Get-AzResource -ResourceType "Micrsoft.Example/changeMe" | Select-Object name, location, resourceGroup, properties
diff --git a/docs/content/well-architected/2-design/code/cm-1/cm-1.azcli b/docs/content/well-architected/2-design/code/cm-1/cm-1.azcli
deleted file mode 100644
index 53d6ce9b0..000000000
--- a/docs/content/well-architected/2-design/code/cm-1/cm-1.azcli
+++ /dev/null
@@ -1 +0,0 @@
-az resource list --resource-type "Micosoft.Example/changeMe" | jq .
diff --git a/docs/content/well-architected/2-design/code/cm-1/cm-1.kql b/docs/content/well-architected/2-design/code/cm-1/cm-1.kql
index 8fa0b5a6f..fa5cad258 100644
--- a/docs/content/well-architected/2-design/code/cm-1/cm-1.kql
+++ b/docs/content/well-architected/2-design/code/cm-1/cm-1.kql
@@ -1,6 +1 @@
-// Azure Resource Graph Query
-// Brief description of the intent of the query (focus on returning resources NOT following your recommendation, and usually name and ResourceId are enough for the report)
-Resources
-| where type =~ "Microsoft.Example/changeMe"
-| project recommendationId = "cm-1", name, id
-| order by id asc
+// cannot-be-validated-with-arg
diff --git a/docs/content/well-architected/2-design/code/cm-1/cm-1.ps1 b/docs/content/well-architected/2-design/code/cm-1/cm-1.ps1
deleted file mode 100644
index d9007ae40..000000000
--- a/docs/content/well-architected/2-design/code/cm-1/cm-1.ps1
+++ /dev/null
@@ -1 +0,0 @@
-Get-AzResource -ResourceType "Micrsoft.Example/changeMe" | Select-Object name, location, resourceGroup, properties
diff --git a/docs/content/well-architected/2-design/code/cm-2/cm-2.azcli b/docs/content/well-architected/2-design/code/cm-2/cm-2.azcli
deleted file mode 100644
index 53d6ce9b0..000000000
--- a/docs/content/well-architected/2-design/code/cm-2/cm-2.azcli
+++ /dev/null
@@ -1 +0,0 @@
-az resource list --resource-type "Micosoft.Example/changeMe" | jq .
diff --git a/docs/content/well-architected/2-design/code/cm-2/cm-2.kql b/docs/content/well-architected/2-design/code/cm-2/cm-2.kql
index c86d926a9..fa5cad258 100644
--- a/docs/content/well-architected/2-design/code/cm-2/cm-2.kql
+++ b/docs/content/well-architected/2-design/code/cm-2/cm-2.kql
@@ -1,6 +1 @@
-// Azure Resource Graph Query
-// Brief description of the intent of the query (focus on returning resources NOT following your recommendation, and usually name and ResourceId are enough for the report)
-Resources
-| where type =~ "Microsoft.Example/changeMe2"
-| project recommendationId = "cm-2", name, id
-| order by id asc
+// cannot-be-validated-with-arg
diff --git a/docs/content/well-architected/2-design/code/cm-2/cm-2.ps1 b/docs/content/well-architected/2-design/code/cm-2/cm-2.ps1
deleted file mode 100644
index d9007ae40..000000000
--- a/docs/content/well-architected/2-design/code/cm-2/cm-2.ps1
+++ /dev/null
@@ -1 +0,0 @@
-Get-AzResource -ResourceType "Micrsoft.Example/changeMe" | Select-Object name, location, resourceGroup, properties
diff --git a/docs/content/well-architected/3-test/_index.md b/docs/content/well-architected/3-test/_index.md
index a4d2808ad..a7bcd7407 100644
--- a/docs/content/well-architected/3-test/_index.md
+++ b/docs/content/well-architected/3-test/_index.md
@@ -19,7 +19,7 @@ Before deploying the system, comprehensive tests are conducted to validate the d
| :---------------------------------------------------------------------------------------------------------------------------------------------- | :--------------------: | :------: | :------: | :-----------------: |
| [WATS-1 - Test your applications for availability and resiliency](#wats-1---test-your-applications-for-availability-and-resiliency) | Application Resilience | High | Verified | No |
| [WATS-2 - Consider building logic into your workload to handle errors](#wats-2---consider-building-logic-into-your-workload-to-handle-errors) | Application Resilience | High | Verified | No |
-| [WATS-3 - Perform disaster recovery tests regularly](#wats-3---perform-disaster-recovery-tests-regularly) | Disaster Recovery | Medium | Verified | No |
+| [WATS-3 - Perform disaster recovery tests regularly](#wats-3---perform-disaster-recovery-tests-regularly) | Disaster Recovery | High | Verified | No |
| [WATS-4 - Use chaos engineering to test Azure applications](#wats-4---use-chaos-engineering-to-test-azure-applications) | Application Resilience | Medium | Verified | No |
| [WATS-5 - Test application fault resiliency](#wats-5---test-application-fault-resiliency) | Application Resilience | High | Verified | No |
{{< /table >}}
@@ -90,7 +90,7 @@ Key points:
**Category: Disaster Recovery**
-**Impact: Medium**
+**Impact: High**
**Recommendation/Guidance**
diff --git a/docs/content/well-architected/3-test/code/cm-1/cm-1.azcli b/docs/content/well-architected/3-test/code/cm-1/cm-1.azcli
deleted file mode 100644
index 53d6ce9b0..000000000
--- a/docs/content/well-architected/3-test/code/cm-1/cm-1.azcli
+++ /dev/null
@@ -1 +0,0 @@
-az resource list --resource-type "Micosoft.Example/changeMe" | jq .
diff --git a/docs/content/well-architected/3-test/code/cm-1/cm-1.kql b/docs/content/well-architected/3-test/code/cm-1/cm-1.kql
index 8fa0b5a6f..fa5cad258 100644
--- a/docs/content/well-architected/3-test/code/cm-1/cm-1.kql
+++ b/docs/content/well-architected/3-test/code/cm-1/cm-1.kql
@@ -1,6 +1 @@
-// Azure Resource Graph Query
-// Brief description of the intent of the query (focus on returning resources NOT following your recommendation, and usually name and ResourceId are enough for the report)
-Resources
-| where type =~ "Microsoft.Example/changeMe"
-| project recommendationId = "cm-1", name, id
-| order by id asc
+// cannot-be-validated-with-arg
diff --git a/docs/content/well-architected/3-test/code/cm-1/cm-1.ps1 b/docs/content/well-architected/3-test/code/cm-1/cm-1.ps1
deleted file mode 100644
index d9007ae40..000000000
--- a/docs/content/well-architected/3-test/code/cm-1/cm-1.ps1
+++ /dev/null
@@ -1 +0,0 @@
-Get-AzResource -ResourceType "Micrsoft.Example/changeMe" | Select-Object name, location, resourceGroup, properties
diff --git a/docs/content/well-architected/3-test/code/cm-2/cm-2.azcli b/docs/content/well-architected/3-test/code/cm-2/cm-2.azcli
deleted file mode 100644
index 53d6ce9b0..000000000
--- a/docs/content/well-architected/3-test/code/cm-2/cm-2.azcli
+++ /dev/null
@@ -1 +0,0 @@
-az resource list --resource-type "Micosoft.Example/changeMe" | jq .
diff --git a/docs/content/well-architected/3-test/code/cm-2/cm-2.kql b/docs/content/well-architected/3-test/code/cm-2/cm-2.kql
index c86d926a9..fa5cad258 100644
--- a/docs/content/well-architected/3-test/code/cm-2/cm-2.kql
+++ b/docs/content/well-architected/3-test/code/cm-2/cm-2.kql
@@ -1,6 +1 @@
-// Azure Resource Graph Query
-// Brief description of the intent of the query (focus on returning resources NOT following your recommendation, and usually name and ResourceId are enough for the report)
-Resources
-| where type =~ "Microsoft.Example/changeMe2"
-| project recommendationId = "cm-2", name, id
-| order by id asc
+// cannot-be-validated-with-arg
diff --git a/docs/content/well-architected/3-test/code/cm-2/cm-2.ps1 b/docs/content/well-architected/3-test/code/cm-2/cm-2.ps1
deleted file mode 100644
index d9007ae40..000000000
--- a/docs/content/well-architected/3-test/code/cm-2/cm-2.ps1
+++ /dev/null
@@ -1 +0,0 @@
-Get-AzResource -ResourceType "Micrsoft.Example/changeMe" | Select-Object name, location, resourceGroup, properties
diff --git a/docs/content/well-architected/4-deploy/_index.md b/docs/content/well-architected/4-deploy/_index.md
index ed99001ff..6befa1379 100644
--- a/docs/content/well-architected/4-deploy/_index.md
+++ b/docs/content/well-architected/4-deploy/_index.md
@@ -53,7 +53,7 @@ Key Points:
-### WADP-2 - Validated all changes in development environments before applying them to Production
+### WADP-2 - Validated all changes in development environments before applying them to production
**Category: Automation**
@@ -61,7 +61,7 @@ Key Points:
**Recommendation/Guidance**
-FILL ME IN...
+Continuously delivering value has become a mandatory requirement for organizations. To deliver value to your end users, you must release continually and without errors. Continuous delivery (CD) is the process of automating build, test, configuration, and deployment from a build to a production environment. A release pipeline can create multiple testing or staging environments to automate infrastructure creation and deploy new builds. Successive environments support progressively longer-running integration, load, and user acceptance testing activities.
**Resources**
diff --git a/docs/content/well-architected/4-deploy/code/cm-1/cm-1.azcli b/docs/content/well-architected/4-deploy/code/cm-1/cm-1.azcli
deleted file mode 100644
index 53d6ce9b0..000000000
--- a/docs/content/well-architected/4-deploy/code/cm-1/cm-1.azcli
+++ /dev/null
@@ -1 +0,0 @@
-az resource list --resource-type "Micosoft.Example/changeMe" | jq .
diff --git a/docs/content/well-architected/4-deploy/code/cm-1/cm-1.kql b/docs/content/well-architected/4-deploy/code/cm-1/cm-1.kql
index 8fa0b5a6f..fa5cad258 100644
--- a/docs/content/well-architected/4-deploy/code/cm-1/cm-1.kql
+++ b/docs/content/well-architected/4-deploy/code/cm-1/cm-1.kql
@@ -1,6 +1 @@
-// Azure Resource Graph Query
-// Brief description of the intent of the query (focus on returning resources NOT following your recommendation, and usually name and ResourceId are enough for the report)
-Resources
-| where type =~ "Microsoft.Example/changeMe"
-| project recommendationId = "cm-1", name, id
-| order by id asc
+// cannot-be-validated-with-arg
diff --git a/docs/content/well-architected/4-deploy/code/cm-1/cm-1.ps1 b/docs/content/well-architected/4-deploy/code/cm-1/cm-1.ps1
deleted file mode 100644
index d9007ae40..000000000
--- a/docs/content/well-architected/4-deploy/code/cm-1/cm-1.ps1
+++ /dev/null
@@ -1 +0,0 @@
-Get-AzResource -ResourceType "Micrsoft.Example/changeMe" | Select-Object name, location, resourceGroup, properties
diff --git a/docs/content/well-architected/4-deploy/code/cm-2/cm-2.azcli b/docs/content/well-architected/4-deploy/code/cm-2/cm-2.azcli
deleted file mode 100644
index 53d6ce9b0..000000000
--- a/docs/content/well-architected/4-deploy/code/cm-2/cm-2.azcli
+++ /dev/null
@@ -1 +0,0 @@
-az resource list --resource-type "Micosoft.Example/changeMe" | jq .
diff --git a/docs/content/well-architected/4-deploy/code/cm-2/cm-2.kql b/docs/content/well-architected/4-deploy/code/cm-2/cm-2.kql
index c86d926a9..fa5cad258 100644
--- a/docs/content/well-architected/4-deploy/code/cm-2/cm-2.kql
+++ b/docs/content/well-architected/4-deploy/code/cm-2/cm-2.kql
@@ -1,6 +1 @@
-// Azure Resource Graph Query
-// Brief description of the intent of the query (focus on returning resources NOT following your recommendation, and usually name and ResourceId are enough for the report)
-Resources
-| where type =~ "Microsoft.Example/changeMe2"
-| project recommendationId = "cm-2", name, id
-| order by id asc
+// cannot-be-validated-with-arg
diff --git a/docs/content/well-architected/4-deploy/code/cm-2/cm-2.ps1 b/docs/content/well-architected/4-deploy/code/cm-2/cm-2.ps1
deleted file mode 100644
index d9007ae40..000000000
--- a/docs/content/well-architected/4-deploy/code/cm-2/cm-2.ps1
+++ /dev/null
@@ -1 +0,0 @@
-Get-AzResource -ResourceType "Micrsoft.Example/changeMe" | Select-Object name, location, resourceGroup, properties
diff --git a/docs/content/well-architected/5-monitor/_index.md b/docs/content/well-architected/5-monitor/_index.md
index 499684528..032240c04 100644
--- a/docs/content/well-architected/5-monitor/_index.md
+++ b/docs/content/well-architected/5-monitor/_index.md
@@ -21,6 +21,7 @@ Ongoing monitoring is essential for maintaining system reliability. Key performa
| [WAMN-2 - Define a health model based on performance, availability, and recovery targets](#wamn-2---define-a-health-model-based-on-performance-availability-and-recovery-targets) | Monitoring | Low | Verified | No |
| [WAMN-3 - Create Dashboards and Alerts for Azure Platform resources](#wamn-3---create-dashboards-and-alerts-for-azure-platform-resources) | Monitoring | Low | Verified | No |
| [WAMN-4 - Ensure that the right people in your organization will be notified about any future service issues](#wamn-4---ensure-that-the-right-people-in-your-organization-will-be-notified-about-any-future-service-issues) | Monitoring | Medium | Verified | No |
+| [WAMN-5 - Utilize built-in Resilience policies](#wamn-5---utilize-built-in-resilience-policies) | Governance | Medium | Verified | No |
{{< /table >}}
{{< alert style="info" >}}
@@ -120,3 +121,20 @@ Azure offers a suite of experiences to keep you informed about the health of you
- [Create a Service Health alert using the Azure portal](https://learn.microsoft.com/azure/service-health/alerts-activity-log-service-notifications-portal#create-a-service-health-alert-using-the-azure-portal)
+
+### WAMN-5 - Utilize built-in Resilience policies
+
+**Category: Governance**
+
+**Impact: Medium**
+
+**Recommendation/Guidance**
+
+Utilize Azure's built-in Resilience policies to audit and enforce resilient configurations of Azure services. Azure Policy helps to enforce organizational standards and to assess compliance at scale.
+
+**Resources**
+
+- [Built-in Resilience policy definitions](https://github.com/Azure/azure-policy/tree/master/built-in-policies/policyDefinitions/Resilience)
+- [Get policy compliance data](https://learn.microsoft.com/azure/governance/policy/how-to/get-compliance-data)
+
+
diff --git a/docs/content/well-architected/5-monitor/code/cm-1/cm-1.azcli b/docs/content/well-architected/5-monitor/code/cm-1/cm-1.azcli
deleted file mode 100644
index 53d6ce9b0..000000000
--- a/docs/content/well-architected/5-monitor/code/cm-1/cm-1.azcli
+++ /dev/null
@@ -1 +0,0 @@
-az resource list --resource-type "Micosoft.Example/changeMe" | jq .
diff --git a/docs/content/well-architected/5-monitor/code/cm-1/cm-1.kql b/docs/content/well-architected/5-monitor/code/cm-1/cm-1.kql
index 8fa0b5a6f..fa5cad258 100644
--- a/docs/content/well-architected/5-monitor/code/cm-1/cm-1.kql
+++ b/docs/content/well-architected/5-monitor/code/cm-1/cm-1.kql
@@ -1,6 +1 @@
-// Azure Resource Graph Query
-// Brief description of the intent of the query (focus on returning resources NOT following your recommendation, and usually name and ResourceId are enough for the report)
-Resources
-| where type =~ "Microsoft.Example/changeMe"
-| project recommendationId = "cm-1", name, id
-| order by id asc
+// cannot-be-validated-with-arg
diff --git a/docs/content/well-architected/5-monitor/code/cm-1/cm-1.ps1 b/docs/content/well-architected/5-monitor/code/cm-1/cm-1.ps1
deleted file mode 100644
index d9007ae40..000000000
--- a/docs/content/well-architected/5-monitor/code/cm-1/cm-1.ps1
+++ /dev/null
@@ -1 +0,0 @@
-Get-AzResource -ResourceType "Micrsoft.Example/changeMe" | Select-Object name, location, resourceGroup, properties
diff --git a/docs/content/well-architected/5-monitor/code/cm-2/cm-2.azcli b/docs/content/well-architected/5-monitor/code/cm-2/cm-2.azcli
deleted file mode 100644
index 53d6ce9b0..000000000
--- a/docs/content/well-architected/5-monitor/code/cm-2/cm-2.azcli
+++ /dev/null
@@ -1 +0,0 @@
-az resource list --resource-type "Micosoft.Example/changeMe" | jq .
diff --git a/docs/content/well-architected/5-monitor/code/cm-2/cm-2.kql b/docs/content/well-architected/5-monitor/code/cm-2/cm-2.kql
index c86d926a9..fa5cad258 100644
--- a/docs/content/well-architected/5-monitor/code/cm-2/cm-2.kql
+++ b/docs/content/well-architected/5-monitor/code/cm-2/cm-2.kql
@@ -1,6 +1 @@
-// Azure Resource Graph Query
-// Brief description of the intent of the query (focus on returning resources NOT following your recommendation, and usually name and ResourceId are enough for the report)
-Resources
-| where type =~ "Microsoft.Example/changeMe2"
-| project recommendationId = "cm-2", name, id
-| order by id asc
+// cannot-be-validated-with-arg
diff --git a/docs/content/well-architected/5-monitor/code/cm-2/cm-2.ps1 b/docs/content/well-architected/5-monitor/code/cm-2/cm-2.ps1
deleted file mode 100644
index d9007ae40..000000000
--- a/docs/content/well-architected/5-monitor/code/cm-2/cm-2.ps1
+++ /dev/null
@@ -1 +0,0 @@
-Get-AzResource -ResourceType "Micrsoft.Example/changeMe" | Select-Object name, location, resourceGroup, properties
diff --git a/docs/content/well-architected/6-respond/code/cm-1/cm-1.azcli b/docs/content/well-architected/6-respond/code/cm-1/cm-1.azcli
deleted file mode 100644
index 53d6ce9b0..000000000
--- a/docs/content/well-architected/6-respond/code/cm-1/cm-1.azcli
+++ /dev/null
@@ -1 +0,0 @@
-az resource list --resource-type "Micosoft.Example/changeMe" | jq .
diff --git a/docs/content/well-architected/6-respond/code/cm-1/cm-1.kql b/docs/content/well-architected/6-respond/code/cm-1/cm-1.kql
index 8fa0b5a6f..fa5cad258 100644
--- a/docs/content/well-architected/6-respond/code/cm-1/cm-1.kql
+++ b/docs/content/well-architected/6-respond/code/cm-1/cm-1.kql
@@ -1,6 +1 @@
-// Azure Resource Graph Query
-// Brief description of the intent of the query (focus on returning resources NOT following your recommendation, and usually name and ResourceId are enough for the report)
-Resources
-| where type =~ "Microsoft.Example/changeMe"
-| project recommendationId = "cm-1", name, id
-| order by id asc
+// cannot-be-validated-with-arg
diff --git a/docs/content/well-architected/6-respond/code/cm-1/cm-1.ps1 b/docs/content/well-architected/6-respond/code/cm-1/cm-1.ps1
deleted file mode 100644
index d9007ae40..000000000
--- a/docs/content/well-architected/6-respond/code/cm-1/cm-1.ps1
+++ /dev/null
@@ -1 +0,0 @@
-Get-AzResource -ResourceType "Micrsoft.Example/changeMe" | Select-Object name, location, resourceGroup, properties
diff --git a/docs/content/well-architected/6-respond/code/cm-2/cm-2.azcli b/docs/content/well-architected/6-respond/code/cm-2/cm-2.azcli
deleted file mode 100644
index 53d6ce9b0..000000000
--- a/docs/content/well-architected/6-respond/code/cm-2/cm-2.azcli
+++ /dev/null
@@ -1 +0,0 @@
-az resource list --resource-type "Micosoft.Example/changeMe" | jq .
diff --git a/docs/content/well-architected/6-respond/code/cm-2/cm-2.kql b/docs/content/well-architected/6-respond/code/cm-2/cm-2.kql
index c86d926a9..fa5cad258 100644
--- a/docs/content/well-architected/6-respond/code/cm-2/cm-2.kql
+++ b/docs/content/well-architected/6-respond/code/cm-2/cm-2.kql
@@ -1,6 +1 @@
-// Azure Resource Graph Query
-// Brief description of the intent of the query (focus on returning resources NOT following your recommendation, and usually name and ResourceId are enough for the report)
-Resources
-| where type =~ "Microsoft.Example/changeMe2"
-| project recommendationId = "cm-2", name, id
-| order by id asc
+// cannot-be-validated-with-arg
diff --git a/docs/content/well-architected/6-respond/code/cm-2/cm-2.ps1 b/docs/content/well-architected/6-respond/code/cm-2/cm-2.ps1
deleted file mode 100644
index d9007ae40..000000000
--- a/docs/content/well-architected/6-respond/code/cm-2/cm-2.ps1
+++ /dev/null
@@ -1 +0,0 @@
-Get-AzResource -ResourceType "Micrsoft.Example/changeMe" | Select-Object name, location, resourceGroup, properties
diff --git a/docs/static/media/img/aprl-transparent-white-text.png b/docs/static/media/img/aprl-transparent-white-text.png
index 4f97117c7..47a394279 100644
Binary files a/docs/static/media/img/aprl-transparent-white-text.png and b/docs/static/media/img/aprl-transparent-white-text.png differ
diff --git a/docs/static/media/img/aprl-transparent.png b/docs/static/media/img/aprl-transparent.png
index 53d3ec21f..47a394279 100644
Binary files a/docs/static/media/img/aprl-transparent.png and b/docs/static/media/img/aprl-transparent.png differ
diff --git a/docs/static/media/img/aprl-white.png b/docs/static/media/img/aprl-white.png
index 01cb91200..2143b6bf1 100644
Binary files a/docs/static/media/img/aprl-white.png and b/docs/static/media/img/aprl-white.png differ
diff --git a/docs/themes/ace-documentation/layouts/partials/head.html b/docs/themes/ace-documentation/layouts/partials/head.html
index 86e34a710..b2e1209af 100644
--- a/docs/themes/ace-documentation/layouts/partials/head.html
+++ b/docs/themes/ace-documentation/layouts/partials/head.html
@@ -31,4 +31,14 @@
{{ template "_internal/google_analytics_async.html" . }}
{{ end }}
+
+
+
+
diff --git a/services-abbreviations.csv b/services-abbreviations.csv
index 161d383a6..dd890ffa1 100644
--- a/services-abbreviations.csv
+++ b/services-abbreviations.csv
@@ -1,21 +1,23 @@
-type,name,abbreviation
+type,name,abbreviation
microsoft.databricks/workspaces,Azure Databricks,dbw
microsoft.eventgrid/domains,Event Grid,evg
microsoft.web/serverfarms,App Service Plan,asp
microsoft.virtualmachineimages/imagetemplates,Image template,it
microsoft.compute/virtualmachines,Virtual machine,vm
microsoft.compute/virtualmachinescalesets,Virtual machine scale set,vmss
+microsoft.containerservice/managedclusters,Kubernetes Service,aks
microsoft.containerregistry/registries,Container registry,cr
-microsoft.documentdb/databaseaccounts/sqldatabases,Azure Cosmos DB database,cosmos
+microsoft.documentdb/databaseaccounts,Azure Cosmos DB database,cosmos
microsoft.sql/servers/databases,Azure SQL database,sqldb
microsoft.operationalinsights/workspaces,Log Analytics workspace,log
microsoft.network/applicationgateways,Application gateway,agw
-microsoft.network/connections,Ddos Protection Plan,ddos
microsoft.network/privatednszones,Private DNS Zone,pvdnsz
microsoft.network/azurefirewalls,Azure Firewall,afw
-microsoft.network/expressroutecircuits,ExpressRoute circuit,erc
+microsoft.network/expressroutecircuits,ExpressRoute Circuit,erc
+microsoft.network/virtualnetworkgateways,ExpressRoute Gateway,ergw
microsoft.network/expressroutegateway,ExpressRoute Gateway,ergw
microsoft.cdn/profiles,Azure Front Door,afd
+microsoft.cdn/frontdoors,Azure Front Door,afd
microsoft.network/loadbalancers,Load balancer,lb
microsoft.network/networksecuritygroups,Network security group,nsg
microsoft.network/networkwatchers,Network Watcher,nw
@@ -27,5 +29,24 @@ microsoft.network/virtualnetworks,Virtual network,vnet
microsoft.keyvault/vaults,Key vault,kv
microsoft.network/vpngateways,VPN Gateway,vpng
microsoft.network/firewallpolicies,Web Application Firewall,waf
-microsoft.netapp/netappaccounts,Azure Netapp Files,anf
+microsoft.netapp/netappaccounts,Azure NetApp Files,anf
microsoft.storage/storageaccounts,Storage account,st
+microsoft.compute/galleries,Compute Gallery,cg
+microsoft.dbforpostgresql/flexibleservers,DB for PostgreSQL,psql
+microsoft.cache/redis,Redis Cache,redis
+microsoft.apimanagement/service,API Management,apim
+microsoft.eventhub/namespaces,Event Hub,evhns
+microsoft.devices/iothubs,IoT Hub,ioth
+microsoft.automation/automationaccounts,Automation Account,aa
+microsoft.recoveryservices/vaults,Azure Backup,bk
+microsoft.insights/components,Application Insights,appi
+microsoft.insights/activitylogalerts,Resource Health Alerts,msr
+microsoft.network/connections,ExpressRoute Connection,ercon
+microsoft.network/expressrouteports,ExpressRoute Direct,erd
+microsoft.desktopvirtualization/hostpools,Azure Virtual Desktop,avd
+microsoft.avs/privateclouds,Azure VMware Solution,avs
+microsoft.signalrservice/signalr,SignalR,sigr
+microsoft.web/sites,Web App,app
+microsoft.batch/batchaccounts,Batch Accounts,ba
+microsoft.resources/resourcegroups,Resource Groups,rg
+microsoft.dbformysql/flexibleservers,DB for MySQL,mysql