Add instructions for manually configuring an Azure Batch pool #325

Open. Wants to merge 15 commits into `master`.


@adamrtalbot (Contributor) commented Dec 2, 2024

Documentation for adding an Azure Batch pool manually so it is compatible with Seqera and Nextflow.


netlify bot commented Dec 2, 2024

Deploy Preview for seqera-docs failed.

| Name | Link |
| --- | --- |
| 🔨 Latest commit | 139a8c8 |
| 🔍 Latest deploy log | https://app.netlify.com/sites/seqera-docs/deploys/674da295f25cbc0008b357a6 |


netlify bot commented Dec 2, 2024

Deploy Preview for seqera-docs ready!

| Name | Link |
| --- | --- |
| 🔨 Latest commit | e2623e4 |
| 🔍 Latest deploy log | https://app.netlify.com/sites/seqera-docs/deploys/6752ec9bf66b96000847a99d |
| 😎 Deploy Preview | https://deploy-preview-325--seqera-docs.netlify.app |

@jason-seqera (Contributor)


@pcolaianni left a comment


@adamrtalbot These changes are great. There is nothing in my notes that is missing here, except for the IPs used by Seqera Platform. I had to add those IPs to the Storage Account's firewall.

While reading, I found an explanation for all settings and I really appreciate that.

Important: during our call, when setting up the compute environment in Seqera, you were able to tell me "if that field is not autocompleted, it means that something is off in X or Y". Do you think there is space in this documentation for this knowledge? Or maybe in the setup page itself?

For example, if the container names do not pop up, it means that Seqera has no (network) access to the Storage Account. Or that the correct roles are not set for the provided service principal and managed identity. It'd be great if the page informed the user.

Same goes for the pools. It is expected that the names pop up. Correct?

#### Entra service principal
#### Entra service principal and managed identity

If using Entra for authentication, you must also create a service principal and managed identity. Seqera Platform uses the Service Principal to authenticate to Azure Batch and storage. It submits a Nextflow task as the head process to run Nextflow, which authenticates to Azure Batch and storage using the Managed Identity attached to the node pool.


Suggested change
If using Entra for authentication, you must also create a service principal and managed identity. Seqera Platform uses the Service Principal to authenticate to Azure Batch and storage. It submits a Nextflow task as the head process to run Nextflow, which authenticates to Azure Batch and storage using the Managed Identity attached to the node pool.
If using Entra for authentication, you must create a service principal and a managed identity. Seqera Platform uses the Service Principal to authenticate to Azure Batch and storage. It submits a Nextflow task as the head process to run Nextflow, which authenticates to Azure Batch and storage using the Managed Identity attached to the node pool.

If "storage" refers to "azure storage account", it might be worth replacing all occurrences with "Azure Storage Account".


@gwright99 left a comment


Some thoughts and observations from a quick read-through. I'll make time to actually try the steps later.

#### Entra service principal
#### Entra service principal and managed identity

If using Entra for authentication, you must create a service principal and managed identity. Seqera Platform uses the Service Principal to authenticate to Azure Batch and Azure Storage. It submits a Nextflow task as the head process to run Nextflow, which authenticates to Azure Batch and storage using the Managed Identity attached to the node pool.


❓ Should there be links out to MSFT-managed docs explaining SPs? Oh I see, it's described in a section above. The point still holds (just for the earlier description). Never mind, I see you have it below.

⛏️ I'd probably frontload the explanations, but this works too.


❓ Is an SP type qualification necessary (i.e. user-managed)? Never mind, I see this is handled below.


❓ Do our docs adequately cover how to attach an SP to an Azure Batch pool (I haven't checked)? If yes, does this cover both Manual and Forge flows (assuming both are applicable)?


If using Entra for authentication, you must create a service principal and managed identity. Seqera Platform uses the Service Principal to authenticate to Azure Batch and Azure Storage. It submits a Nextflow task as the head process to run Nextflow, which authenticates to Azure Batch and storage using the Managed Identity attached to the node pool.

Therefore, you must create both an Entra service principal and a managed identity. You add the service principal to your Seqera Platform credentials and attach the managed identity to your Azure Batch node pool which will run Nextflow.


If I understand this correctly, two different MSFT identities need to be used:

  1. Service Principal type, which accommodates the fact that Tower lives outside the Azure network and is likely calling across the public internet.

  2. Managed Identity, which can be used because the pool lives inside the account and thus has a greater assurance level.

Correct?

Yes, explained below:

When you use a manually configured compute environment with a managed identity attached to the Azure Batch Pool, Nextflow can use this managed identity for authentication. However, Platform still needs to use access keys or an Entra service principal to submit the initial task to Azure Batch to run Nextflow, which will then proceed with the managed identity for subsequent authentication.


In general, we recommend using the E family of machines for bioinformatics workloads since these are cost effective, widely available and sufficiently fast.

1. **vCPUs**: The number of vCPUs the machine has. This is the main factor in determining the speed of the machine.
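Azure encodes the family and vCPU count in the VM size name; for example, `Standard_E16ds_v5` is an E-family size with 16 vCPUs. To make that reading concrete, here is a minimal Python sketch that splits a size name into those fields so a list of candidate sizes can be filtered by family and vCPUs. The helper names (`VmSize`, `parse_vm_size`) are hypothetical, and the pattern is a simplifying assumption: it covers the common `Standard_<family><vCPUs><features>_v<N>` shape and constrained-vCPU sizes such as `Standard_E16-8ds_v5` (8 active vCPUs), but not names with accelerator segments.

```python
import re
from typing import NamedTuple, Optional


class VmSize(NamedTuple):
    """Fields pulled from an Azure VM size name (hypothetical helper type)."""
    family: str             # e.g. "E" or "HB"
    vcpus: int              # active vCPUs
    features: str           # lowercase feature letters, e.g. "ds"
    version: Optional[int]  # e.g. 5 for "_v5"; None if absent


# Simplified pattern for names like Standard_E16ds_v5 or Standard_E16-8ds_v5.
# Assumption: sizes with accelerator segments (e.g. some N-series names)
# are not covered and raise ValueError.
_SIZE_RE = re.compile(
    r"^Standard_(?P<family>[A-Z]+)(?P<vcpus>\d+)(?:-(?P<constrained>\d+))?"
    r"(?P<features>[a-z]*)(?:_v(?P<version>\d+))?$"
)


def parse_vm_size(name: str) -> VmSize:
    """Split an Azure VM size name into family, vCPUs, features and version."""
    m = _SIZE_RE.match(name)
    if m is None:
        raise ValueError(f"unrecognised VM size name: {name!r}")
    # A constrained-vCPU size such as E16-8ds_v5 runs with the second number.
    vcpus = int(m["constrained"] or m["vcpus"])
    version = int(m["version"]) if m["version"] else None
    return VmSize(m["family"], vcpus, m["features"], version)


if __name__ == "__main__":
    candidates = ["Standard_D16ds_v5", "Standard_E16ds_v5", "Standard_E8ds_v5"]
    sizes = [parse_vm_size(n) for n in candidates]
    # Keep E-family sizes, smallest first.
    e_family = sorted((s for s in sizes if s.family == "E"), key=lambda s: s.vcpus)
    for s in e_family:
        print(s.family, s.vcpus)
```

Filtering on `family` and then sorting on `vcpus` is one way to apply the E-family recommendation to the sizes available in a given region.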


🤔 Personally I like this level of detail, but there is a risk of it getting out of date. Docs probably will need to set themselves a recurring task to go check accuracy every so often.

Contributor Author


Because the user must choose the specific machine type, stronger guidance is needed here than for something like AWS.

I've tried to keep it generic (families, features), which shouldn't change much, but we will need to add new categories as Azure introduces them.


This section is for users with a pre-configured Azure Batch pool. This requires an existing Azure Batch account with an existing pool.
It is possible to set up Seqera Platform to use a pre-existing Azure Batch pool. This allows the use of more advanced Azure Batch features, such as custom VM images and private networking.


Should we link out to MSFT pages that explain the benefits of why you might want these features?

Contributor Author


@@ -250,15 +304,83 @@ Create a Batch Forge Azure Batch compute environment:
See [Launch pipelines](../launch/launchpad.mdx) to start executing workflows in your Azure Batch compute environment.
:::

## Manual
### Manual


❓ I don't see Forge here. Is that an omission by accident, or can the SP / Managed Identity config only occur via Manual creation for now?

Contributor Author


Only manual: the API for adding a managed identity to a compute pool doesn't work.

```
// Compute the target nodes based on pending tasks.
// $PendingTasks == The sum of $ActiveTasks and $RunningTasks
$samples = $PendingTasks.GetSamplePercent(interval);
```

AutoScalingFormulaEvaluationError: The specified auto-scaling formula has evaluation error
Message: Line 3, Col 43: Undefined symbol: interval
Result: $TargetDedicatedNodes=0;$TargetLowPriorityNodes=0;$NodeDeallocationOption=requeue
PropertyName: formula
PropertyPath: properties.scaleSettings.autoScale.formula

Something missing?

Comment on lines +336 to +345
```
// Compute the target nodes based on pending tasks.
// $PendingTasks == The sum of $ActiveTasks and $RunningTasks
$samples = $PendingTasks.GetSamplePercent(interval);
$tasks = $samples < 70 ? max(0, $PendingTasks.GetSample(1)) : max( $PendingTasks.GetSample(1), avg($PendingTasks.GetSample(interval)));
$targetVMs = $tasks > 0 ? $tasks : max(0, $TargetDedicatedNodes/2);
targetPoolSize = max(0, min($targetVMs, 8));

// For first interval deploy 1 node, for other intervals scale up/down as per tasks.
$TargetDedicatedNodes = targetPoolSize;
$NodeDeallocationOption = taskcompletion;
```
Contributor Author


Hmmm, I tried to simplify it. Try this:

Suggested change (replaces the whole formula):

```
// Get pool lifetime since creation.
// Replace this timestamp with the pool's actual creation time (UTC).
lifespan = time() - time("2024-10-30T00:00:00.880011Z");
interval = TimeInterval_Minute * 5;
// Compute the target nodes based on pending tasks.
// $PendingTasks == The sum of $ActiveTasks and $RunningTasks
$samples = $PendingTasks.GetSamplePercent(interval);
$tasks = $samples < 70 ? max(0, $PendingTasks.GetSample(1)) : max( $PendingTasks.GetSample(1), avg($PendingTasks.GetSample(interval)));
$targetVMs = $tasks > 0 ? $tasks : max(0, $TargetDedicatedNodes/2);
targetPoolSize = max(0, min($targetVMs, 8));
// For first interval deploy 1 node, for other intervals scale up/down as per tasks.
$TargetLowPriorityNodes = lifespan < interval ? 1 : targetPoolSize;
$NodeDeallocationOption = taskcompletion;
```
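To reason about how the suggested formula scales, here is a minimal Python sketch that mirrors one evaluation of it. It is an illustration under stated assumptions, not Azure Batch's evaluator: the function name and parameters are hypothetical, the list of samples stands in for `$PendingTasks` over the last interval (its last element plays the role of `GetSample(1)`), and the caller supplies the value `GetSamplePercent(interval)` would return. The 5-minute interval, the 70% threshold and the cap of 8 nodes come from the formula.

```python
from statistics import mean
from typing import Sequence


def target_low_priority_nodes(
    window_pending: Sequence[int],
    sample_percent: float,
    current_dedicated_target: int,
    lifespan_min: float,
    interval_min: float = 5.0,
    max_vms: int = 8,
) -> int:
    """Mirror one evaluation of the suggested autoscale formula (hypothetical helper).

    window_pending: $PendingTasks samples over the last interval, oldest first;
        the last element stands in for $PendingTasks.GetSample(1).
    sample_percent: what $PendingTasks.GetSamplePercent(interval) would return.
    current_dedicated_target: current value of $TargetDedicatedNodes.
    lifespan_min: minutes since the pool's creation time.
    """
    if not window_pending:
        raise ValueError("need at least one $PendingTasks sample")
    latest = window_pending[-1]
    # With under 70% of the expected samples, trust only the latest one.
    if sample_percent < 70:
        tasks = max(0, latest)
    else:
        tasks = max(latest, mean(window_pending))
    # Idle fallback: halve the current *dedicated* target, as the formula does.
    target_vms = tasks if tasks > 0 else max(0, current_dedicated_target / 2)
    # Clamp to [0, max_vms]; the formula caps the pool at 8 nodes.
    target_pool_size = max(0, min(target_vms, max_vms))
    # During the first interval after creation, start a single node.
    if lifespan_min < interval_min:
        return 1
    # Assumption: fractional targets are truncated here; Batch's own
    # handling of non-integer targets is not modeled.
    return int(target_pool_size)


if __name__ == "__main__":
    # Ten tasks pending earlier in the window, two now, full sample history:
    # the window average (about 7.3) outweighs the latest sample, giving 7.
    print(target_low_priority_nodes([10, 10, 2], 100, 0, lifespan_min=60))
```

One thing the sketch makes visible: the idle fallback halves `$TargetDedicatedNodes`, but the suggestion only ever assigns `$TargetLowPriorityNodes`. In a pool that uses only low-priority nodes, the dedicated target appears to stay at 0, so the pool would drop straight to zero nodes when nothing is pending rather than halving.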


With this I was able to create the pool.

4 participants