Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add multi-mount parallelstore support #3256

Open
wants to merge 1 commit into
base: develop
Choose a base branch
from

Conversation

harshthakkar01
Copy link
Contributor

@harshthakkar01 harshthakkar01 commented Nov 13, 2024

This PR,

  • excludes GPU interfaces in daos config file
  • Add support for multiple mount for single parallelstore instance (creates systemd service for each mount)

Submission Checklist

NOTE: Community submissions can take up to 2 weeks to be reviewed.

Please take the following actions before submitting this pull request.

  • Fork your PR branch from the Toolkit "develop" branch (not main)
  • Test all changes with pre-commit in a local branch #
  • Confirm that "make tests" passes all tests
  • Add or modify unit tests to cover code changes
  • Ensure that unit test coverage remains above 80%
  • Update all applicable documentation
  • Follow Cluster Toolkit Contribution guidelines #

@harshthakkar01 harshthakkar01 added the release-improvements Added to release notes under the "Improvements" heading. label Nov 13, 2024
@harshthakkar01 harshthakkar01 changed the title Update mount parallelstore script to support multiple parallelstore Add multi-mount parallelstore support Nov 15, 2024
@@ -41,6 +41,17 @@ sed -i "s/#.*transport_config/transport_config/g" $daos_config
sed -i "s/#.*allow_insecure:.*false/ allow_insecure: true/g" $daos_config
sed -i "s/.*access_points.*/access_points: $access_points/g" $daos_config
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the goal here is only to support multiple mounts from the server Parallelstore instance, or to support mounts from multiple Parallelstore instances?

I fear, that with multiple Parallelstore instances, the last script that will run will setup access_points in the config, and it will be the only instance mounted (but multiple times)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The goal is to have 1 parallelstore instance but able to mount to multiple mount points (possible with different mount options).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, perfect. I think it is worth mentioning in the README, that module supports mounting only from Parallelstore instance, but it's possibile to mount it multiple times with different options using pre-existing-network-storage.

Probably worth mentioning in both modules.

Copy link
Member

@tpdownes tpdownes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to know a bit more about what attempting to support multiple Parallelstore instances. In the meantime, the changes I suggest will improve reliability.

@@ -41,6 +41,17 @@ sed -i "s/#.*transport_config/transport_config/g" $daos_config
sed -i "s/#.*allow_insecure:.*false/ allow_insecure: true/g" $daos_config
sed -i "s/.*access_points.*/access_points: $access_points/g" $daos_config

# Get interface names with "s0f0" suffix
if ifconfig -a | grep 's0f0'; then
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that you are attempting to identify devices that are on the same PCI bus as GPUs in the A3 VM types (a3-highgpu-8g and a3-megagpu-8g). This is not a reliable mechanism by which to do that.

  1. I believe ifconfig is not uniformly installed across OS clients. (e.g. Ubuntu 22.04)
  2. The "tail" of the device name is the least predictable element of the automated device name.

The following command strictly identifies ethernet devices not attached to PCI slot 0, which is guaranteed to be nic0 as seen by the Cloud Console. It is also the interface guaranteed to route to the metadata server

find /sys/class/net/ -not -name 'enp0s*' -regextype posix-extended -regex '.*/enp[0-9]+s.*' -printf "%f\n" | paste -s -d ','

I believe you could this command and then test for whether it has zero length.

mount_command="if mountpoint -q '$local_mount'; then fusermount3 -u '$local_mount'; fi; for i in {1..10}; do /bin/dfuse -m '$local_mount' --pool default-pool --container default-container --multi-user $mount_options --foreground && break; sleep 1; done"

# Construct the service name with the local_mount suffix
service_name="mount_parallelstore_${local_mount//\//_}.service"
Copy link
Member

@tpdownes tpdownes Nov 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use systemd-escape -p "${local_mount}" to generate that part of the unit name.

Comment on lines 91 to 93
[Unit]
Description=DAOS Mount Service
After=network-online.target daos_agent.service
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
[Unit]
Description=DAOS Mount Service
After=network-online.target daos_agent.service
[Unit]
Description=DAOS Mount Service
After=network-online.target daos_agent.service
Before=slurmd.service
ConditionPathIsMountPoint=!${local_mount}

# Store the mounting logic in a variable
mount_command='for i in {1..10}; do /bin/dfuse -m '$local_mount' --pool default-pool --container default-container --multi-user '$mount_options' --foreground && break; echo \"dfuse, failed, retrying in 1 second (attempt '$i'/10)\"; sleep 1; done'
mount_command="if mountpoint -q '$local_mount'; then fusermount3 -u '$local_mount'; fi; for i in {1..10}; do /bin/dfuse -m '$local_mount' --pool default-pool --container default-container --multi-user $mount_options --foreground && break; sleep 1; done"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would remove this line entirely. The command should not include sleep logic.

Comment on lines +99 to +102
Restart=always
RestartSec=1
ExecStart=/bin/bash -c "$mount_command"
ExecStop=fusermount3 -u '$local_mount'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Restart=always
RestartSec=1
ExecStart=/bin/bash -c "$mount_command"
ExecStop=fusermount3 -u '$local_mount'
Restart=on-failure
RestartSec=10
ExecStart=/bin/dfuse -m $local_mount --pool default-pool --container default-container --multi-user $mount_options --foreground
ExecStop=/usr/bin/fusermount3 -u $local_mount

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After you make these changes and we discuss how multi-Parallelstore instance suport would look like, I would like to consider using systemd unit templates to avoid re-creation of multiple service files.

https://fedoramagazine.org/systemd-template-unit-files/

@tpdownes tpdownes assigned harshthakkar01 and unassigned tpdownes Nov 18, 2024
@mr0re1 mr0re1 assigned mr0re1 and unassigned harshthakkar01 Nov 19, 2024
@tpdownes tpdownes assigned tpdownes and unassigned mr0re1 Nov 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release-improvements Added to release notes under the "Improvements" heading.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants