-
Notifications
You must be signed in to change notification settings - Fork 147
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add multi-mount parallelstore support #3256
base: develop
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
@@ -41,6 +41,17 @@ sed -i "s/#.*transport_config/transport_config/g" $daos_config | |||||||||||||||||
sed -i "s/#.*allow_insecure:.*false/ allow_insecure: true/g" $daos_config | ||||||||||||||||||
sed -i "s/.*access_points.*/access_points: $access_points/g" $daos_config | ||||||||||||||||||
|
||||||||||||||||||
# Get interface names with "s0f0" suffix | ||||||||||||||||||
if ifconfig -a | grep 's0f0'; then | ||||||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I believe that you are attempting to identify devices that are on the same PCI bus as GPUs in the A3 VM types (a3-highgpu-8g and a3-megagpu-8g). This is not a reliable mechanism by which to do that.
The following command strictly identifies ethernet devices not attached to PCI slot 0, which is guaranteed to be nic0 as seen by the Cloud Console. It is also the interface guaranteed to route to the metadata server
I believe you could this command and then test for whether it has zero length. |
||||||||||||||||||
sof0_interfaces=$(ifconfig -a | grep 's0f0:' | awk '{print $1}' | tr ':' '\n' | grep -v '^$' | awk '!a[$0]++' | sed 's/^/"/g' | sed 's/$/"/g' | paste -sd, -) | ||||||||||||||||||
|
||||||||||||||||||
# Append the sof0_interfaces to the existing list | ||||||||||||||||||
exclude_fabric_ifaces="lo,$sof0_interfaces" | ||||||||||||||||||
|
||||||||||||||||||
# Update the file with the new list | ||||||||||||||||||
sed -i "s/#.*exclude_fabric_ifaces: \[.*/exclude_fabric_ifaces: [$exclude_fabric_ifaces]/" $daos_config | ||||||||||||||||||
fi | ||||||||||||||||||
|
||||||||||||||||||
# Start service | ||||||||||||||||||
if { [ "${OS_ID}" = "rocky" ] || [ "${OS_ID}" = "rhel" ]; } && { [ "${OS_VERSION_MAJOR}" = "8" ] || [ "${OS_VERSION_MAJOR}" = "9" ]; }; then | ||||||||||||||||||
# TODO: Update script to change default log destination folder, after daos_agent user is supported in debian and ubuntu. | ||||||||||||||||||
|
@@ -69,39 +80,33 @@ sed -i "s/#.*user_allow_other/user_allow_other/g" $fuse_config | |||||||||||||||||
# make sure limit of open files is high enough for dfuse (1M of open files) | ||||||||||||||||||
ulimit -n 1048576 | ||||||||||||||||||
|
||||||||||||||||||
for i in {1..10}; do | ||||||||||||||||||
# To parse mount_options as --disable-wb-cache --eq-count=8. | ||||||||||||||||||
# shellcheck disable=SC2086 | ||||||||||||||||||
dfuse -m "$local_mount" --pool default-pool --container default-container --multi-user $mount_options && break | ||||||||||||||||||
|
||||||||||||||||||
echo "dfuse failed, retrying in 1 seconds (attempt $i/10)..." | ||||||||||||||||||
sleep 1 | ||||||||||||||||||
done | ||||||||||||||||||
|
||||||||||||||||||
if ! mountpoint -q "$local_mount"; then | ||||||||||||||||||
exit 1 | ||||||||||||||||||
fi | ||||||||||||||||||
|
||||||||||||||||||
# Store the mounting logic in a variable | ||||||||||||||||||
mount_command='for i in {1..10}; do /bin/dfuse -m '$local_mount' --pool default-pool --container default-container --multi-user '$mount_options' --foreground && break; echo \"dfuse, failed, retrying in 1 second (attempt '$i'/10)\"; sleep 1; done' | ||||||||||||||||||
mount_command="if mountpoint -q '$local_mount'; then fusermount3 -u '$local_mount'; fi; for i in {1..10}; do /bin/dfuse -m '$local_mount' --pool default-pool --container default-container --multi-user $mount_options --foreground && break; sleep 1; done" | ||||||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I would remove this line entirely. The command should not include sleep logic. |
||||||||||||||||||
|
||||||||||||||||||
# Construct the service name with the local_mount suffix | ||||||||||||||||||
service_name="mount_parallelstore_${local_mount//\//_}.service" | ||||||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Use |
||||||||||||||||||
|
||||||||||||||||||
# --- Begin: Add systemd service creation --- | ||||||||||||||||||
cat >/usr/lib/systemd/system/mount_parallelstore.service <<EOF | ||||||||||||||||||
cat >/usr/lib/systemd/system/"${service_name}" <<EOF | ||||||||||||||||||
[Unit] | ||||||||||||||||||
Description=DAOS Mount Service | ||||||||||||||||||
After=network-online.target daos_agent.service | ||||||||||||||||||
Comment on lines
91
to
93
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||||||||||||||
|
||||||||||||||||||
[Service] | ||||||||||||||||||
Type=oneshot | ||||||||||||||||||
Type=simple | ||||||||||||||||||
User=root | ||||||||||||||||||
Group=root | ||||||||||||||||||
ExecStart=/bin/bash -c '$mount_command' | ||||||||||||||||||
Restart=always | ||||||||||||||||||
RestartSec=1 | ||||||||||||||||||
ExecStart=/bin/bash -c "$mount_command" | ||||||||||||||||||
ExecStop=fusermount3 -u '$local_mount' | ||||||||||||||||||
Comment on lines
+99
to
+102
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. After you make these changes and we discuss how multi-Parallelstore instance suport would look like, I would like to consider using systemd unit templates to avoid re-creation of multiple service files. |
||||||||||||||||||
|
||||||||||||||||||
[Install] | ||||||||||||||||||
WantedBy=multi-user.target | ||||||||||||||||||
EOF | ||||||||||||||||||
|
||||||||||||||||||
systemctl enable mount_parallelstore.service | ||||||||||||||||||
systemctl enable "${service_name}" | ||||||||||||||||||
systemctl start "${service_name}" | ||||||||||||||||||
# --- End: Add systemd service creation --- | ||||||||||||||||||
|
||||||||||||||||||
exit 0 |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -41,6 +41,17 @@ sed -i "s/#.*transport_config/transport_config/g" $daos_config | |
sed -i "s/#.*allow_insecure:.*false/ allow_insecure: true/g" $daos_config | ||
sed -i "s/.*access_points.*/access_points: $access_points/g" $daos_config | ||
|
||
# Get interface names with "s0f0" suffix | ||
if ifconfig -a | grep 's0f0'; then | ||
sof0_interfaces=$(ifconfig -a | grep 's0f0:' | awk '{print $1}' | tr ':' '\n' | grep -v '^$' | awk '!a[$0]++' | sed 's/^/"/g' | sed 's/$/"/g' | paste -sd, -) | ||
|
||
# Append the sof0_interfaces to the existing list | ||
exclude_fabric_ifaces="lo,$sof0_interfaces" | ||
|
||
# Update the file with the new list | ||
sed -i "s/#.*exclude_fabric_ifaces: \[.*/exclude_fabric_ifaces: [$exclude_fabric_ifaces]/" $daos_config | ||
fi | ||
|
||
# Start service | ||
if { [ "${OS_ID}" = "rocky" ] || [ "${OS_ID}" = "rhel" ]; } && { [ "${OS_VERSION_MAJOR}" = "8" ] || [ "${OS_VERSION_MAJOR}" = "9" ]; }; then | ||
# TODO: Update script to change default log destination folder, after daos_agent user is supported in debian and ubuntu. | ||
|
@@ -69,39 +80,33 @@ sed -i "s/#.*user_allow_other/user_allow_other/g" $fuse_config | |
# make sure limit of open files is high enough for dfuse (1M of open files) | ||
ulimit -n 1048576 | ||
|
||
for i in {1..10}; do | ||
# To parse mount_options as --disable-wb-cache --eq-count=8. | ||
# shellcheck disable=SC2086 | ||
dfuse -m "$local_mount" --pool default-pool --container default-container --multi-user $mount_options && break | ||
|
||
echo "dfuse failed, retrying in 1 seconds (attempt $i/10)..." | ||
sleep 1 | ||
done | ||
|
||
if ! mountpoint -q "$local_mount"; then | ||
exit 1 | ||
fi | ||
|
||
# Store the mounting logic in a variable | ||
mount_command='for i in {1..10}; do /bin/dfuse -m '$local_mount' --pool default-pool --container default-container --multi-user '$mount_options' --foreground && break; echo \"dfuse, failed, retrying in 1 second (attempt '$i'/10)\"; sleep 1; done' | ||
mount_command="if mountpoint -q '$local_mount'; then fusermount3 -u '$local_mount'; fi; for i in {1..10}; do /bin/dfuse -m '$local_mount' --pool default-pool --container default-container --multi-user $mount_options --foreground && break; sleep 1; done" | ||
|
||
# Construct the service name with the local_mount suffix | ||
service_name="mount_parallelstore_${local_mount//\//_}.service" | ||
|
||
# --- Begin: Add systemd service creation --- | ||
cat >/usr/lib/systemd/system/mount_parallelstore.service <<EOF | ||
cat >/usr/lib/systemd/system/"${service_name}" <<EOF | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Should this land in |
||
[Unit] | ||
Description=DAOS Mount Service | ||
After=network-online.target daos_agent.service | ||
|
||
[Service] | ||
Type=oneshot | ||
Type=simple | ||
User=root | ||
Group=root | ||
ExecStart=/bin/bash -c '$mount_command' | ||
Restart=always | ||
RestartSec=1 | ||
ExecStart=/bin/bash -c "$mount_command" | ||
ExecStop=fusermount3 -u '$local_mount' | ||
|
||
[Install] | ||
WantedBy=multi-user.target | ||
EOF | ||
|
||
systemctl enable mount_parallelstore.service | ||
systemctl enable "${service_name}" | ||
systemctl start "${service_name}" | ||
# --- End: Add systemd service creation --- | ||
|
||
exit 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does the goal here is only to support multiple mounts from the server Parallelstore instance, or to support mounts from multiple Parallelstore instances?
I fear, that with multiple Parallelstore instances, the last script that will run will setup
access_points
in the config, and it will be the only instance mounted (but multiple times)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The goal is to have 1 parallelstore instance but able to mount to multiple mount points (possible with different mount options).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, perfect. I think it is worth mentioning in the README, that module supports mounting only from Parallelstore instance, but it's possibile to mount it multiple times with different options using
pre-existing-network-storage
.Probably worth mentioning in both modules.