Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

storage: Reimplement btrfs polling with benefits #20893

Merged
merged 2 commits into from
Oct 2, 2024

Conversation

mvollmer
Copy link
Member

@mvollmer mvollmer commented Aug 16, 2024

This reduces the roundtrips and allows us to maintain "temporary" mount points for btrfs filesystems that are not otherwise mounted.

The process will be shutdown when the cockpit session is closed by cockpit-ws, so we get reliable cleanup.

Fixes #20855

Demo: https://youtu.be/aNsdXtHnUUM
Anaconda demo: https://youtu.be/EM-qqogOt6g

  • make btrfs-tool more robust so that the monitor never crashes because of problems with individual filesystems. E.g., manually unmounting /var/lib/cockpit/btrfs/$uuid should just stop reports for that filesystem, and not crash the monitor.
  • demo
  • move subvol creation and deletion to the tool?
  • unmount /var mount during tear down, so that we can erase etc.
  • only in Anaconda mode: don't mount secretly for listing, but if listing works because of non-secret mounts, allow creation everywhere via a temporary secret mount. Otherwise excuse is: "At least one subvolume needs to be mounted."
  • figure out editing the label. if we can find a rw mount, use that plus udevadm trigger, if not, but it is mounted ro, use that as the excuse, if not, use udisks method.
  • don't allow deleting the last mounted subvolume, that would kill the polling. damn.
  • there is a race where cockpit wants to tear down the tmp mount, but it is gone by the time it does it. happens in a Anaconda test, check that.
  • tests
  • coverage

@mvollmer mvollmer force-pushed the storage-btrfs-local-polling branch from a178c2f to 11232c7 Compare August 16, 2024 08:16
pkg/storaged/btrfs/btrfs-tool.py Fixed Show fixed Hide fixed
pkg/storaged/btrfs/btrfs-tool.py Fixed Show fixed Hide fixed
pkg/storaged/btrfs/btrfs-tool.py Fixed Show fixed Hide fixed
pkg/storaged/btrfs/btrfs-tool.py Fixed Show fixed Hide fixed
@mvollmer mvollmer force-pushed the storage-btrfs-local-polling branch 2 times, most recently from 03b3bbf to f4fd732 Compare August 16, 2024 11:44
pkg/storaged/btrfs/btrfs-tool.py Fixed Show fixed Hide fixed
pkg/storaged/btrfs/btrfs-tool.py Fixed Show fixed Hide fixed
@mvollmer
Copy link
Member Author

mvollmer commented Aug 19, 2024

Some notes:

  • This will mount the default subvolume of a btrfs filesystem on /var/lib/cockpit/btrfs/$uuid if that is necessary to list its subvolumes.

  • The expectation is that this only happens as little as necessary. There is a lot of machinery in btrfs-tool.py to make this happen. Btrfs subvolumes need to be polled periodically (Cockpit does it every 5 seconds), and mounting a btrfs filesystem every five seconds is not acceptable. So we only mount it once when it becomes necessary and leave it mounted until the Cockpit session ends or the filesystem has been mounted by the user for real.

  • The tool can run concurrently without getting confused. This is necessary when more than one Cockpit session is active on the same machine.

  • We also use this to occasionally run a synchronous poll, for example right after creating a new subvolume in Cockpit. This synchronous poll runs concurrently with the monitor mode of the tool. We can't just trigger the monitor since we want to synchronously get the poll results.

  • The problem with all this is of course that this is a giant race condition for users working on the command line (and also for Cockpit itself): If you unmount a filesystem while someone is logged into Cockpit on the same machine, your filesystem will silently get mounted again and you can't do whatever you wanted to do with the unmounted filesystem. This is a strong argument to only offer the automounting in Anaconda mode.

  • But to counter the last point a bit, if the user discovers the mount in /var/lib/cockpit/btrfs/ and unmounts it manually, then Cockpit will not mount it back. It will just fall out of date with external changes. The synchronous polling will still work, so Cockpit will not fall out of sync with its own actions.

  • We need to mount the filesystem read-write, unfortunately. If we mount it read-only, nobody can mount any subvolume of it read-write. (The first mount of a btrfs filesystem determines the readonlyness of all of it. This also affects other mounts, of course, but they are not our problem here. E.g. if you mount subvol "root" on "/" read-only, you can't mount subvol "var" on "/var" read-write anymore.)

  • But because we mount read-write, we can use the mount point in /var/lib/cockpit/ also to create and delete subvolumes on otherwise unmounted btrfs filesystems.

  • BUT: since subvolumes can only be created opr deleted while the filesystem is mounted somewhere, we could just poll it once Cockpit starts, and then only while it is mounted, and stop polling when it is not mounted, with the assumption that nothing will change when it isn't actually mounted. But this is likely unreliable.

  • Another option is to put a explicit "refresh" button into the Cockpit UI. This would need to be clicked for Cockpit to pick up external changes.

@mvollmer mvollmer force-pushed the storage-btrfs-local-polling branch from f4fd732 to 98e9cae Compare August 19, 2024 10:11
@mvollmer mvollmer force-pushed the storage-btrfs-local-polling branch from 98e9cae to 804c443 Compare August 22, 2024 06:09
@mvollmer mvollmer force-pushed the storage-btrfs-local-polling branch 3 times, most recently from 2062f54 to 37ba9ce Compare August 28, 2024 09:32
@mvollmer mvollmer added the no-test For doc/workflow changes, or experiments which don't need a full CI run, label Aug 28, 2024
@mvollmer mvollmer removed the no-test For doc/workflow changes, or experiments which don't need a full CI run, label Aug 28, 2024
@mvollmer mvollmer force-pushed the storage-btrfs-local-polling branch from 91f7300 to 1a21f04 Compare August 28, 2024 13:36
@mvollmer mvollmer force-pushed the storage-btrfs-local-polling branch 2 times, most recently from 0d8374b to 58c912f Compare August 29, 2024 06:56
@mvollmer mvollmer changed the title WIP - poll btrfs with python storage: Reimplement btrfs polling with benefits Aug 29, 2024
if b['fstype'] == "btrfs":
uuid = b['uuid']
mps = list(filter(lambda x: x is not None and not x.startswith(TMP_MP_DIR), b['mountpoints']))
has_tmp_mp = len(list(filter(lambda x: x is not None and x.startswith(TMP_MP_DIR), b['mountpoints']))) > 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small nit, could be len(mps)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, `len(mps) < len(b['mountpoints']), yeah this code is sh*t.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you still want to rewrite this? :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

pkg/storaged/btrfs/btrfs-tool.py Show resolved Hide resolved
@mvollmer mvollmer force-pushed the storage-btrfs-local-polling branch from 58c912f to a20f003 Compare August 29, 2024 11:11
@mvollmer mvollmer marked this pull request as ready for review August 29, 2024 11:11
@mvollmer mvollmer requested a review from jelly August 29, 2024 11:11
@mvollmer mvollmer force-pushed the storage-btrfs-local-polling branch from a20f003 to aa8de77 Compare August 30, 2024 06:11
@mvollmer mvollmer force-pushed the storage-btrfs-local-polling branch from aa8de77 to d6d4ab5 Compare August 30, 2024 10:06
if b['fstype'] == "btrfs":
uuid = b['uuid']
mps = list(filter(lambda x: x is not None and not x.startswith(TMP_MP_DIR), b['mountpoints']))
has_tmp_mp = len(list(filter(lambda x: x is not None and x.startswith(TMP_MP_DIR), b['mountpoints']))) > 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you still want to rewrite this? :)

pkg/storaged/btrfs/btrfs-tool.py Show resolved Hide resolved
pkg/storaged/btrfs/btrfs-tool.py Show resolved Hide resolved
pkg/storaged/btrfs/btrfs-tool.py Show resolved Hide resolved
os.makedirs(TMP_MP_DIR, mode=0o700, exist_ok=True)
fd = os.open(path, os.O_RDWR | os.O_CREAT)
fcntl.flock(fd, fcntl.LOCK_EX)
data = os.read(fd, 100000)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not the most robust :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, forgot about that... thanks!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually... this is pretty ok, no? Files will never have a short read, and the 100000 gives us a safety limit.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made a read_all function anyway.

pkg/storaged/btrfs/btrfs-tool.py Show resolved Hide resolved
pkg/storaged/btrfs/volume.jsx Show resolved Hide resolved
pkg/storaged/dialog.jsx Show resolved Hide resolved
pkg/storaged/btrfs/btrfs-tool.py Outdated Show resolved Hide resolved
pkg/storaged/btrfs/btrfs-tool.py Show resolved Hide resolved
@mvollmer mvollmer force-pushed the storage-btrfs-local-polling branch from d6d4ab5 to 24b9224 Compare September 9, 2024 09:13
@mvollmer mvollmer requested a review from jelly September 9, 2024 09:13
We used to ignore all other mounts when the first one is for the root
filesystem. But filesystems can have more than one mount and we should
look at each one individually.

This also prepares the code for more cases.
@mvollmer mvollmer force-pushed the storage-btrfs-local-polling branch 4 times, most recently from 4e8fc61 to b1b434b Compare September 11, 2024 11:57
The subvolume polling is now done by a long-running monitor program
that only reports changes back to Cockpit. This reduces traffic in
general and allows us to do additional clever things.

The monitor program can optionally keep all btrfs filesystems mounted
in a "secret" location, so that we can list the subvolumes of all
btrfs filesystems, even those that are not officially mounted
otherwise.

This mode is only enabled in Anaconda mode: It is important there
since filesystems are normally not mounted when preparing storage for
Anaconda, and outside of Anaconda mode, keeping long standing secret
mounts is deemed to invasive.

The program can also carry out btrfs operations (such as creating
subvolumes) while temporarily mounting it. This allows Cockpit to
always create and delete subvolumes. The UI still only does it when
monitoring subvolumes works as well, however.
@mvollmer mvollmer force-pushed the storage-btrfs-local-polling branch from b1b434b to 4fba463 Compare September 16, 2024 10:22
@@ -397,15 +290,91 @@ function btrfs_findmnt_poll() {
});
}

function btrfs_update(data) {
if (!client.uuids_btrfs_subvols)
client.uuids_btrfs_subvols = { };
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This added line is not executed by any test.

if (!client.uuids_btrfs_subvols)
client.uuids_btrfs_subvols = { };
if (!client.uuids_btrfs_usage)
client.uuids_btrfs_usage = { };
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This added line is not executed by any test.

if (!client.uuids_btrfs_usage)
client.uuids_btrfs_usage = { };
if (!client.uuids_btrfs_default_subvol)
client.uuids_btrfs_default_subvol = { };
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This added line is not executed by any test.

Comment on lines +306 to +307
if (data[uuid].error) {
console.warn("Error polling btrfs", uuid, data[uuid].error);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These 2 added lines are not executed by any test.

Comment on lines +365 to +366
channel.catch(err => {
throw new Error(err.toString());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These 2 added lines are not executed by any test.

Comment on lines +69 to +72
if (usage.Blocking) {
dialog_open({
Title: cockpit.format(_("$0 is in use"), name),
Body: BlockingMessage(usage)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These 4 added lines are not executed by any test.

Title: cockpit.format(_("$0 is in use"), name),
Body: BlockingMessage(usage)
});
return;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This added line is not executed by any test.

// We don't complain about the rootfs, it's probably
// configured somewhere else, like in the bootloader.
if (m == "/")
return true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This added line is not executed by any test.


// This is the mount point used for monitoring btrfs filesystems.
if (m.startsWith(BTRFS_TOOL_MOUNT_PATH))
return true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This added line is not executed by any test.

Copy link
Member

@jelly jelly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is as good to go, just need to get CI green.

@mvollmer mvollmer merged commit c44f048 into cockpit-project:main Oct 2, 2024
77 of 84 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

storage: btrfs volumes that require mounting to perform an action should auto-mount
3 participants