verify-unique-fs-label: Run blkid with a clean cache to avoid stale devices #1288

Merged: 1 commit into coreos:main on Sep 5, 2023

Conversation

HuijingHei
Member

@HuijingHei commented Aug 31, 2023

Booting the VM failed with the following logs; the reproduction rate is 2/4.

systemd[1]: Starting CoreOS Ensure Unique Boot Filesystem...
rdcore[1252]: blkid: error: /dev/sr0: No such file or directory
rdcore[1252]: Error: "blkid" "-p" "/dev/sdb1" "/dev/sr0"
"/dev/sda4" "/dev/sda2" "/dev/sda3" "/dev/sda1" failed with exit
status: 2
systemd[1]: coreos-ignition-unique-boot.service: Main process
exited, code=exited, status=1/FAILURE
coreos-ignition-unique-boot.service: Failed with result 'exit-code'.
Failed to start CoreOS Ensure Unique Boot Filesystem.
systemd[1]: coreos-ignition-unique-boot.service: Triggering
OnFailure= dependencies.

It seems blkid is collecting old devices from the cache which no longer
exist; it should run with a clean cache before gathering the list of
devices.

Fixes https://issues.redhat.com/browse/OCPBUGS-17643


Testing with this PR still hits the issue occasionally, at a low rate,
so it does not fix the issue completely on its own.
Also need openshift/os#1350

HuijingHei added a commit to HuijingHei/os that referenced this pull request Aug 31, 2023
@HuijingHei force-pushed the blkid-refresh-cache branch from 40f09df to 8826ad5 on August 31, 2023 12:26
@cgwalters
Member

There's something important to note here: I believe this is a regression from 50e9034.
That's why we're only seeing it in relatively recent builds.

Now...I am not quite sure I understand how this would fix the race. As I understand it the problem is basically that because the above commit started bypassing the cache, if the device is removed at any point in this function we can start failing.

Is it possible you just got lucky in testing so far?

How about we skip probing any devices that are removable? (See e.g. lsblk -o NAME,RM for the human-readable output; this is the rm property in JSON.)

BTW, I suspect a reliable and faster way to test this would be to write a small systemd unit (e.g. a bash script) that runs in the initramfs and creates and removes loopback devices mounted as isofs in a tight loop; then it could be trivially tested in qemu too.

src/blockdev.rs Outdated
Comment on lines 888 to 890
// Refresh the cache to avoid collecting old devices which no longer
// exist.
runcmd!("blkid", "-s", "none")?;
Member


How about we just change the `blkid -o device` call below to `blkid --cache-file /dev/null -o device` and add a comment?

This would be the approach if we didn't end up changing the strategy like Colin mentions in #1288 (comment).
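A minimal sketch of what that suggested change might amount to. This uses std::process::Command directly rather than the project's runcmd!-style helpers, and the function name is hypothetical:

```rust
use std::process::Command;

/// Hypothetical sketch: gather the block device list the way the service does,
/// but with a clean blkid cache so devices that have since disappeared
/// (e.g. /dev/sr0) are no longer reported.
fn list_block_devices() -> std::io::Result<Vec<String>> {
    // `--cache-file /dev/null` (short form `-c /dev/null`) starts from an
    // empty cache instead of reusing the cache file populated earlier in boot.
    let out = Command::new("blkid")
        .args(["--cache-file", "/dev/null", "-o", "device"])
        .output()?;
    Ok(String::from_utf8_lossy(&out.stdout)
        .lines()
        .map(|l| l.trim().to_string())
        .filter(|l| !l.is_empty())
        .collect())
}
```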

Member Author


That makes sense. I thought that if we refreshed the cache it would be better for later services which call blkid (so they read from a fresh cache), but I agree it might carry risk, so I am OK with the change. Thanks!

Member Author


Hold on; I will do more testing to confirm whether the issue is fixed completely.

@dustymabe
Member

How about we skip probing any devices that are removable. (see e.g. lsblk -o NAME,RM for the human-readable output, and this is the rm property in JSON).

We did previously use lsblk for some things here but we moved away from it. Not sure if the reason we moved away still applies or not: #813

@cgwalters
Member

We did previously use lsblk for some things here but we moved away from it. Not sure if the reason we moved away still applies or not: #813

Ah fun, right. Well, one approach is to strip the leading /dev, and then just directly look for /sys/class/block/$device/removable. Or actually, I think we could stop using blkid here and look at /sys/class/block directly...

HuijingHei added a commit to HuijingHei/os that referenced this pull request Sep 1, 2023
This reverts commit e650006.
The PR to fix the issue is coreos/coreos-installer#1288

Sorry for the confusion; I should do more testing before merging.
@HuijingHei changed the title from "rdcore: refresh the cache to avoid collecting old devices which no longer exist" to "rdcore: run blkid with a clean cache to avoid collecting old devices which no longer exist" on Sep 1, 2023
src/blockdev.rs Outdated (resolved)
@HuijingHei
Member Author

Is it possible you just got lucky in testing so far?

Yes, it seems this can fix the issue, but it needs more testing from OCP to provide more confidence.
Did simple testing with the new change and the revert PR: booted 10 VMs with a Standard_DC16ads_v5 CVM, and all were able to boot.
Without the patch, the reproduction rate is 1/5.

BTW, I suspect a reliable and faster way to test this is to write a small systemd unit (bash script e.g.) that runs in the initramfs that creates and removes loopback devices that are mounted as isofs in a tight loop, then it can be trivially tested in qemu too.

SGTM, will look into this more.

@HuijingHei
Member Author

We did previously use lsblk for some things here but we moved away from it. Not sure if the reason we moved away still applies or not: #813

Ah fun, right. Well, one approach is to strip the leading /dev, and then just directly look for /sys/class/block/$device/removable. Or actually, I think we could stop using blkid here and look at /sys/class/block directly...

Do you mean to skip any device that has removable with value 1?
It seems disk partitions do not have removable; only whole disks like vda/vdb have the removable file, and its value there is 0.

$ ls /sys/class/block/
sr0  vda  vda1	vda2  vda3  vda4

$ ls /sys/class/block/*/removable
/sys/class/block/sr0/removable	/sys/class/block/vda/removable

[cloud-user@citest-1 ~]$ cat /sys/class/block/vda/removable
0
[cloud-user@citest-1 ~]$ cat /sys/class/block/sr0/removable
1
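For reference, a minimal sketch of that sysfs check (the helper name is hypothetical): strip the leading /dev and read the removable attribute; partitions have no such file, so they simply fall through to "not removable".

```rust
/// Hypothetical helper: true if the kernel flags the device as removable.
/// `dev` is a path such as "/dev/sr0"; partitions like "/dev/vda1" have no
/// `removable` attribute, so the read fails and we fall back to `false`.
fn is_removable(dev: &str) -> bool {
    let name = dev.trim_start_matches("/dev/");
    std::fs::read_to_string(format!("/sys/class/block/{name}/removable"))
        .map(|s| s.trim() == "1")
        .unwrap_or(false)
}
```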

@travier
Member

travier commented Sep 4, 2023

What's the difference between -p and -c /dev/null? Why not use -p everywhere? (or the reverse)

@HuijingHei
Member Author

Testing results with this patch, installing OCP 3 times:

  • 1st time failed: 2 master nodes hit the same issue.
  • the other 2 times, all nodes managed to boot.

It seems this cannot fix the issue completely; it will still be hit occasionally at a low rate.

But I still think this PR is not useless, as it prevents collecting stale devices from the cache. WDYT?


Had a discussion with Azure QE and got the info that /dev/sr0 will be removed by the platform once provisioning succeeds.

Did testing with Before=ignition-fetch.service in rhcos-afterburn-checkin.service, and got the expected error that Ignition failed to find /dev/sr0 about 1 second after check-in finished. (Refer to https://bugzilla.redhat.com/show_bug.cgi?id=1980679)

[    9.881886] systemd[1]: Finished Afterburn (Check In - from the initramfs).
...
[   10.732530] ignition[989]: failed to open config device: open /dev/sr0: no such file or directory

There is a slight race condition for any service that needs /dev/sr0 and runs after check-in (though it does not hit 100% of the time), so we need to order coreos-ignition-unique-boot before the check-in service, instead of after.
See openshift/os#1350
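For illustration only, the ordering being described would look roughly like this hypothetical drop-in; the actual unit changes live in openshift/os#1350, not in this repository:

```
# hypothetical drop-in for coreos-ignition-unique-boot.service:
# run the unique-boot check before the Azure check-in, while /dev/sr0
# is still attached to the VM
[Unit]
Before=rhcos-afterburn-checkin.service
```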

@travier
Member

travier commented Sep 4, 2023

While openshift/os#1350 LGTM, this PR seems reasonable as well, so we should consider doing both.

@HuijingHei
Member Author

What's the difference between -p and -c /dev/null?

According to the blkid man page:

  • -c /dev/null means: start with a clean cache
  • -p <device> means: switch to low-level superblock probing mode (bypassing the cache)

Both get fresh block device attributes because they bypass the cache; the difference is that -p needs devices as parameters, which we do not know up front.

Why not use -p everywhere? (or the reverse)

It seems we are already using -p to avoid the cache; see 50e9034 and coreos/fedora-coreos-config#2184.


Let me know if I misunderstood your meaning.
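To make that difference concrete, here is a rough sketch of the two call shapes (again using std::process::Command directly, with hypothetical names): the clean-cache form needs no device arguments, while -p only works once you already have concrete devices to hand it.

```rust
use std::process::Command;

/// Hypothetical sketch contrasting the two blkid invocations discussed above.
fn probe_all_devices() -> std::io::Result<std::process::Output> {
    // Step 1: list devices with a clean cache; no device arguments are needed.
    let list = Command::new("blkid")
        .args(["-c", "/dev/null", "-o", "device"])
        .output()?;
    let devices: Vec<String> = String::from_utf8_lossy(&list.stdout)
        .lines()
        .map(str::to_owned)
        .collect();
    // Step 2: low-level probing (-p) bypasses the cache entirely, but it needs
    // the concrete device paths we just gathered.
    Command::new("blkid").arg("-p").args(&devices).output()
}
```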

@HuijingHei marked this pull request as ready for review September 5, 2023 07:43
verify-unique-fs-label: Run blkid with a clean cache to avoid stale devices

Booting the VM failed with the following logs; the reproduction rate is 2/4.
```
systemd[1]: Starting CoreOS Ensure Unique Boot Filesystem...
rdcore[1252]: blkid: error: /dev/sr0: No such file or directory
rdcore[1252]: Error: "blkid" "-p" "/dev/sdb1" "/dev/sr0"
"/dev/sda4" "/dev/sda2" "/dev/sda3" "/dev/sda1" failed with exit
status: 2
systemd[1]: coreos-ignition-unique-boot.service: Main process
exited, code=exited, status=1/FAILURE
coreos-ignition-unique-boot.service: Failed with result 'exit-code'.
Failed to start CoreOS Ensure Unique Boot Filesystem.
systemd[1]: coreos-ignition-unique-boot.service: Triggering
OnFailure= dependencies.
```
It seems `blkid` is collecting old devices from the cache which no longer
exist; it should run with a clean cache before gathering the list of
devices.

Fixes https://issues.redhat.com/browse/OCPBUGS-17643
@HuijingHei changed the title from "rdcore: run blkid with a clean cache to avoid collecting old devices which no longer exist" to "verify-unique-fs-label: Run blkid with a clean cache to avoid stale devices" on Sep 5, 2023
@cgwalters
Member

I think this still makes sense to do at some point, but it seems not urgent as the systemd ordering is a stronger fix.

@HuijingHei merged commit e0690cc into coreos:main on Sep 5, 2023
13 checks passed
@HuijingHei deleted the blkid-refresh-cache branch September 5, 2023 12:51