--
I've reproduced this issue on a test machine. What I've observed is that sending the "device_del" QMP command triggers an RTAS event, which the guest OS handles by running drmgr. drmgr then requests the removal of the PCI device in these steps:
1. Write '1' to /sys/devices/pci0000:00/0000:00:XX.0/remove
2. Wait for "/sys/devices/pci0000:00/0000:00:XX.0" to disappear.
3. Remove the device-tree node.
4. Isolate the device via RTAS.
5. Power off the device, again via RTAS.
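The wait for the sysfs entry to disappear is essentially a bounded poll. The following is a minimal sketch of that idea, not drmgr's actual code: the helper name `wait_for_removal` and its return convention are assumptions made for illustration only.

```c
#include <time.h>
#include <unistd.h>

/* Hypothetical helper (not part of drmgr): poll until `path` no longer
 * exists or `timeout_s` seconds have elapsed.
 * Returns 0 if the path disappeared, -1 if the timeout expired first. */
static int wait_for_removal(const char *path, int timeout_s)
{
    time_t deadline = time(NULL) + timeout_s;

    while (access(path, F_OK) == 0) {
        if (time(NULL) >= deadline)
            return -1;  /* device still present: the caller must not carry on */
        sleep(1);
    }
    return 0;
}
```

The important part is the return value: a caller that receives -1 knows the device is still bound to its driver and can stop before touching the device tree or RTAS.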
The problem is that while the hotplug code waits for virtio_blk to be ready for removal, drmgr applies a 60-second timeout at step 2. Stopping and releasing the PCI device can easily take longer than that in this test, because "dd" is writing to the disk asynchronously, which puts a lot of pressure on the page cache and the virtqueue. When that happens, drmgr stops waiting but proceeds with steps 3, 4 and 5 anyway:
(...)
waiting for PCI device driver to quiesce device at /sys/devices/pci0000:00/0000:00:04.0
waiting for PCI device driver to quiesce device at /sys/devices/pci0000:00/0000:00:04.0
waiting for PCI device driver to quiesce device at /sys/devices/pci0000:00/0000:00:04.0
waiting for PCI device driver to quiesce device at /sys/devices/pci0000:00/0000:00:04.0
waiting for PCI device driver to quiesce device at /sys/devices/pci0000:00/0000:00:04.0
timeout while quiescing device at /sys/devices/pci0000:00/0000:00:04.0
Removing device-tree node /proc/device-tree/pci@800000020000000/scsi@4
is calling rtas_set_indicator(ISOLATE index 0x40000020)
is calling set_power(POWER_OFF index 0x40000020, power_domain 0xffffffff)
This happens because the return value of common_pci.c:pci_remove_device() is ignored by drslot_chrp_pci.c:remove_work().
I think this should be fixed in drmgr, by honoring the return value of pci_remove_device() and making it fail gracefully.
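To illustrate the kind of change suggested here, the sketch below aborts the removal sequence as soon as the quiesce step reports a failure. It is only a sketch under stated assumptions: `struct slot_ops`, `remove_slot` and the `demo_*` stubs are hypothetical names invented for this example, and they do not reflect the real signatures of pci_remove_device() or remove_work() in drmgr.

```c
#include <stdio.h>

/* All names here are hypothetical stand-ins, not drmgr's real API. */
struct slot_ops {
    int (*remove_device)(void);   /* steps 1-2: non-zero if the device did not quiesce */
    int (*remove_dt_node)(void);  /* step 3 */
    int (*isolate)(void);         /* step 4 */
    int (*power_off)(void);       /* step 5 */
};

/* Run the removal sequence, stopping at the first step that fails. */
static int remove_slot(const struct slot_ops *ops)
{
    int rc = ops->remove_device();
    if (rc != 0) {
        fprintf(stderr, "device not quiesced (rc=%d), aborting removal\n", rc);
        return rc;                /* never reach steps 3-5 */
    }
    if ((rc = ops->remove_dt_node()) != 0)
        return rc;
    if ((rc = ops->isolate()) != 0)
        return rc;
    return ops->power_off();
}

/* Demo stubs: count how many of steps 3-5 were actually executed. */
static int demo_steps_run;
static int demo_quiesce_ok(void)      { return 0; }
static int demo_quiesce_timeout(void) { return -1; }
static int demo_step(void)            { demo_steps_run++; return 0; }
```

With `demo_quiesce_timeout` as the first operation, `remove_slot` returns a non-zero value and `demo_steps_run` stays at 0, so the device-tree node is left alone and the device is neither isolated nor powered off while its driver is still busy.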
The report above comes from one of our developers.
For more info please have a look at https://bugzilla.redhat.com/show_bug.cgi?id=1458187