We had an issue on 3 out of 4 hosts in an oVirt cluster (4.5.4-1.el8) where one VM gets stuck in an unresponsive state. It can be neither powered down nor restarted, and as long as its qemu process is running I can't even list processes on that host. The VM is unreachable over the network and through oVirt's VNC console. The only way to resolve the issue is to restart the host from the oVirt web UI (or kill the qemu process).
I can see entries like these in the vdsm logs:
```
2023-05-05 21:27:52,848+0200 ERROR (qgapoller/1) [virt.periodic.Operation] <bound method QemuGuestAgentPoller._poller of <vdsm.virt.qemuguestagent.QemuGuestAgentPoller object at 0x7fe08c0d9630>> operation failed (periodic:187)
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/vdsm/virt/periodic.py", line 185, in __call__
    self._func()
  File "/usr/lib/python3.6/site-packages/vdsm/virt/qemuguestagent.py", line 476, in _poller
    vm_id, self._qga_call_get_vcpus(vm_obj))
  File "/usr/lib/python3.6/site-packages/vdsm/virt/qemuguestagent.py", line 797, in _qga_call_get_vcpus
    if 'online' in vcpus:
TypeError: argument of type 'NoneType' is not iterable
```
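For what it's worth, this first traceback looks like a missing None check: the guest-agent reply for the unresponsive VM apparently comes back empty, and `_qga_call_get_vcpus` then evaluates `'online' in vcpus` on a None value. Below is a minimal sketch of the kind of guard that would avoid the TypeError; the function and key names are taken from the traceback, but the guard itself is only an illustration, not the actual vdsm code:

```python
# Illustrative guard only -- not vdsm's real implementation.
def count_online_vcpus(vcpus):
    """Return the 'online' entry from a guest-agent vCPU reply, or None.

    `vcpus` is whatever the qemu-guest-agent poller got back for the VM; for
    an unresponsive guest this can be None, which is what triggers
    "TypeError: argument of type 'NoneType' is not iterable" above.
    """
    if vcpus is None:
        return None
    if 'online' in vcpus:
        return vcpus['online']
    return None
```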
The guest-agent error above eventually leads to:
```
2023-05-05 21:45:17,709+0200 ERROR (vm/220746d4) [virt.vm] (vmId='220746d4-56a5-40cc-8633-1285c167c4fe') Failed to update CPU set of the VM to match shared pool (cpumanagement:121)
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/vdsm/virt/virdomain.py", line 104, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/vdsm/common/libvirtconnection.py", line 114, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/vdsm/common/function.py", line 78, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python3.6/site-packages/libvirt.py", line 2303, in pinVcpu
    raise libvirtError('virDomainPinVcpu() failed')
libvirt.libvirtError: Timed out during operation: cannot acquire state change lock (held by monitor=remoteDispatchConnectGetAllDomainStats)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/vdsm/virt/cpumanagement.py", line 108, in _assign_shared
    vm.pin_vcpu(vcpu, cpuset)
  File "/usr/lib/python3.6/site-packages/vdsm/virt/vm.py", line 6306, in pin_vcpu
    self._dom.pinVcpu(vcpu, cpuset)
  File "/usr/lib/python3.6/site-packages/vdsm/virt/virdomain.py", line 112, in f
    raise toe
vdsm.virt.virdomain.TimeoutError: Timed out during operation: cannot acquire state change lock (held by monitor=remoteDispatchConnectGetAllDomainStats)
```
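The second traceback is vdsm's periodic CPU-pinning job calling libvirt's virDomainPinVcpu() while the domain's state change lock is held by a stats call against the stuck QEMU monitor, so the call times out and vdsm wraps it in its own TimeoutError. A rough standalone sketch of that call path through the libvirt Python bindings (the connection URI and domain name are placeholders; the timeout itself only shows up when the monitor is blocked, e.g. by hung storage):

```python
# Sketch of the failing libvirt call, assuming a local qemu:///system daemon
# and an existing domain called "my-vm" (both placeholders).
import libvirt

conn = libvirt.open("qemu:///system")
dom = conn.lookupByName("my-vm")

# pinVcpu() takes one boolean per host CPU; pin vCPU 0 to host CPUs 0 and 1.
host_cpus = conn.getInfo()[2]
cpumap = tuple(i < 2 for i in range(host_cpus))

try:
    dom.pinVcpu(0, cpumap)
except libvirt.libvirtError as err:
    # When the QEMU monitor is wedged (as with the VM stuck here), libvirt
    # cannot acquire the domain's state change lock and raises "Timed out
    # during operation: cannot acquire state change lock", which vdsm's
    # virdomain wrapper then re-raises as TimeoutError.
    print("pinVcpu failed:", err)
```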
This leaves the qemu process stuck consuming CPU. If I forcibly kill the process, everything goes back to normal; until then oVirt reports the VM's state as "unresponsive", or "powering down" if I try to shut it down from the web UI.
Storage is mounted via GlusterFS FUSE and served by separate hosts (3 hosts with a replica-3, JBOD setup on 6 NVMe disks).
All hosts (hypervisors and Gluster) run CentOS Stream 8.
Version-Release number of selected component:
4.50.3.4-1.el8.x86_64
Hi @mz-pdm, I will update to Vdsm 4.50.5 during the next update window and see if the error message goes away.
As for the crashes, they seem unrelated to vdsm: after migrating all hypervisor hosts to Rocky 8, the issue has not occurred for 4 consecutive days.