VM stuck in unresponsive state and prohibits listing processes on host #389

ddrazyk opened this issue May 8, 2023 · 2 comments
ddrazyk commented May 8, 2023

We had an issue on 3 out of 4 hosts in an oVirt cluster (4.5.4-1.el8) where one VM gets stuck in an unresponsive state. It can be neither powered down nor restarted, and as long as its qemu process is running I can't list processes on that host. The VM is unreachable over the network and through oVirt's VNC console. The only way to resolve the issue is to restart the host from the oVirt web UI (or kill the qemu process).
I can see entries like these in the vdsm logs:

2023-05-05 21:27:52,848+0200 ERROR (qgapoller/1) [virt.periodic.Operation] <bound method QemuGuestAgentPoller._poller of <vdsm.virt.qemuguestagent.QemuGuestAgentPoller object at 0x7fe08c0d9630>> operation failed (periodic:187)
Traceback (most recent call last):
File "/usr/lib/python3.6/site-packages/vdsm/virt/periodic.py", line 185, in call
self._func()
File "/usr/lib/python3.6/site-packages/vdsm/virt/qemuguestagent.py", line 476, in _poller
vm_id, self._qga_call_get_vcpus(vm_obj))
File "/usr/lib/python3.6/site-packages/vdsm/virt/qemuguestagent.py", line 797, in _qga_call_get_vcpus
if 'online' in vcpus:
TypeError: argument of type 'NoneType' is not iterable
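
For readers unfamiliar with this error: it is what Python raises when a membership test (`in`) is applied to `None`, which suggests the guest agent call returned no data here. Below is a minimal sketch of the failure mode and one possible guard. The helper names `unguarded` and `vcpus_online` are hypothetical and this is not the actual Vdsm code or the actual fix.

```python
def unguarded(vcpus):
    # Mirrors the failing line in the traceback: a membership test
    # that raises TypeError when vcpus is None.
    return 'online' in vcpus


def vcpus_online(vcpus):
    """Hypothetical guarded variant: return the 'online' value, or None.

    `vcpus` stands in for the guest agent's reply; None models a call
    that returned nothing (for example, an unresponsive guest).
    """
    if vcpus is None:
        return None
    if 'online' in vcpus:
        return vcpus['online']
    return None


if __name__ == "__main__":
    print(vcpus_online({'online': '0-3'}))
    print(vcpus_online(None))
    try:
        unguarded(None)
    except TypeError as exc:
        print("unguarded call failed:", exc)
```

The guard only silences the log entry; it does not explain why the agent gave no reply in the first place.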

This eventually leads to:
2023-05-05 21:45:17,709+0200 ERROR (vm/220746d4) [virt.vm] (vmId='220746d4-56a5-40cc-8633-1285c167c4fe') Failed to update CPU set of the VM to match shared pool (cpumanagement:121)
Traceback (most recent call last):
File "/usr/lib/python3.6/site-packages/vdsm/virt/virdomain.py", line 104, in f
ret = attr(*args, **kwargs)
File "/usr/lib/python3.6/site-packages/vdsm/common/libvirtconnection.py", line 114, in wrapper
ret = f(*args, **kwargs)
File "/usr/lib/python3.6/site-packages/vdsm/common/function.py", line 78, in wrapper
return func(inst, *args, **kwargs)
File "/usr/lib64/python3.6/site-packages/libvirt.py", line 2303, in pinVcpu
raise libvirtError('virDomainPinVcpu() failed')
libvirt.libvirtError: Timed out during operation: cannot acquire state change lock (held by monitor=remoteDispatchConnectGetAllDomainStats)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/usr/lib/python3.6/site-packages/vdsm/virt/cpumanagement.py", line 108, in _assign_shared
vm.pin_vcpu(vcpu, cpuset)
File "/usr/lib/python3.6/site-packages/vdsm/virt/vm.py", line 6306, in pin_vcpu
self._dom.pinVcpu(vcpu, cpuset)
File "/usr/lib/python3.6/site-packages/vdsm/virt/virdomain.py", line 112, in f
raise toe
vdsm.virt.virdomain.TimeoutError: Timed out during operation: cannot acquire state change lock (held by monitor=remoteDispatchConnectGetAllDomainStats)
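
The two-part traceback above ("During handling of the above exception, another exception occurred") is the shape Python prints when a new exception is raised inside an `except` block. Here is a rough sketch of that pattern, assuming a wrapper that translates a backend timeout into a more specific error. All names (`BackendError`, `OperationTimeout`, `TIMEOUT_CODE`, `translate_timeouts`) are hypothetical stand-ins, not the libvirt or Vdsm API.

```python
import functools


class BackendError(Exception):
    """Hypothetical stand-in for a low-level library error."""

    def __init__(self, message, code):
        super().__init__(message)
        self.code = code


class OperationTimeout(BackendError):
    """Hypothetical stand-in for a more specific timeout error."""


TIMEOUT_CODE = 1  # placeholder value, not a real libvirt constant


def translate_timeouts(func):
    """Re-raise timeout-coded BackendError as OperationTimeout."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except BackendError as e:
            if e.code == TIMEOUT_CODE:
                # Raising inside the except block makes Python record
                # the original error as __context__, which produces
                # the chained, two-part traceback.
                raise OperationTimeout(str(e), e.code)
            raise
    return wrapper


@translate_timeouts
def pin_vcpu():
    # Simulates the backend failing to acquire a lock in time.
    raise BackendError("cannot acquire state change lock", TIMEOUT_CODE)
```

So both halves of the traceback describe the same underlying timeout. The second half is only the translated form of the first.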

This leaves the qemu process stuck on the CPU. If I forcibly kill the process, everything goes back to normal, but oVirt reports the VM's state as "unresponsive", or as "powering down" if I try to shut it down from the web UI.
The hosts use GlusterFS storage mounted via FUSE; Gluster runs on separate hosts (3 hosts, replica 3, JBOD setup with 6 NVMe disks).
All hosts (hypervisors and gluster) use CentOS 8 Stream.

Version-Release number of selected component:
4.50.3.4-1.el8.x86_64

mz-pdm (Member) commented May 15, 2023

As for the first traceback, the issue is fixed in Vdsm 4.50.5. It may be worth upgrading Vdsm to see whether that fixes the problem.

ddrazyk (Author) commented May 15, 2023

Hi @mz-pdm, I will update to Vdsm 4.50.5 during the next update window and see if the error message goes away.
As for the crashes, they seem unrelated to vdsm: after migrating all hypervisor hosts to Rocky 8, the issue has not occurred for 4 consecutive days.
