If a server dies while the GUI client is mid-way through establishing a connection, the GUI sometimes crashes. This seems to be non-deterministic and is most likely a race condition somewhere.
It's a bit hard to debug, since I haven't been able to get a proper traceback.
It appears that a `ConnectionResetError` is raised in `asyncio.windows_events.IocpProactor.recv.finish_recv`. This leads to `asyncio.Future.set_exception` being called (via `_OverlappedFuture`). However, the future's state is already `_CANCELLED`, so an `InvalidStateError` is raised and the program is terminated.
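The last step of that chain can be reproduced in isolation with plain asyncio on any platform. The following is a minimal sketch: it does not involve the IOCP proactor or quamash, and the explicit `cancel()` only stands in for whatever cancels the pending `recv` future in the real program. It also uses `asyncio.run` and `asyncio.get_running_loop`, which need Python 3.7+.

```python
import asyncio


async def main():
    loop = asyncio.get_running_loop()

    # Stand-in for the pending recv() future: something cancels it first
    # (in the real program, whatever gives up on the in-flight connection).
    fut = loop.create_future()
    fut.cancel()

    # The I/O completion then tries to report the socket error. Setting an
    # exception on a future that is no longer pending is not allowed.
    try:
        fut.set_exception(ConnectionResetError("connection reset by peer"))
        outcome = "no error"
    except asyncio.InvalidStateError:
        outcome = "InvalidStateError"

    # A defensive completion path checks the state before setting a result.
    guarded = loop.create_future()
    guarded.cancel()
    if not guarded.done():
        guarded.set_exception(ConnectionResetError("connection reset by peer"))

    return outcome, guarded.cancelled()


if __name__ == "__main__":
    print(asyncio.run(main()))  # ('InvalidStateError', True)
```

If a completion callback somewhere skips the `done()`/`cancelled()` check, a cancel that lands between "I/O finished" and "result delivered" is enough to produce exactly this error, which fits the non-deterministic behaviour.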
I did try hacking some `print(traceback.print_exc())` calls into my local copy of asyncio in various places, but that always prints `None`. I'm not totally sure why.
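For reference, `traceback.print_exc()` writes the traceback to a file (`sys.stderr` by default) and returns `None`, so wrapping it in `print()` prints that return value to stdout. It also only has something useful to show while an exception is being handled. A small sketch, with a simulated `ConnectionResetError` as an illustrative stand-in:

```python
import sys
import traceback


def demo():
    try:
        raise ConnectionResetError("simulated reset")
    except ConnectionResetError:
        # print_exc() writes the traceback to sys.stderr and returns None,
        # which is why print(traceback.print_exc()) shows "None" on stdout.
        returned = traceback.print_exc(file=sys.stderr)
        # format_exc() returns the traceback text as a string instead.
        text = traceback.format_exc()
    return returned, text


if __name__ == "__main__":
    returned, text = demo()
    print(returned)  # None
    print(text)
```

Calling `traceback.print_exc()` (without the surrounding `print`) inside the relevant `except` block, or logging `traceback.format_exc()`, should therefore give the actual traceback.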
As part of debugging this, I stripped the GUI down until the only asyncio tasks (aside from Qt) are hooking up some `sync_struct` `Subscriber`s, with `disconnect_cb` hooked up to a trivial function that waits 10 s before reconnecting (see below). I can't see any obvious way that my code could be responsible for these symptoms, so I suspect a race condition somewhere in asyncio or quamash. It probably doesn't happen on Linux...
```python
def subscriber_reconnect_cb(self, server, db):
    print("cb")
    print(traceback.print_exc())
    subscriber, fut = self.subscribers[server][db]
    try:
        fut = asyncio.ensure_future(
            subscriber_reconnect_coro(self, server, db))
    except Exception as e:
        print(e)
    self.subscribers[server][db] = subscriber, fut


async def subscriber_reconnect_coro(self, server, db):
    try:
        if self.win.exit_request.is_set():
            for display in self.laser_displays:
                display.wake_loop.set()
            return
    except Exception as e:
        print(e)

    logger.error("No connection to server '{}'".format(server))

    for _, display in self.laser_displays.items():
        if display.server == server:
            display.server = ""
            display.wake_loop.set()

    server_cfg = self.config["servers"][server]
    subscriber, fut = self.subscribers[server][db]

    try:
        await subscriber.close()
    except:
        pass

    subscriber.disconnect_cb = functools.partial(
        subscriber_reconnect_cb, self, server, db)

    while not self.win.exit_request.is_set():
        try:
            print("connecting")
            await subscriber.connect(server_cfg["host"],
                                     server_cfg["notify"])
            print("done!")
            logger.info("Reconnected to server '{}'".format(server))
            break
        except (ConnectionRefusedError, OSError, ConnectionResetError):
            pass
        except:
            logger.info("could not connect to '{}' retry in 10s..."
                        .format(server))
        await asyncio.sleep(10)
    print("cb complete!")


for server, server_cfg in self.config["servers"].items():
    self.subscribers[server] = {}

    # ask the servers to keep us updated with changes to laser settings
    # (exposures, references, etc)
    subscriber = Subscriber(
        "laser_db",
        functools.partial(init_cb, self.laser_db),
        functools.partial(self.notifier_cb, "laser_db", server),
        disconnect_cb=functools.partial(
            subscriber_reconnect_cb, self, server, "laser_db"))
    self.subscribers[server]["laser_db"] = subscriber, None
    subscriber_reconnect_cb(self, server, "laser_db")

    # ask the servers to keep us updated with the latest frequency data
    subscriber = Subscriber(
        "freq_db",
        functools.partial(init_cb, self.freq_db),
        functools.partial(self.notifier_cb, "freq_db", server),
        disconnect_cb=functools.partial(
            subscriber_reconnect_cb, self, server, "freq_db"))
    self.subscribers[server]["freq_db"] = subscriber, None
    subscriber_reconnect_cb(self, server, "freq_db")

    # ask the servers to keep us updated with the latest osa traces
    subscriber = Subscriber(
        "osa_db",
        functools.partial(init_cb, self.osa_db),
        functools.partial(self.notifier_cb, "osa_db", server),
        disconnect_cb=functools.partial(
            subscriber_reconnect_cb, self, server, "osa_db"))
    self.subscribers[server]["osa_db"] = subscriber, None
    subscriber_reconnect_cb(self, server, "osa_db")
```
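One way to rule out the application side a little further: in the snippet, `subscriber_reconnect_cb` schedules a new `subscriber_reconnect_coro` every time it is called, even if an earlier reconnect task for the same subscriber is still in flight. The sketch below shows a guard that allows at most one reconnect task per subscriber. `FlakySubscriber`, `Reconnector`, the host and the port are hypothetical stand-ins, not the `sync_struct` API; only the `connect(host, port)` shape mirrors the call used above.

```python
import asyncio


class FlakySubscriber:
    """Hypothetical stand-in for a Subscriber whose connect() fails a
    fixed number of times before succeeding."""

    def __init__(self, failures):
        self.failures = failures
        self.attempts = 0

    async def connect(self, host, port):
        self.attempts += 1
        if self.attempts <= self.failures:
            raise ConnectionRefusedError(f"{host}:{port} refused")


class Reconnector:
    """Hypothetical helper: keeps at most one reconnect task alive."""

    def __init__(self, subscriber, host, port, retry_delay=10.0):
        self.subscriber = subscriber
        self.host = host
        self.port = port
        self.retry_delay = retry_delay
        self.task = None

    def on_disconnect(self):
        # Intended to be used as the disconnect callback. If a reconnect
        # is already running, ignore the notification instead of starting
        # a second task that races the first one.
        if self.task is not None and not self.task.done():
            return
        self.task = asyncio.ensure_future(self._reconnect())

    async def _reconnect(self):
        while True:
            try:
                await self.subscriber.connect(self.host, self.port)
                return
            except OSError:  # ConnectionRefused/ResetError are subclasses
                await asyncio.sleep(self.retry_delay)


async def main():
    sub = FlakySubscriber(failures=2)
    rc = Reconnector(sub, "localhost", 3250, retry_delay=0.01)
    rc.on_disconnect()
    first = rc.task
    rc.on_disconnect()  # duplicate notification while reconnecting
    same_task = rc.task is first
    await rc.task
    return sub.attempts, same_task


if __name__ == "__main__":
    print(asyncio.run(main()))  # (3, True)
```

If the crash still occurs with only one reconnect attempt in flight per subscriber, that points more firmly at the event loop integration than at overlapping reconnects.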
To debug this further, I uninstalled the conda version of quamash and installed a new version from source. NB: the conda version packaged with ARTIQ (0.5.5) is really quite old.
After doing that, I stopped being able to reproduce this issue. Since it's probably a race, the symptoms going away doesn't necessarily mean that the issue is resolved, but I don't see anything else I can do to debug it, and I haven't got any more time to sink into this now.