xled sometimes hangs, probably when network is flaky #118
Comments
It happened again today, but there was no message. The last message was from yesterday:
And that did in fact work, the lights kept going! |
Correction - it emits that message every time it starts, so that was from my previous startup... |
Hm, |
Hi @rec When it had happened, it was impossible to get it going again from the same Python shell. The UDP connection was "dead" until I restarted Python. I suppose it is something deep inside udp_client, or rather deep inside Python's socket code. If the glitch in communication happens at the wrong place in the code, it might wait indefinitely for a lost ack, or something? (I'm just speculating.) But then there should be reports of this misbehavior from the rest of the Python community. |
Sorry for the delay, I was on an internet detox!
I'll set these settings tonight and give it another try. I do use
discovery, so I'll turn 'em all on.
It's not a problem in Python itself, because as you say we would have heard
about it! Also, UDP connections are the world's simplest thing, simpler
than TCP/IP.
My thoughts are turning toward ZMQ's Python implementation as the possible
candidate. (This might explain another phenomenon I think I discovered a
month ago, which is that you can't run two instances of this program on the
same machine even if they are talking to different lighting strings, but I
haven't tried to do that in a long time.)
I'll keep you posted, and Happy New Year!
…On Sat, Jan 1, 2022 at 4:25 PM Anders-Holst ***@***.***> wrote:
Hi @rec <https://github.com/rec>
Yes, I have experienced it too. (I guess it's mainly the two of us who use
these continuous rt effects on a regular basis.) My connection to the tree
in the garden was bad, and then it typically happened after less than an
hour. I bought an extender, and since then it has not happened.
When it had happened, it was impossible to get it going again from the
same python shell. The udp connection was "dead", until I restarted python.
I suppose it is something deep inside udp_client, or rather deep inside
the socket code of python. If the glitch in communication happens at the
wrong place in the code, it might wait indefinitely for a lost ack, or
something? (I'm just speculating.) But then there should be reports of this
misbehavior from the rest of the Python community.
|
Hi again @rec, Any new crash or hanging on your side? I have been using the MeanderingSequence effect (defined in sequence.py, and currently my favorite of the continuous effects, with its endless sequence of slowly changing gradients in varying directions) every day since New Year, and it has not crashed (since I use the extender, which apparently makes my network stable enough).
The phenomenon that you can't run two instances on the same machine is due to something else. The realtime effect uses the existing udp_client.py, and that code hogs the same port on the local machine that it will communicate with on the strings, supposedly so that it can receive replies in case of two-way communication. That is overkill for the realtime effects, because there are no replies; it's purely one-way. But because of this, when you try to use rt on a second device, the port on the local machine is already occupied.
So, as a means to debug the hanging when the network is intermittently unreliable, and at the same time get rid of the problem with the occupied port, I wrote the following super simple replacement for udp_client. It mimics just the function needed for the realtime effect, and directly calls socket to do the job, without any extras or frills. So if it still hangs within the call to socket.sendto, I would argue that the problem is most likely within the Python socket code.
To use it, load the code below. Then create your ControlInterface or HighControlInterface as you normally would: ctr = HighControlInterface(host). Then insert the simpler UDP client where the real one should have gone: ctr._udpclient = SimpleUDPClient(7777, ctr.host). And then use it just like normally.
(Just reflecting that "unreliable" is a relative concept. You say that it can work for some three days in a row before it hangs, and assuming one new frame every second, this makes a quarter million successful calls before one fails... Nevertheless, it should of course never ever hang. At least there should be a fallback, e.g. skip the failed frame and move to the next one.)
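The occupied-port explanation above can be illustrated with plain sockets. This is only a sketch of the general behavior, not xled's actual udp_client code: the helper names `can_bind_twice` and `can_send_unbound_twice` are made up for the illustration, and it assumes a default socket configuration (no SO_REUSEADDR or SO_REUSEPORT).

```python
import socket


def can_bind_twice():
    """Try to bind two UDP sockets to the same local address and port."""
    first = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        first.bind(("127.0.0.1", 0))       # let the OS pick a free port
        port = first.getsockname()[1]
        try:
            second.bind(("127.0.0.1", port))  # same port as the first socket
        except OSError:
            return False                   # "address already in use"
        return True
    finally:
        first.close()
        second.close()


def can_send_unbound_twice(port=7777):
    """Send from two sockets that never call bind(); each gets its own
    ephemeral local port, so they do not compete for one."""
    a = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    b = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sent = [s.sendto(b"\x00", ("127.0.0.1", port)) for s in (a, b)]
        return sent == [1, 1]
    finally:
        a.close()
        b.close()
```

A one-way sender that never binds a fixed local port therefore should not stop a second instance from starting, which matches the reasoning in the comment above.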
|
Hello!
I unfortunately just haven't had the time to even look at this.
Of course, your solution to "two instances" has to be the right one. Well
spotted!
I like the simple client much better, to be honest, as it makes life
simpler. I'll probably add a timeout as well...
This never falls off my radar so even if you don't hear from me, I haven't
forgotten, but also don't hesitate to remind me.
…On Mon, Jan 10, 2022 at 11:28 PM Anders-Holst ***@***.***> wrote:
Hi again @rec <https://github.com/rec>,
Any new crash or hanging on your side? I have been using the
MeanderingSequence effect (defined in sequence.py, and currently my
favorite of the continuous effects, with its endless sequence of slowly
changing gradients in varying directions) now every day since new year, and
it has not crashed (since I use the extender, which apparently makes my
network stable enough).
The phenomenon that you can't run two instances on the same machine is due
to something else. It is because the realtime effect uses the existing
udp_client.py, and that code hogs the same port on the local machine that
it will communicate with on the strings. Supposedly in case of two-way
communication so it can receive the replies. That is overkill for the
realtime effects, because there are no replies; it's purely one-way. But
because of this, when you try to use rt on a second device, the port on the
local machine is already occupied.
So, as a means to debug the hanging when the network is
intermittently unreliable, and at the same time get rid of the problem with
the occupied port, I wrote the following super simple replacement for
udp_client. It mimics just the function needed for real time effect, and
directly calls socket to do the job, without any extras or frills. So if
it still hangs within the call to socket.sendto I would argue that the
problem is anyway most likely within the python socket code.
To use it, load the code below. Then create your ControlInterface or
HighControlInterface as you normally would:
ctr = HighControlInterface(host)
Then insert the simpler udp-client where the real one should have gone:
ctr._udpclient = SimpleUDPClient(7777, ctr.host)
And then use it just like normally.
(Just reflecting that "unreliable" is a relative concept. You say that it
can work for some three days in a row before it hangs, and assuming one new
frame every second, this makes a quarter million successful calls before
one fails... Nevertheless, it should of course never ever hang. At least
there should be a fallback, eg skip the failed frame and move to the next
one.)
"""
xled.simple_udp
~~~~~~~~~~~~~~~
An even simpler UDP class. Only used for one-way send messages
"""
import socket


class SimpleUDPClient(object):
    def __init__(self, port, host):
        self.port = port
        self.host = host
        self.handle = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)

    def send(self, message):
        # sendto(data, flags, address); returns the number of bytes sent
        return self.handle.sendto(message, 0, (self.host, self.port))
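
The timeout and the "skip the failed frame" fallback mentioned in this thread could be sketched roughly as follows. This is an assumption-laden illustration, not the change that was eventually merged: the class name `TimeoutUDPClient`, the `timeout` parameter and the `dropped` counter are all hypothetical and do not exist in xled.

```python
import socket


class TimeoutUDPClient(object):
    """Hypothetical variant of SimpleUDPClient that never blocks forever."""

    def __init__(self, port, host, timeout=1.0):
        self.port = port
        self.host = host
        self.handle = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
        # A blocking call on this socket now raises socket.timeout
        # (a subclass of OSError) after `timeout` seconds.
        self.handle.settimeout(timeout)
        self.dropped = 0  # number of frames skipped because a send failed

    def send(self, message):
        try:
            return self.handle.sendto(message, (self.host, self.port))
        except OSError:
            # Covers timeouts and other socket errors: skip this frame
            # so the caller can move on to the next one.
            self.dropped += 1
            return None
```

It would be swapped in the same way as above, e.g. ctr._udpclient = TimeoutUDPClient(7777, ctr.host). Whether swallowing every OSError is the right policy is a design choice; logging the error before skipping the frame may be preferable when debugging.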
|
This was fixed in #119. |
Summary
Usually when there are network issues the driver raises an exception, but sometimes it just hangs forever.
Affected XLED components
XLED version
Tried with both the most recent pip version, and HEAD here.
[I skipped the device information because I can't get it right now, but I will if necessary.]
Operating system
Darwin bantam.local 19.6.0 Darwin Kernel Version 19.6.0: Thu Sep 16 20:58:47 PDT 2021; root:xnu-6153.141.40.1~1/RELEASE_X86_64 x86_64
Linux raspberrypi 5.10.63-v7+ #1459 SMP Wed Oct 6 16:41:10 BST 2021 armv7l GNU/Linux
Python version
Python 3.6.6, also tried 3.8.x on Mac
On RP, 3.9.1
Steps to reproduce
I've been running this driver 24/7 for over a month, on two different machines.
I thought my network was solid, but it seems there are occasional short outages which I never notice (e.g. when watching Netflix). (In my experience, this is typical of home systems.)
After a day or two, one of three different behaviors seems to happen:
GOOD:
GOOD:
BAD: nothing - it just hangs and the animation stops working, and there is no output.
Additional information
I reran both programs with --verbose and I'll let you know what happens. This should also give me a stack trace when I break out of the hanging program, which was previously suppressed by the calling program. There will probably be more information coming, but I wanted to get all this down to start with, to see if this is familiar to you!
Thanks again for an excellent program.
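
One way to capture where a program is stuck without having to interrupt it by hand is the standard-library faulthandler module. The following is only a sketch under assumptions: the helper name `hang_watchdog` is made up here, it is not part of xled, and the timeout to use would depend on how long a frame send may reasonably take.

```python
import contextlib
import faulthandler
import sys


@contextlib.contextmanager
def hang_watchdog(seconds, file=None):
    """Dump the tracebacks of all threads to `file` if the wrapped block
    is still running after `seconds`; do nothing if it finishes in time."""
    target = file if file is not None else sys.stderr
    faulthandler.dump_traceback_later(seconds, file=target)
    try:
        yield
    finally:
        # The block finished (or raised), so disarm the pending dump.
        faulthandler.cancel_dump_traceback_later()


# Usage idea (hypothetical call site):
#     with hang_watchdog(30):
#         send_next_frame()
```

The dump does not stop the program; it only records the stack at the moment the timeout expires, which may be enough to see whether the hang is inside the socket call.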