xled sometimes hangs, probably when network is flaky #118

Open
3 tasks
rec opened this issue Dec 30, 2021 · 8 comments
@rec
Contributor

rec commented Dec 30, 2021

Summary

Usually when there are network issues the driver raises an exception, but sometimes it just hangs forever.

Affected XLED components

  • Command Line Interface (CLI)
  • [x] Library
  • Documentation
  • Other

XLED version

Tried with both the most recent pip version, and HEAD here.

[I skipped the device information because I can't get it right now, but I will if necessary.]

Operating system

Darwin bantam.local 19.6.0 Darwin Kernel Version 19.6.0: Thu Sep 16 20:58:47 PDT 2021; root:xnu-6153.141.40.1~1/RELEASE_X86_64 x86_64

Linux raspberrypi 5.10.63-v7+ #1459 SMP Wed Oct 6 16:41:10 BST 2021 armv7l GNU/Linux

Python version

Python 3.6.6, also tried 3.8.x on Mac
On RP, 3.9.1

Steps to reproduce

I've been running this driver 24/7 for over a month, on two different machines.

I thought my network was solid, but it seems there are occasional short outages which I never notice (e.g. when watching Netflix). (In my experience, this is typical of home systems.)

After a day or two, one of three different behaviors seems to happen:

GOOD:

  File "/code/xled/xled/control.py", line 1314, in show_rt_frame
    self.set_rt_frame_socket(frame, 3)
  File "/code/xled/xled/control.py", line 893, in set_rt_frame_socket
    self.udpclient.send(packet)
  File "/code/xled/xled/udp_client.py", line 95, in send
    return self.handle.sendto(message, 0, (self.destination_host, self.port))
OSError: [Errno 101] Network is unreachable

GOOD:

  File "/Users/tom/synthetic/code/xled/xled/control.py", line 1314, in show_rt_frame
    self.set_rt_frame_socket(frame, 3)
  File "/Users/tom/synthetic/code/xled/xled/control.py", line 893, in set_rt_frame_socket
    self.udpclient.send(packet)
  File "/Users/tom/synthetic/code/xled/xled/udp_client.py", line 95, in send
    return self.handle.sendto(message, 0, (self.destination_host, self.port))
OSError: [Errno 65] No route to host

BAD: nothing - it just hangs and the animation stops working, and there is no output.

Additional information

I reran both programs with --verbose and I'll let you know what happens. This should also give me a stack trace when I break out of the hanging program, which was before suppressed by the calling program.
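One way to get a stack trace out of a hung process without relying on the calling program is Python's standard `faulthandler` module. The following is only a sketch: the helper name `install_hang_dump` is hypothetical, it is not part of xled, and `faulthandler.register` is only available on Unix-like systems (which covers the Mac and the Raspberry Pi mentioned above).

```python
import faulthandler
import signal
import sys


def install_hang_dump(stream=sys.stderr, signum=signal.SIGUSR1):
    """Dump the Python stack of every thread to `stream` when `signum` arrives.

    Hypothetical helper, not part of xled. `stream` must be a real file
    (it needs a file descriptor) and must stay open while registered.
    """
    faulthandler.register(signum, file=stream, all_threads=True)
```

Calling `install_hang_dump()` once at startup and later running `kill -USR1 <pid>` against the hung process should print where each thread is stuck, and the process keeps running afterwards.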

There will probably be more information coming, but I wanted to get all this down to start with to see if this were familiar to you!

Thanks again for an excellent program.

@rec rec added the bug label Dec 30, 2021
@rec
Contributor Author

rec commented Dec 31, 2021

It happened again today, but there was no message.

The last message was from yesterday:

30-12-2021:15:25:21,595 DEBUG   [connectionpool.py:272] Resetting dropped connection: 192.168.178.73
30-12-2021:15:25:21,623 DEBUG   [connectionpool.py:452] http://192.168.178.73:80 "POST /xled/v1/led/mode HTTP/1.1" 200 13

And that did in fact work, the lights kept going!

@rec
Contributor Author

rec commented Dec 31, 2021

Correction - it emits that message every time it starts, so that was from my previous startup...

@scrool
Owner

scrool commented Jan 1, 2022

> I reran both programs with --verbose and I'll let you know what happens. This should also give me a stack trace when I break out of the hanging program, which was before suppressed by the calling program.

Hm, the xled CLI doesn't have a --verbose option, so I guess that's something on your end? xled can control the verbosity of its components separately: --verbosity-cli, --verbosity-discover, --verbosity-control, --verbosity-auth. Please check how those are set and make sure you turn on debug for those components in your code. In this case the equivalent of --verbosity-control is probably the most interesting one, unless you also run discover, in which case the equivalent of --verbosity-discover might help as well.
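When xled is used as a library rather than through the CLI, the equivalent of those options would be raising the level of the corresponding loggers. A minimal sketch, assuming the components log through the standard `logging` module under names that follow the module names (`xled.control`, `xled.discover`, `xled.auth`); those logger names and the helper `enable_xled_debug` are assumptions for illustration, so check them against the xled source:

```python
import logging


def enable_xled_debug(logger_names=("xled.control", "xled.discover", "xled.auth")):
    """Turn on DEBUG output for the named loggers (names are assumed, see above)."""
    # Install a timestamped handler on the root logger if none is configured yet.
    logging.basicConfig(
        format="%(asctime)s %(levelname)-8s [%(name)s] %(message)s",
        level=logging.WARNING,
    )
    for name in logger_names:
        logging.getLogger(name).setLevel(logging.DEBUG)
```

Calling `enable_xled_debug()` before creating the control interface should then show debug messages from those components only, while everything else stays at WARNING.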

@Anders-Holst
Contributor

Hi @rec
Yes, I have experienced it too. (I guess it's mainly the two of us who use these continuous rt effects on a regular basis.) My connection to the tree in the garden was bad, and then it typically happened after less than an hour. I bought an extender, and since then it has not happened.

When it had happened, it was impossible to get it going again from the same python shell. The udp connection was "dead", until I restarted python.

I suppose it is something deep inside udp_client, or rather deep inside Python's socket code. If the glitch in communication happens at the wrong place in the code, it might wait indefinitely for a lost ack, or something? (I'm just speculating.) But then there should be reports of this misbehavior from the rest of the Python community.

@rec
Contributor Author

rec commented Jan 4, 2022 via email

@Anders-Holst
Contributor

Hi again @rec,

Any new crash or hanging on your side? I have been using the MeanderingSequence effect (defined in sequence.py, and currently my favorite of the continuous effects, with its endless sequence of slowly changing gradients in varying directions) now every day since new year, and it has not crashed (since I use the extender, which apparently makes my network stable enough).

The phenomenon that you can't run two instances on the same machine is due to something else. The realtime effect uses the existing udp_client.py, and that code hogs the same port on the local machine that it will communicate with on the strings, supposedly so that it can receive replies in case of two-way communication. That is overkill for the realtime effects, because there are no replies; it's purely one-way. But because of this, when you try to use rt on a second device, the port on the local machine is already occupied.
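The general mechanism can be reproduced with plain sockets. This is only an illustration of what happens when two UDP sockets bind the same local port without any reuse options; it is not the code in udp_client.py, and the function name `second_bind_fails` is made up for the example:

```python
import errno
import socket


def second_bind_fails():
    """Return True if binding a second UDP socket to an occupied port is refused."""
    first = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # Port 0 lets the OS pick a free port, which the first socket now occupies.
        first.bind(("127.0.0.1", 0))
        port = first.getsockname()[1]
        try:
            second.bind(("127.0.0.1", port))
        except OSError as exc:
            return exc.errno == errno.EADDRINUSE
        return False
    finally:
        first.close()
        second.close()
```

A socket that only sends never needs to call `bind()` at all, which is why a send-only client does not run into this.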

So, as a means to debug the hanging when the network is intermittently unreliable, and at the same time get rid of the problem with the occupied port, I wrote the following super simple replacement for udp_client. It mimics just the functionality needed for the realtime effects, and directly calls socket to do the job, without any extras or frills. So if it still hangs within the call to socket.sendto, I would argue that the problem is most likely within the Python socket code.

To use it, load the code below. Then create your ControlInterface or HighControlInterface as you normally would:
ctr = HighControlInterface(host)
Then insert the simpler UDP client where the real one would have gone:
ctr._udpclient = SimpleUDPClient(7777, ctr.host)
Then use it as you normally would.

(Just reflecting that "unreliable" is a relative concept. You say that it can work for some three days in a row before it hangs, and assuming one new frame every second, this makes a quarter million successful calls before one fails... Nevertheless, it should of course never ever hang. At least there should be a fallback, e.g. skip the failed frame and move on to the next one.)

"""
xled.simple_udp
~~~~~~~~~~~~~~~
An even simpler UDP class. Only used for one-way send messages
"""

import socket

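class SimpleUDPClient(object):
    def __init__(self, port, host):
        self.port = port
        self.host = host
        # Plain UDP socket. It is never explicitly bound, so the OS picks an
        # ephemeral local port and no fixed local port is reserved.
        self.handle = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)

    def send(self, message):
        # One-way send; returns the number of bytes handed to the OS.
        return self.handle.sendto(message, 0, (self.host, self.port))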
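The fallback mentioned above (skip the failed frame and move on to the next one) could be sketched as a small variation of the same class. This is an untested idea, not a confirmed fix: the class name `SkippingUDPClient`, the `timeout` parameter and the `dropped` counter are all made up for illustration, and the timeout only helps if the hang really is inside a blocking `sendto`, which nobody has established yet.

```python
import socket


class SkippingUDPClient(object):
    """Hypothetical send-only client that drops a frame instead of raising or blocking."""

    def __init__(self, port, host, timeout=1.0):
        self.port = port
        self.host = host
        self.dropped = 0  # number of frames skipped so far
        self.handle = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
        # Bound how long a blocking sendto may wait before giving up.
        self.handle.settimeout(timeout)

    def send(self, message):
        try:
            return self.handle.sendto(message, 0, (self.host, self.port))
        except OSError:
            # Covers timeouts as well as "Network is unreachable" and
            # "No route to host": count the frame as lost and carry on.
            self.dropped += 1
            return 0
```

It would be plugged in the same way, `ctr._udpclient = SkippingUDPClient(7777, ctr.host)`, and `dropped` gives a rough idea of how often the network glitches.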

@rec
Contributor Author

rec commented Jan 11, 2022 via email

@scrool
Owner

scrool commented Mar 3, 2022

> This might explain another phenomenon I think I discovered a month ago, which is that you can't run two instances of this program on the same machine even if they are talking to different lighting strings, but I haven't tried to do that in a long time.

This was fixed in #119.

3 participants