recv_into fix for esp32spi socket implementation #168

aaftonbladet · 2023-05-16T15:41:02Z

Resolves #148 and #165

The esp32spi socket implementations of recv and recv_into do not raise an OSError if the read times out, deviating from the behavior in CPython. In the past we accounted for this by checking whether anything was read when calling recv, if not, raise an OSError ourselves. However, in c334c81 we started to utilize recv_into rather than recv, if it was available. The new version does not have the "raise OSError ourselves if nothing was read"-workaround, because it was assumed that esp32spi wouldn't have recv_into and the old version would be used.

Ideally, this would be fixed in esp32spi for both functions. However, I think they'd be hesitant to except such a radical change - it'd likely break projects relying on the old implementation. Hence I believe we should take it upon ourselves to fix it.

Maybe it's also appropriate to entirely drop support for socket implementations that lack recv_into. Our alternative implementation utilizing recv is labled "esp32 fix", but esp32spi sockets have had support for recv_into for more than a year. I did not include this change in my PR, but I think it should be considered.

tekktrik · 2023-05-26T13:33:18Z

After rerunning the CI, it looks like pre-commit wants to reformat the code. You can fix this issue by running pre-commit locally and pushing the changes. You can find information on installing and using pre-commit here: https://learn.adafruit.com/creating-and-sharing-a-circuitpython-library/check-your-code

brentru · 2023-05-30T14:56:42Z

@vladak As the author of c334c81, are you able to take a look?

brentru · 2023-05-30T14:59:54Z

adafruit_minimqtt/adafruit_minimqtt.py

+                # If no bytes are waiting, raise an OSError for good measure.
+                # Some implementations of recv_into do this on their own (like CPython),
+                # but we should be prepared if one doesn't (looking at you esp23spi...)
+                raise OSError(errno.ETIMEDOUT)


@aaftonbladet Could you clarify why this is being raised here instead of within the ESP32SPI implementation on L1141? We already raise this exception on L1147.

Also - if you're intending to replace recv with recv_into, why is this segment of code modified instead of L1143?

The "esp32spi implementation" is just a fallback if the socket object doesn't have recv_into, as old versions of esp32spi didn't have recv_into. Check is on L344. Since recv_into now exists in esp32spi, the regular implementation is used instead of the backwards compatible one. The regular one doesn't check that data was actually read - it relies on the callee raising an OSError. Compare that to the backwards compatible one which was made with esp32 in mind, it raises an OSError, because esp32spi doesn't.

My patch makes the behavior consistent in both implementations, we check ourselves that data was read, if not, throw an OSError as the callee should've.

if you're intending to replace recv with recv_into

recv is left as a fallback if the socket object doesn't have recv_into. I suggested dropping the recv/"esp impl" entirely as esp now has recv_into, but I didn't want to pull the trigger on that.

Ok, now I understand. Thank you for clarifying.

recv is left as a fallback if the socket object doesn't have recv_into. I suggested dropping the recv/"esp impl" entirely as esp now has recv_into, but I didn't want to pull the trigger on that.

Would you be interested in adding this to the PR?

Thinking of this a bit more, I am not convinced this is the way to go. Firstly, this adds yet another exception to the surface which the library consumers will have to deal with, at least in this code path. If I am not mistaken, the library does not catch OSError in the callers of _sock_exact_recv(). Secondly, this seems to trump the library receive timeout. If the socket times out, the library should probably wait up to recv_timeout; this ordering (recv_timeout > socket_timeout) is actually enforced in __init()__.

Lastly, I don't immediately see how exactly this fixes the the stuck loop() in #148.

Thank's @vladak for the input. Will wait for @aaftonbladet 's response before moving forward.

Firstly, this adds yet another exception to the surface which the library consumers will have to deal with, at least in this code path. If I am not mistaken, the library does not catch OSError in the callers of _socket_exact_recv().

All callers of _socket_exact_recv() should expect it to throw. As for whether it's an OSError, from what I can see, we filter exceptions only in one place, and that's around L857 in _wait_for_msg(). Depending on whether there's a custom exception class for timeouts in the socket module, we catch that one as opposed to the generic OSError.

With that, a bug could arise if the socket module has its own timeout class. We'd try to catch that one instead of the generic OSError. In comes my patch, where it's possible that an OSError is still thrown, which then wouldn't be caught. Maybe we can check that there's no custom timeout class before throwing the error? If there's a custom timeout class, it's likely the underlying socket will handle timeouts itself rendering my patch redundant anyhow.

And yes, it's possible library consumers would be exposed to a new type of exception. But who's this an issue for? If you call _socket_exact_recv right now and you're not catching exceptions, you'll crash. If you catch only OSError, my patch is using the same exception. If you catch the socket module's timeout exception, you're on a platform where we, after implementing the check I proposed in the previous paragraph, aren't introducing any new behavior.

Secondly, this seems to trump the library receive timeout

The socket implementation is told to use the given receive timeout; search for settimeout in the code. If there's an issue with the receive timeout not being properly set, it already exists and my code won't affect them. Though I must say it is strange. My esp32spi socket, despite the MQTT receive timeout being zero, still times out after one second. Maybe there's bug at play.

Lastly, I don't immediately see how exactly this fixes the the stuck loop() in #148.

Right, sorry for not clarifying. The issue lies in us going past L966 because no exception is thrown, and then eventually into the loop at L971. Then we're stuck in that loop for 60 seconds, which is consistent with user reports. To compensate for recv_into() not throwing on esp32spi, we throw the exception ourselves which keeps us from getting stuck in the loop.

When traversing the code submitted to various issues, there is often the following pattern:

try: io.loop() except (MMQTTException) as e: print("MQTTException: \n", e) time.sleep(300) continue

so the OSError will cause trouble for these. Personally, I don't see a reason for the OSError exception to shoot through as it is implementation detail of the MQTT library.

Looking at the loop() implementation, it will cause a problem there too. The while cycle seems to be designed around the receive timeout, yet it is strange how the function deals with timeouts in general. It overrides the socket timeout, which I don't think it should do (it also avoids the socket timeout < receive timeout check). It cycles up to receive timeout and it seems to me that the intention is that the timeout argument is meant as the overall wait time, no matter what socket timeout is set to.

If socket timeout in CPython raises OSError, this will propagate via _wait_for_msg() to loop(), breaking this behavior.

I think that the timeout argument for loop() should override the receive timeout and the function should avoid mingling with the socket timeout.

Someone noticed that too: #68 (comment)

If we go with this change, I think the _wait_for_msg() needs to catch the OSError, to make both if branches uniform (return None on socket timeout):

Adafruit_CircuitPython_MiniMQTT/adafruit_minimqtt/adafruit_minimqtt.py

Lines 1027 to 1040 in 40b9096

# CPython socket module contains a timeout attribute

if hasattr(self._socket_pool, "timeout"):

try:

res = self._sock_exact_recv(1)

except self._socket_pool.timeout:

return None

else: # socketpool, esp32spi

try:

res = self._sock_exact_recv(1)

except OSError as error:

if error.errno in (errno.ETIMEDOUT, errno.EAGAIN):

# raised by a socket timeout if 0 bytes were present

return None

raise MMQTTException from error

vladak · 2023-05-31T19:08:57Z

However, in c334c81 we started to utilize recv_into rather than recv, if it was available.

Before that change, recv_into() was already used (in that if branch); my change merely made sure that the return value of recv_into() was used in order to read the full contents (as specified by the bufsize argument). Also, the if condition itself was not changed, merely the contained block of code.

vladak · 2023-05-31T19:14:35Z

adafruit_minimqtt/adafruit_minimqtt.py

@@ -1119,6 +1119,12 @@ def _sock_exact_recv(self, bufsize: int) -> bytearray:
            rc = bytearray(bufsize)
            mv = memoryview(rc)
            recv_len = self._sock.recv_into(rc, bufsize)
+            if recv_len == 0:


Is there some other case than timeout that could lead to the 0 being returned from recv_into(), regardless of the underlying implementation ? (ESP32, CPython, ...)

Because if we are not sure about that, it might be prudent to augment the recv_len == 0 check to verify this is actually esp32spi.

That depends on the underlying implementation, so I wouldn't know. We could add check that receive_timeout seconds have passed before raising the exception, but that would require us to correctly define the receive timeout. As I mentioned earlier, even though the timeout passed to socket.settimeout is zero, my esp32spi socket waits a second before timing out.

vladak · 2023-05-31T19:16:17Z

adafruit_minimqtt/adafruit_minimqtt.py

@@ -1119,6 +1119,12 @@ def _sock_exact_recv(self, bufsize: int) -> bytearray:
            rc = bytearray(bufsize)
            mv = memoryview(rc)
            recv_len = self._sock.recv_into(rc, bufsize)


There is another call to recv_into() in the while cycle below (L1134). Does it need same treatment ? (if we can agree on the treatment being correct)

I would say no, it doesn't need it. The first check is just to see if there's any data pending in the socket. When we reach the while loop, we already know we have things to read.

aaftonbladet · 2023-06-08T11:29:27Z

Before that change, recv_into() was already used [...]

I might've gotten the exact commit mixed up, it could've been d60ab23. I tested my way back and forth until I found which version it broke in.

vladak · 2023-06-14T19:37:49Z

I might've gotten the exact commit mixed up, it could've been d60ab23. I tested my way back and forth until I found which version it broke in.

I think you'd need to dig even further to history. If I remember correctly, the duplicate definition removal did not have any effect, it merely removed dead code.

alison-gravley · 2023-07-24T05:40:53Z

Is this fixed yet? I am trying to use a Matrix Portal M4 and I can't really update the display if the loop() is blocking all the time, especially if I am not getting messages very often. I messed around with asyncio a little, but it looks like it hangs the whole thing either way.

FoamyGuy · 2023-07-24T12:55:07Z

@alison-gravley I believe if you are using the absolutely latest version of this library, and the absolute latest version of the ESP32SPI library in conjunction with each other the loop() function is no longer blocking indefinitely under those new versions.

My understanding is that the changes from this PR won't necessarily be needed, because we instead made changes inside of ESP32SPI to bring it's behavior more inline with CPython sockets. So this could pototentially be closed at this point unless I've misunderstood some part of it.

I cannot speak to asyncio support, I did a fair amount of testing when preparing to merge / release the latest changes across many different devices and use-cases, but asyncio is something I didn't touch at all during testing, I'm not sure how it interacts with loop() or any other functionality.

alison-gravley · 2023-07-24T14:06:23Z

@alison-gravley I believe if you are using the absolutely latest version of this library, and the absolute latest version of the ESP32SPI library in conjunction with each other the loop() function is no longer blocking indefinitely under those new versions.

My understanding is that the changes from this PR won't necessarily be needed, because we instead made changes inside of ESP32SPI to bring it's behavior more inline with CPython sockets. So this could pototentially be closed at this point unless I've misunderstood some part of it.

I cannot speak to asyncio support, I did a fair amount of testing when preparing to merge / release the latest changes across many different devices and use-cases, but asyncio is something I didn't touch at all during testing, I'm not sure how it interacts with loop() or any other functionality.

I should be on latest release of CircuitPython, 8.2.0. Do I need to change to a bleeding edge? I am not sure where I find the version of esp32spi.

From lib folder:
adafruit_minmqtt v7.4.0

boot_out.txt says:
Adafruit CircuitPython 8.2.0 on 2023-07-05; Adafruit Matrix Portal M4 with samd51j19
Board ID:matrixportal_m4

info_uf2.txt:
UF2 Bootloader v3.15.0 SFHWRO
Model: Matrix Portal M4
Board-ID: SAMD51J19A-MatrixPortal-v0

FoamyGuy · 2023-07-24T14:56:22Z

@alison-gravley Ahh, I forgot the detail that esp32spi is frozen into the build. I think there has not been an update in the core to bump to the newest version so the frozen in one is still on the prior release. Which does make it a little bit trickier, but still do-able.

For now the easiest way to get it working is get the latest release of esp32spi from the assets downloads on this release page: https://github.com/adafruit/Adafruit_CircuitPython_ESP32SPI/releases/tag/6.0.0 get the 8.x mpy zip, download and extract it. Inside find the lib folder, and inside of that should be the adafruit_esp32spi/ directory. Copy that directory to the root of your CIRCUITPY drive (unlike normally when you'd put it inside of lib this needs to be specifically in the root directory to "override" the frozen version from the core build).

So your drive should look something like this:

CIRCUITPY/
    adafruit_esp32spi/
    lib/
        adafruit_minimqtt/
    boot_out.txt
    code.py

Putting the latest esp32spi in the root of CIRCUITPY will make it so that this version gets used instead of the one frozen in. At some point the core one will get updated to the latest and then this process will no longer be necessary, but I think for now it is.

alison-gravley · 2023-07-24T16:29:59Z

@FoamyGuy, thank you so much! I finally have scrolling text and can move on to fun animations.

I actually tried all of this in Arduino first, but ran into a ton of issues between adafruit libraries and changes with esp32 breaking things. I had to do some hacky things to even get the project to compile and run, but I couldn't quite get it to subscribe to the broker without getting an error.

Off-topic here, but I keep hitting a lot of bad luck on when I dig these esp32-s2/s3 boards out. Every time I think it is fun to play with such and such library I run in to a problem and find a recent GitHub post explaining the issue. With both Circuit Python and Arduino. I am just thankful all of the developers have been great and I appreciate all of you for helping your end users who have a mish-mash of capabilities and experience levels.

toddkrein · 2023-08-11T23:04:34Z

Well, three days down the rabbit hole after being bitten by this on my PyPortal. At least it was educational. Pulling the latest esp32spi and putting it at the top level of CircuitPython has fixed the problem. What's the next step to encourage pulling the updated esp32spi into the CircuitPython release itself?

update: 8.2.3 still doesn't have the updated esp32spi library

dwadler · 2023-10-24T22:04:41Z

This was painful.
After moving my esp32-s2 MQTT app to the PYPortal, I spent hours pulling my hair trying to figure out why loop() was hanging, trying the published AdaFruit examples which hung similarly.
I finally found this issue and thanks to the suggested by "Foamguy" above, got it to work. (I was about to start poking around in the mini_mqtt source).
Isn't this going to be broken for anyone trying to run the samples on PYPortal with the current code levels?

toddkrein · 2023-10-24T22:07:40Z

This was painful. After moving my esp32-s2 MQTT app to the PYPortal, I spent hours pulling my hair trying to figure out why loop() was hanging, trying the published AdaFruit examples which hung similarly. I finally found this issue and thanks to the suggested by "Foamguy" above, got it to work. (I was about to start poking around in the mini_mqtt source). Isn't this going to be broken for anyone trying to run the samples on PYPortal with the current code levels?

Wow, you solved in hours... Better than my several days.
Yes, it will be an issues for everyone using this level.

justmobilize · 2024-05-20T22:13:32Z

@aaftonbladet and @dhalbert - I might recommend closing this, and if the esp32spi should raise on a condition like this, a PR should be added there so we don't have special case code in this library

FoamyGuy · 2024-08-25T15:38:06Z

Closing this for now. I believe it should not be an issue any more. I think the frozen in versions were updated a while ago to support the new way of working.

If anyone else runs into this on newer versions we can re-open and dig further again.

if no bytes waiting, raise OSError

3fc4d6d

aaftonbladet changed the title ~~recv_into fix for esp32spi socket implemenetation~~ recv_into fix for esp32spi socket implementation May 23, 2023

User added 2 commits May 29, 2023 13:28

pre-commit formatting

2ec6665

second attempt at pre-commit formatting

45bb577

tekktrik requested a review from brentru May 29, 2023 14:51

brentru requested changes May 30, 2023

View reviewed changes

vladak reviewed May 31, 2023

View reviewed changes

vladak mentioned this pull request Jun 14, 2023

loop() timeout parameter should be absolute #169

Merged

FoamyGuy closed this Aug 25, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

recv_into fix for esp32spi socket implementation #168

recv_into fix for esp32spi socket implementation #168

aaftonbladet commented May 16, 2023

tekktrik commented May 26, 2023

brentru commented May 30, 2023

brentru May 30, 2023

aaftonbladet May 30, 2023

brentru May 30, 2023

vladak Jun 1, 2023

brentru Jun 1, 2023

aaftonbladet Jun 8, 2023

vladak Jun 14, 2023

vladak commented May 31, 2023

vladak May 31, 2023

vladak May 31, 2023

aaftonbladet Jun 8, 2023

vladak May 31, 2023

aaftonbladet Jun 8, 2023 •

edited

Loading

aaftonbladet commented Jun 8, 2023

vladak commented Jun 14, 2023 •

edited

Loading

alison-gravley commented Jul 24, 2023

FoamyGuy commented Jul 24, 2023

alison-gravley commented Jul 24, 2023

FoamyGuy commented Jul 24, 2023

alison-gravley commented Jul 24, 2023

toddkrein commented Aug 11, 2023 •

edited

Loading

dwadler commented Oct 24, 2023 •

edited

Loading

toddkrein commented Oct 24, 2023

justmobilize commented May 20, 2024

FoamyGuy commented Aug 25, 2024

	# CPython socket module contains a timeout attribute
	if hasattr(self._socket_pool, "timeout"):
	try:
	res = self._sock_exact_recv(1)
	except self._socket_pool.timeout:
	return None
	else: # socketpool, esp32spi
	try:
	res = self._sock_exact_recv(1)
	except OSError as error:
	if error.errno in (errno.ETIMEDOUT, errno.EAGAIN):
	# raised by a socket timeout if 0 bytes were present
	return None
	raise MMQTTException from error

recv_into fix for esp32spi socket implementation #168

recv_into fix for esp32spi socket implementation #168

Conversation

aaftonbladet commented May 16, 2023

tekktrik commented May 26, 2023

brentru commented May 30, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

vladak commented May 31, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

aaftonbladet Jun 8, 2023 • edited Loading

Choose a reason for hiding this comment

aaftonbladet commented Jun 8, 2023

vladak commented Jun 14, 2023 • edited Loading

alison-gravley commented Jul 24, 2023

FoamyGuy commented Jul 24, 2023

alison-gravley commented Jul 24, 2023

FoamyGuy commented Jul 24, 2023

alison-gravley commented Jul 24, 2023

toddkrein commented Aug 11, 2023 • edited Loading

dwadler commented Oct 24, 2023 • edited Loading

toddkrein commented Oct 24, 2023

justmobilize commented May 20, 2024

FoamyGuy commented Aug 25, 2024

aaftonbladet Jun 8, 2023 •

edited

Loading

vladak commented Jun 14, 2023 •

edited

Loading

toddkrein commented Aug 11, 2023 •

edited

Loading

dwadler commented Oct 24, 2023 •

edited

Loading