Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

recv_into fix for esp32spi socket implementation #168

Closed
Closed
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 6 additions & 0 deletions adafruit_minimqtt/adafruit_minimqtt.py
Original file line number Diff line number Diff line change
Expand Up @@ -1119,6 +1119,12 @@ def _sock_exact_recv(self, bufsize: int) -> bytearray:
rc = bytearray(bufsize)
mv = memoryview(rc)
recv_len = self._sock.recv_into(rc, bufsize)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is another call to recv_into() in the while cycle below (L1134). Does it need same treatment ? (if we can agree on the treatment being correct)

Copy link
Author

@aaftonbladet aaftonbladet Jun 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say no, it doesn't need it. The first check is just to see if there's any data pending in the socket. When we reach the while loop, we already know we have things to read.

if recv_len == 0:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there some other case than timeout that could lead to the 0 being returned from recv_into(), regardless of the underlying implementation ? (ESP32, CPython, ...)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because if we are not sure about that, it might be prudent to augment the recv_len == 0 check to verify this is actually esp32spi.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That depends on the underlying implementation, so I wouldn't know. We could add check that receive_timeout seconds have passed before raising the exception, but that would require us to correctly define the receive timeout. As I mentioned earlier, even though the timeout passed to socket.settimeout is zero, my esp32spi socket waits a second before timing out.

self.logger.debug("_sock_exact_recv timeout")
# If no bytes are waiting, raise an OSError for good measure.
# Some implementations of recv_into do this on their own (like CPython),
# but we should be prepared if one doesn't (looking at you esp23spi...)
raise OSError(errno.ETIMEDOUT)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aaftonbladet Could you clarify why this is being raised here instead of within the ESP32SPI implementation on L1141? We already raise this exception on L1147.

Also - if you're intending to replace recv with recv_into, why is this segment of code modified instead of L1143?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "esp32spi implementation" is just a fallback if the socket object doesn't have recv_into, as old versions of esp32spi didn't have recv_into. Check is on L344. Since recv_into now exists in esp32spi, the regular implementation is used instead of the backwards compatible one. The regular one doesn't check that data was actually read - it relies on the callee raising an OSError. Compare that to the backwards compatible one which was made with esp32 in mind, it raises an OSError, because esp32spi doesn't.

My patch makes the behavior consistent in both implementations, we check ourselves that data was read, if not, throw an OSError as the callee should've.

if you're intending to replace recv with recv_into

recv is left as a fallback if the socket object doesn't have recv_into. I suggested dropping the recv/"esp impl" entirely as esp now has recv_into, but I didn't want to pull the trigger on that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, now I understand. Thank you for clarifying.

recv is left as a fallback if the socket object doesn't have recv_into. I suggested dropping the recv/"esp impl" entirely as esp now has recv_into, but I didn't want to pull the trigger on that.

Would you be interested in adding this to the PR?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking of this a bit more, I am not convinced this is the way to go. Firstly, this adds yet another exception to the surface which the library consumers will have to deal with, at least in this code path. If I am not mistaken, the library does not catch OSError in the callers of _sock_exact_recv(). Secondly, this seems to trump the library receive timeout. If the socket times out, the library should probably wait up to recv_timeout; this ordering (recv_timeout > socket_timeout) is actually enforced in __init()__.

Lastly, I don't immediately see how exactly this fixes the the stuck loop() in #148.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank's @vladak for the input. Will wait for @aaftonbladet 's response before moving forward.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Firstly, this adds yet another exception to the surface which the library consumers will have to deal with, at least in this code path. If I am not mistaken, the library does not catch OSError in the callers of _socket_exact_recv().

All callers of _socket_exact_recv() should expect it to throw. As for whether it's an OSError, from what I can see, we filter exceptions only in one place, and that's around L857 in _wait_for_msg(). Depending on whether there's a custom exception class for timeouts in the socket module, we catch that one as opposed to the generic OSError.

With that, a bug could arise if the socket module has its own timeout class. We'd try to catch that one instead of the generic OSError. In comes my patch, where it's possible that an OSError is still thrown, which then wouldn't be caught. Maybe we can check that there's no custom timeout class before throwing the error? If there's a custom timeout class, it's likely the underlying socket will handle timeouts itself rendering my patch redundant anyhow.

And yes, it's possible library consumers would be exposed to a new type of exception. But who's this an issue for? If you call _socket_exact_recv right now and you're not catching exceptions, you'll crash. If you catch only OSError, my patch is using the same exception. If you catch the socket module's timeout exception, you're on a platform where we, after implementing the check I proposed in the previous paragraph, aren't introducing any new behavior.

Secondly, this seems to trump the library receive timeout

The socket implementation is told to use the given receive timeout; search for settimeout in the code. If there's an issue with the receive timeout not being properly set, it already exists and my code won't affect them. Though I must say it is strange. My esp32spi socket, despite the MQTT receive timeout being zero, still times out after one second. Maybe there's bug at play.

Lastly, I don't immediately see how exactly this fixes the the stuck loop() in #148.

Right, sorry for not clarifying. The issue lies in us going past L966 because no exception is thrown, and then eventually into the loop at L971. Then we're stuck in that loop for 60 seconds, which is consistent with user reports. To compensate for recv_into() not throwing on esp32spi, we throw the exception ourselves which keeps us from getting stuck in the loop.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When traversing the code submitted to various issues, there is often the following pattern:

    try:
        io.loop()
    except (MMQTTException) as e:
        print("MQTTException: \n", e)
        time.sleep(300)
        continue

so the OSError will cause trouble for these. Personally, I don't see a reason for the OSError exception to shoot through as it is implementation detail of the MQTT library.

Looking at the loop() implementation, it will cause a problem there too. The while cycle seems to be designed around the receive timeout, yet it is strange how the function deals with timeouts in general. It overrides the socket timeout, which I don't think it should do (it also avoids the socket timeout < receive timeout check). It cycles up to receive timeout and it seems to me that the intention is that the timeout argument is meant as the overall wait time, no matter what socket timeout is set to.

If socket timeout in CPython raises OSError, this will propagate via _wait_for_msg() to loop(), breaking this behavior.

I think that the timeout argument for loop() should override the receive timeout and the function should avoid mingling with the socket timeout.

Someone noticed that too: #68 (comment)

If we go with this change, I think the _wait_for_msg() needs to catch the OSError, to make both if branches uniform (return None on socket timeout):

# CPython socket module contains a timeout attribute
if hasattr(self._socket_pool, "timeout"):
try:
res = self._sock_exact_recv(1)
except self._socket_pool.timeout:
return None
else: # socketpool, esp32spi
try:
res = self._sock_exact_recv(1)
except OSError as error:
if error.errno in (errno.ETIMEDOUT, errno.EAGAIN):
# raised by a socket timeout if 0 bytes were present
return None
raise MMQTTException from error

to_read = bufsize - recv_len
if to_read < 0:
raise MMQTTException(f"negative number of bytes to read: {to_read}")
Expand Down