-
Notifications
You must be signed in to change notification settings - Fork 50
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
recv_into fix for esp32spi socket implementation #168
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
@@ -1119,6 +1119,12 @@ def _sock_exact_recv(self, bufsize: int) -> bytearray: | |||||||||||||||||||||||||||||
rc = bytearray(bufsize) | ||||||||||||||||||||||||||||||
mv = memoryview(rc) | ||||||||||||||||||||||||||||||
recv_len = self._sock.recv_into(rc, bufsize) | ||||||||||||||||||||||||||||||
if recv_len == 0: | ||||||||||||||||||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is there some other case than timeout that could lead to the 0 being returned from There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Because if we are not sure about that, it might be prudent to augment the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. That depends on the underlying implementation, so I wouldn't know. We could add check that receive_timeout seconds have passed before raising the exception, but that would require us to correctly define the receive timeout. As I mentioned earlier, even though the timeout passed to socket.settimeout is zero, my esp32spi socket waits a second before timing out. |
||||||||||||||||||||||||||||||
self.logger.debug("_sock_exact_recv timeout") | ||||||||||||||||||||||||||||||
# If no bytes are waiting, raise an OSError for good measure. | ||||||||||||||||||||||||||||||
# Some implementations of recv_into do this on their own (like CPython), | ||||||||||||||||||||||||||||||
# but we should be prepared if one doesn't (looking at you esp23spi...) | ||||||||||||||||||||||||||||||
raise OSError(errno.ETIMEDOUT) | ||||||||||||||||||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @aaftonbladet Could you clarify why this is being raised here instead of within the ESP32SPI implementation on L1141? We already raise this exception on L1147. Also - if you're intending to replace recv with recv_into, why is this segment of code modified instead of L1143? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The "esp32spi implementation" is just a fallback if the socket object doesn't have recv_into, as old versions of esp32spi didn't have recv_into. Check is on L344. Since recv_into now exists in esp32spi, the regular implementation is used instead of the backwards compatible one. The regular one doesn't check that data was actually read - it relies on the callee raising an OSError. Compare that to the backwards compatible one which was made with esp32 in mind, it raises an OSError, because esp32spi doesn't. My patch makes the behavior consistent in both implementations, we check ourselves that data was read, if not, throw an OSError as the callee should've.
recv is left as a fallback if the socket object doesn't have recv_into. I suggested dropping the recv/"esp impl" entirely as esp now has recv_into, but I didn't want to pull the trigger on that. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ok, now I understand. Thank you for clarifying.
Would you be interested in adding this to the PR? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thinking of this a bit more, I am not convinced this is the way to go. Firstly, this adds yet another exception to the surface which the library consumers will have to deal with, at least in this code path. If I am not mistaken, the library does not catch OSError in the callers of Lastly, I don't immediately see how exactly this fixes the the stuck loop() in #148. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thank's @vladak for the input. Will wait for @aaftonbladet 's response before moving forward. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
All callers of With that, a bug could arise if the socket module has its own timeout class. We'd try to catch that one instead of the generic OSError. In comes my patch, where it's possible that an OSError is still thrown, which then wouldn't be caught. Maybe we can check that there's no custom timeout class before throwing the error? If there's a custom timeout class, it's likely the underlying socket will handle timeouts itself rendering my patch redundant anyhow. And yes, it's possible library consumers would be exposed to a new type of exception. But who's this an issue for? If you call
The socket implementation is told to use the given receive timeout; search for
Right, sorry for not clarifying. The issue lies in us going past L966 because no exception is thrown, and then eventually into the loop at L971. Then we're stuck in that loop for 60 seconds, which is consistent with user reports. To compensate for There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. When traversing the code submitted to various issues, there is often the following pattern: try:
io.loop()
except (MMQTTException) as e:
print("MQTTException: \n", e)
time.sleep(300)
continue so the OSError will cause trouble for these. Personally, I don't see a reason for the OSError exception to shoot through as it is implementation detail of the MQTT library. Looking at the loop() implementation, it will cause a problem there too. The while cycle seems to be designed around the receive timeout, yet it is strange how the function deals with timeouts in general. It overrides the socket timeout, which I don't think it should do (it also avoids the socket timeout < receive timeout check). It cycles up to receive timeout and it seems to me that the intention is that the timeout argument is meant as the overall wait time, no matter what socket timeout is set to. If socket timeout in CPython raises OSError, this will propagate via _wait_for_msg() to loop(), breaking this behavior. I think that the timeout argument for loop() should override the receive timeout and the function should avoid mingling with the socket timeout. Someone noticed that too: #68 (comment) If we go with this change, I think the _wait_for_msg() needs to catch the OSError, to make both if branches uniform (return None on socket timeout): Adafruit_CircuitPython_MiniMQTT/adafruit_minimqtt/adafruit_minimqtt.py Lines 1027 to 1040 in 40b9096
|
||||||||||||||||||||||||||||||
to_read = bufsize - recv_len | ||||||||||||||||||||||||||||||
if to_read < 0: | ||||||||||||||||||||||||||||||
raise MMQTTException(f"negative number of bytes to read: {to_read}") | ||||||||||||||||||||||||||||||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is another call to
recv_into()
in thewhile
cycle below (L1134). Does it need same treatment ? (if we can agree on the treatment being correct)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would say no, it doesn't need it. The first check is just to see if there's any data pending in the socket. When we reach the while loop, we already know we have things to read.