-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
USB ZLP exponential backoff kills performance of large responses #83
Comments
Experience shows that some devices tend to send zero-length USB packets even when they shouldn't, and this is a normal occurrence with these devices. The log level for these events has been reduced from Error to Debug in order to minimize clutter in the log files.
Hi Timo, glad to hear from you after some years of silence! :-)
Done.
I think that exponential backoff with initial time of 10ms and which adds 25% of time at every roll will give nearly the same performance as linear algorithm, while putting significantly less load to the CPU. What do you think? Let's try?
I think your hypothesis that device sends ZLP when device currently has no data to send and doesn't want to wait for more data is the very good hypothesis. However, it seems to happen in two cases:
Some devices send ZLP in the middle of transaction, then continue to send more data. So I think, interpreting ZLP as the end of transaction indication is not safe. We will loose synchronization with devices that insert ZLP in the middle. However, we can implement it as a quirk for those few devices that really need it. What do you think? P.S. The current implementation sometimes a way too fragile. Once synchronization is lost, it cannot be recovered, and device fails forever (until reboot/reconnect). I'm thinking about attempt to implement some synchronization recovery logic, which will include the following:
Unfortunately, I have no idea how to reliably distinguish between failed HTTP transaction and HTTP transaction that just takes too long time to proceed. For example, reading image from scanner may include minutes of wait (low-end scanners acting at high resolutions). If I could to detect failed transaction, I could implement some reset-and-recover logic... |
Some devices sometimes send 0-length USB packets, and then can do it continuously. So, to avoid excessive CPU consumption, ipp-usb inserts some pauses when zero-length USB packet is received, and use exponential backoff algorithm to manage wait interval. Parameters of this algorithm was tuned to avoid performance limitation when working with the high-end devices that may send A LOT OF data in response at the very high speed. See discussion under #83 for further details.
Thanks. I'll test this on Friday.
Yes, this is unfortunate.
Correct. Just getting one ZLP alone is not enough. Sadly, because of different vendors having different interpretations of this low-level bulk request detail (possibly due to not understanding what ZLPs are for).
Though, at least the Mopria Alliance eSCL specification implies that the server should return I think we can do two things:
After the above timeouts we can assume everything is drained and release the channel for future use. Might be a good idea to keep draining idle channels. |
This seems to work just fine. Thank you! The actual main issue seems to have been the initial delay of 100ms, and changing it to 10ms was the primary fix. The data stream will occationally send ZLP, but will be almost immediately able to give additional data. Perhaps some internal device queue was drained causing the ZLP, but the data is ready so its just matter of some process getting scheduled and more data is available. I observed only few occasions when there was two or perhaps three consecutive ZLPs also - in these cases the updated backoff also helps a lot. I think the performance issue can be considered fixed now. The other issue of resynchronization is still left, but there are also other issues about it. This issue can be closed unless you want to keep it open for the sync issue as well? |
Some enterprise-level HP devices, during the initialization phase (which can last several minutes), may respond with an HTTP 503 status or similar, which is expected. However, the response body may be truncated (typically, the terminating '\n' is lost). In such cases, `ipp-usb` will wait indefinitely for a response to maintain synchronization with the device. At the same time, these devices send a zero-length UDP packet at the end of the truncated output. If the zlp-recv-hack quirk is enabled, when ipp-usb receives a zero-length packet from the USB followed by a receive timeout, it interprets this combination of events as a valid termination of the response body. It works only at the initialization time and doesn't affect futher operations. See discussion under #64 for details. Related issues are #83 and #32.
If this quirk is enabled, device initialization is retried in case only part of the device's functions have been initialized (say, only the printer or only the scanner), instead of continuing to operate with incomplete functionality. It can be useful if the device takes a long time to fully initialize. During this period, some components may respond normally while others are still initializing. For example, the device may quickly report its scanning capabilities shortly after startup, while its printing functionality may take several minutes to become operational. Some enterprise-level HP printers are known to have this problem. See #64, #83 and #32 for details.
It seems that the USB Zero Length Packet (ZLP) is not well understood as its logged as error and uses exponential back-off delay. Though, the fact that these packets are seen often means that the Printer/Scanner device developers have not really understood it well either.
The relevant code is at:
https://github.com/OpenPrinting/ipp-usb/blob/master/usbtransport.go#L797-L828
On the wire level USB ZLP is sent when the USB Bulk Transaction is sent when:
The latter case is handled internally in libusb. So the only case when a libusb user sees ZLP is the first one: the device terminated the transaction without sending data.
Seems that at least HP keeps sending ZLP:s when the data is not yet available. However, the exponential algorithm will kill performance on large responses. This specifically causes scanning to not work at full speed with HP Color LaserJet Entrprise Flow MFP X677 printers (see also alexpevzner/sane-airscan#347).
Even fixed 100ms delay was not enough. So I currently just changed the ZLP to do a 10ms fixed delay to get full scanning speed.
Recommended changes:
This likely also improves things in connection with #32.
Technically, the correct interpretation of seeing a ZLP should be interpreted as "socket closed" and terminate waiting the HTTP response. This interpretation would allow fix the problem in #64 of
GET /ipp/faxout
response not containing the end-of-body newline. However, due to the said bug of manufacturer sending ZLPs this does not work immediately so.May I recommend that if a HTTP Error code response is received, and multiple ZLPs are seen when waiting for the body, then these ZLPs will be considered as end-of-body and will abort the response waiting.
The text was updated successfully, but these errors were encountered: