Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

USB ZLP exponential backoff kills performance of large responses #83

Open
fabled opened this issue Sep 10, 2024 · 3 comments
Open

USB ZLP exponential backoff kills performance of large responses #83

fabled opened this issue Sep 10, 2024 · 3 comments

Comments

@fabled
Copy link

fabled commented Sep 10, 2024

It seems that the USB Zero Length Packet (ZLP) is not well understood as its logged as error and uses exponential back-off delay. Though, the fact that these packets are seen often means that the Printer/Scanner device developers have not really understood it well either.

The relevant code is at:
https://github.com/OpenPrinting/ipp-usb/blob/master/usbtransport.go#L797-L828

On the wire level USB ZLP is sent when the USB Bulk Transaction is sent when:

  • the device wants to indicate that there is no data to send (and it does not want to wait for it either; though arguably it should wait up to the USB timeout specified in the USB)
  • some data has been sent, and the amount of bytes is sent is divisible by the USB block length. In this case, a ZLP packet is sent to terminate the transaction (otherwise, the last USB packet which is not full will implicitly terminate the transaction)

The latter case is handled internally in libusb. So the only case when a libusb user sees ZLP is the first one: the device terminated the transaction without sending data.

Seems that at least HP keeps sending ZLP:s when the data is not yet available. However, the exponential algorithm will kill performance on large responses. This specifically causes scanning to not work at full speed with HP Color LaserJet Entrprise Flow MFP X677 printers (see also alexpevzner/sane-airscan#347).

Even fixed 100ms delay was not enough. So I currently just changed the ZLP to do a 10ms fixed delay to get full scanning speed.

Recommended changes:

  • downgrade the ZLP from Error to Debug
  • change the exponential backoff to either fixed 10ms (or configurable) delay, or to some non-exponential algorithm like linear or logarithmic increase
  • keep the maximum delay as something smaller than 1second. perhaps in the maginuted of 100-200ms; issuing an usb bulk transfer request is not cpu intensive

This likely also improves things in connection with #32.

Technically, the correct interpretation of seeing a ZLP should be interpreted as "socket closed" and terminate waiting the HTTP response. This interpretation would allow fix the problem in #64 of GET /ipp/faxout response not containing the end-of-body newline. However, due to the said bug of manufacturer sending ZLPs this does not work immediately so.

May I recommend that if a HTTP Error code response is received, and multiple ZLPs are seen when waiting for the body, then these ZLPs will be considered as end-of-body and will abort the response waiting.

alexpevzner added a commit that referenced this issue Sep 10, 2024
Experience shows that some devices tend to send zero-length USB packets
even when they shouldn't, and this is a normal occurrence with these
devices.

The log level for these events has been reduced from Error to Debug in
order to minimize clutter in the log files.
@alexpevzner
Copy link
Member

Hi Timo,

glad to hear from you after some years of silence! :-)

downgrade the ZLP from Error to Debug

Done.

change the exponential backoff to either fixed 10ms

I think that exponential backoff with initial time of 10ms and which adds 25% of time at every roll will give nearly the same performance as linear algorithm, while putting significantly less load to the CPU. What do you think? Let's try?

then these ZLPs will be considered as end-of-body and will abort the response waiting.

I think your hypothesis that device sends ZLP when device currently has no data to send and doesn't want to wait for more data is the very good hypothesis.

However, it seems to happen in two cases:

  • all scan data is received and there is no more data expected -- i.e., the end of transaction
  • scan unit is just too slow to provide more data in time, though transaction is not ended yet

Some devices send ZLP in the middle of transaction, then continue to send more data.

So I think, interpreting ZLP as the end of transaction indication is not safe. We will loose synchronization with devices that insert ZLP in the middle.

However, we can implement it as a quirk for those few devices that really need it.

What do you think?

P.S. The current implementation sometimes a way too fragile. Once synchronization is lost, it cannot be recovered, and device fails forever (until reboot/reconnect).

I'm thinking about attempt to implement some synchronization recovery logic, which will include the following:

  • at the initialization, drain USB connection for some time (500-1000ms): just read from USB and drop received data, if any
  • drain idle USB connections all the time until they are reused
  • something more?

Unfortunately, I have no idea how to reliably distinguish between failed HTTP transaction and HTTP transaction that just takes too long time to proceed. For example, reading image from scanner may include minutes of wait (low-end scanners acting at high resolutions). If I could to detect failed transaction, I could implement some reset-and-recover logic...

alexpevzner added a commit that referenced this issue Sep 11, 2024
Some devices sometimes send 0-length USB packets, and then can do it
continuously. So, to avoid excessive CPU consumption, ipp-usb inserts
some pauses when zero-length USB packet is received, and use exponential
backoff algorithm to manage wait interval.

Parameters of this algorithm was tuned to avoid performance limitation
when working with the high-end devices that may send A LOT OF data in
response at the very high speed.

See discussion under #83 for further details.
@fabled
Copy link
Author

fabled commented Sep 11, 2024

I think that exponential backoff with initial time of 10ms and which adds 25% of time at every roll will give nearly the same performance as linear algorithm, while putting significantly less load to the CPU. What do you think? Let's try?

Thanks. I'll test this on Friday.

However, it seems to happen in two cases:

* all scan data is received and there is no more data expected -- i.e., the end of transaction
* scan unit is just too slow to provide more data in time, though transaction is not ended yet

Yes, this is unfortunate.

So I think, interpreting ZLP as the end of transaction indication is not safe. We will loose synchronization with devices that insert ZLP in the middle.

Correct. Just getting one ZLP alone is not enough. Sadly, because of different vendors having different interpretations of this low-level bulk request detail (possibly due to not understanding what ZLPs are for).

Unfortunately, I have no idea how to reliably distinguish between failed HTTP transaction and HTTP transaction that just takes too long time to proceed. For example, reading image from scanner may include minutes of wait (low-end scanners acting at high resolutions). If I could to detect failed transaction, I could implement some reset-and-recover logic...

Though, at least the Mopria Alliance eSCL specification implies that the server should return 503 if it cannot attend to the request in timely manner. And at least that is what the HP printers do. If the HTTP level request exceeds a timeout, it'll return 503 and the HTTP request needs to be redone. I hope there are no devices that wait long periods there.

I think we can do two things:

  • if HTTP status != 200 to indicate error has been seen, we can assume the body is non-existent or small. In this case just a timeout of few seconds to wait for end-of-body should be enough
  • if HTTP status == 200 has been seen, we could just set a bit longer timeout (e.g. minute or more); in these cases if something fails the device usually reboots or is disconnected

After the above timeouts we can assume everything is drained and release the channel for future use. Might be a good idea to keep draining idle channels.

@fabled
Copy link
Author

fabled commented Sep 14, 2024

I think that exponential backoff with initial time of 10ms and which adds 25% of time at every roll will give nearly the same performance as linear algorithm, while putting significantly less load to the CPU. What do you think? Let's try?

This seems to work just fine. Thank you!

The actual main issue seems to have been the initial delay of 100ms, and changing it to 10ms was the primary fix. The data stream will occationally send ZLP, but will be almost immediately able to give additional data. Perhaps some internal device queue was drained causing the ZLP, but the data is ready so its just matter of some process getting scheduled and more data is available. I observed only few occasions when there was two or perhaps three consecutive ZLPs also - in these cases the updated backoff also helps a lot.

I think the performance issue can be considered fixed now. The other issue of resynchronization is still left, but there are also other issues about it. This issue can be closed unless you want to keep it open for the sync issue as well?

alexpevzner added a commit that referenced this issue Dec 8, 2024
Some enterprise-level HP devices, during the initialization phase
(which can last several minutes), may respond with an HTTP 503
status or similar, which is expected. However, the response body may
be truncated (typically, the terminating '\n' is lost). In such
cases, `ipp-usb` will wait indefinitely for a response to maintain
synchronization with the device.

At the same time, these devices send a zero-length UDP packet at the
end of the truncated output. If the zlp-recv-hack quirk is enabled,
when ipp-usb receives a zero-length packet from the USB followed by
a receive timeout, it interprets this combination of events as a
valid termination of the response body. It works only at the
initialization time and doesn't affect futher operations.

See discussion under #64 for details.

Related issues are #83 and #32.
alexpevzner added a commit that referenced this issue Dec 10, 2024
If this quirk is enabled, device initialization is retried in case only
part of the device's functions have been initialized (say, only the
printer or only the scanner), instead of continuing to operate with
incomplete functionality.

It can be useful if the device takes a long time to fully initialize.
During this period, some components may respond normally while others
are still initializing. For example, the device may quickly report its
scanning capabilities shortly after startup, while its printing
functionality may take several minutes to become operational.

Some enterprise-level HP printers are known to have this problem.

See #64, #83 and #32 for details.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants