HTTP body content is truncated in some cases despite setting max-size to a high value #394
Comments
To rule out anything silly, I re-tested with curl over HTTP/1.1 and TLS 1.2 - it's not the culprit.
A couple of ideas...
I read through the http module and didn't see anything obvious, though I'm far from an authority. The length calculations seem to be consistent, using the standard idiom (roughly):
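Something along these lines - a sketch of the general pattern, not the actual zgrab2 code; the helper name and the cap parameter are mine:

```go
package sketch

import (
	"io"
	"net/http"
)

// readCappedBody shows the usual idiom: wrap the response body in a
// LimitedReader so no more than maxBytes are ever read, regardless of how
// much the server sends. maxBytes stands in for the configured --max-size.
func readCappedBody(resp *http.Response, maxBytes int64) ([]byte, error) {
	defer resp.Body.Close()
	return io.ReadAll(&io.LimitedReader{R: resp.Body, N: maxBytes})
}
```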
EDIT: Thanks to your (very large) output, I think I might see the issue. The length enforcement calculation depends upon content-length, but the response is using chunked encoding:
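In stock net/http terms (and I'm assuming the zgrab2 fork mirrors these fields), a chunked response looks like this to the caller, so any cap keyed on Content-Length never fires:

```go
package sketch

import (
	"fmt"
	"net/http"
)

// With chunked transfer encoding, the length is reported as unknown:
// ContentLength is -1 and TransferEncoding contains "chunked". A size check
// of the form `if resp.ContentLength > max { ... }` is skipped entirely.
func describeLength(resp *http.Response) {
	fmt.Println("ContentLength:", resp.ContentLength)       // -1 for chunked
	fmt.Println("TransferEncoding:", resp.TransferEncoding) // e.g. ["chunked"]
}
```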
While you could make changes in the zgrab2 code to accommodate this, it would probably be easier to instruct the server not to use chunked encoding. I don't recall off the top of my head whether there is a header the client can send to inhibit that behavior from the server - maybe not? If this is in fact the issue, I'm a little surprised that nobody has noticed it before. As a test to determine whether this really is the cause, you could avoid chunked encoding entirely by specifying HTTP/1.0 in your request, since HTTP/1.0 doesn't support chunked encoding.
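One way to run that test without touching zgrab2 at all is a raw HTTP/1.0 request over a socket. A sketch (plain TCP only; the host and path are placeholders, and a TLS target would need crypto/tls wrapped around the conn):

```go
package main

import (
	"fmt"
	"io"
	"net"
)

// Sends a bare HTTP/1.0 request over plain TCP. A compliant server must not
// answer with chunked transfer encoding, since HTTP/1.0 does not define it,
// so comparing this body against the zgrab2 output is a quick sanity check.
func main() {
	conn, err := net.Dial("tcp", "example.com:80") // placeholder target
	if err != nil {
		panic(err)
	}
	defer conn.Close()

	fmt.Fprint(conn, "GET / HTTP/1.0\r\nHost: example.com\r\nConnection: close\r\n\r\n")
	raw, _ := io.ReadAll(conn)
	fmt.Printf("received %d bytes (headers + body)\n", len(raw))
}
```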
@mzpqnxow, thanks for your response. I think you are onto something. I tried using --with-body-size, but it didn't change anything. I attempted to send the Accept-Encoding: identity header to the server (--custom-headers-names=Accept-Encoding --custom-headers-values=identity), but that actually cut the body off even shorter. I think this might be caused by this portion (lines 477 to 499 in 97ba87c):
With HTTP/1.1, the connection is assumed to be persistent, and t.Body = NoBody is set for the second pass.
I'm not sure what you mean by "second pass" (I'm on mobile and couldn't follow the code very easily), but that phrase makes me think of a somewhat common chunked-encoding pattern, which is to send multiple chunks. Often this is because the application/reverse-proxy/server is pulling content from multiple sources, especially when one may block or have an unknown size. A common example: the first chunk may be a static header from a memory cache, where the size is known and no blocking will occur, while the second chunk may be content coming from a backend API over the network, streamed rather than buffered.

When you use curl -vvvv (or the relevant trace params, or tcpdump if it's not TLS), do you see multiple chunks in the response? I'm wondering if only the first chunk is being processed. Maybe you already considered this and that's what you meant by "second pass"? Perhaps you're already familiar with chunked encoding, but from what I recall, two chunks of sizes 4 and 8 would look like this on the wire:
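(Reconstructing this from memory with placeholder data; the sizes are hex, every line ends with an explicit CRLF, and the zero-length chunk terminates the body.)

```
4\r\n
AAAA\r\n
8\r\n
BBBBBBBB\r\n
0\r\n
\r\n
```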
With tcpdump or curl trace output, you should be able to see each chunk length and each chunk. If there are in fact multiple chunks, you can check whether the zgrab2 output ends where the second chunk in curl/tcpdump begins. Either way, it sounds like the "fix" (if multiple chunks is actually the issue - possibly a big assumption on my part) would be:
Some caveats...
For 4, if there was any concern about this changing (breaking) things for some users who encounter servers that require sending a chunked response (e.g. if there are some that refuse to, or are unable to, buffer all of the response data), it could be an opt-in behavior. It could also be "just-in-time", applied only when needed, using a flag similar to how retry-https and fail-https-to-http work. At a high level, those flags abandon the first response if a "problem" is detected and then re-request with the workaround version of the request.

For this case, I think the re-request would be sent only when the response is chunked and the first chunk is < the max response body length. The re-request would explicitly request HTTP/1.0, to prevent a chunked response.

Assuming a modification is made to reliably receive all chunks, the implementation could contain a short-circuit to ensure additional chunks are only received when necessary:
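Something like this, as a rough sketch - readUpTo and maxBodySize are names I'm making up here, not actual zgrab2 identifiers:

```go
package sketch

import "io"

// readUpTo keeps reading from the (already de-chunked) body stream, but stops
// as soon as the configured cap is reached, so later chunks are only pulled
// off the wire when they are actually needed to fill the cap.
func readUpTo(body io.Reader, maxBodySize int64) ([]byte, error) {
	out := make([]byte, 0, 4096)
	buf := make([]byte, 4096)
	var total int64
	for total < maxBodySize {
		n, err := body.Read(buf)
		if n > 0 {
			if total+int64(n) > maxBodySize {
				n = int(maxBodySize - total) // truncate at the cap
			}
			out = append(out, buf[:n]...)
			total += int64(n)
		}
		if err != nil {
			if err == io.EOF {
				err = nil // body shorter than the cap: not an error
			}
			return out, err
		}
	}
	return out, nil
}
```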
Apologies for any incoherence - I'm playing the "sorry, on mobile" card one last time.
So I spent 20 minutes today looking at what would be needed to force the golang HTTP package to send HTTP/1.0 (and potentially other values). That's the quickest blunt-instrument solution for the mishandling of chunked-encoding responses, since HTTP/1.0 signals to the server not to use chunked encoding in the response. What I found was not cool. It also might be neat to be able to use invalid values (e.g. 1.3, or "" !). I went to the http package and noticed this, where Protocol.Major and Protocol.Minor are set to 1 and 1 respectively (lines 818 to 825 in 97ba87c):
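For context, stock net/http has the same shape: the Request carries Proto/ProtoMajor/ProtoMinor fields, but for outgoing client requests they are ignored and the write path hardcodes the version - which is exactly the hardcoding in question. A sketch, assuming the zgrab2 fork mirrors net/http here:

```go
package main

import "net/http"

func main() {
	req, err := http.NewRequest("GET", "http://example.com/", nil)
	if err != nil {
		panic(err)
	}
	// The request struct carries these fields, but the write path builds the
	// request line with hardcoded major/minor values, so setting them here
	// changes nothing on the wire.
	req.Proto = "HTTP/1.0"
	req.ProtoMajor = 1
	req.ProtoMinor = 0
}
```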
As a test, I made a quick change to use ... So I looked more closely and noticed this (lines 576 to 579 in 97ba87c):
The ... I didn't look much beyond this, as I barely had the time to clean up a few of my branches, but at first glance it seems that making changes here would be a pretty invasive change to the HTTP code, which would break other (zgrab2 internal) uses of ... That's not even considering the > 150 uses of ...

Another option would be to make a duplicate of that function with a suffix, and instruct ... And/or, perhaps it would be best to have the code properly honor ...

Either way, I don't have time to go further into it right now, but I would like to return to it at some point. Aside from possibly addressing any issues with chunked encoding, there may be some servers out there that have interesting behavior when HTTP is not 1.1 (though I'm sure it would only be a tiny fraction, if that is the case).

So basically, this update is a no-op, or some basic info if anyone wants to go ahead and spend some time making the protocol, major and minor customizable via a flag; a rough sketch of what that plumbing might look like is below. Sorry if this is incoherent - written a bit hastily.
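Purely as a sketch of the flag plumbing - the flag names, struct, and helper are hypothetical, styled after zgrab2's existing go-flags struct tags, and none of them exist today:

```go
package sketch

import "net/http"

// ProtoFlags is a hypothetical flag group making the protocol string and
// major/minor version customizable.
type ProtoFlags struct {
	RequestProtocol string `long:"request-protocol" default:"HTTP/1.1" description:"Protocol string to place in the request line"`
	ProtoMajor      int    `long:"proto-major" default:"1" description:"Major protocol version to send"`
	ProtoMinor      int    `long:"proto-minor" default:"1" description:"Minor protocol version to send"`
}

// applyProto copies the flag values onto the outgoing request; the
// request-writing code would also need to honor these fields instead of
// hardcoding "HTTP/1.1".
func applyProto(req *http.Request, f *ProtoFlags) {
	req.Proto = f.RequestProtocol
	req.ProtoMajor = f.ProtoMajor
	req.ProtoMinor = f.ProtoMinor
}
```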
On latest main.
An example site to reproduce:
Attached full output as zgrab_test.txt.
Adjusting max-size to different values made no difference.
The same via cURL
About 340K of content did not make it into the zgrab body, despite having max-size of 20MB.