Custom Response Parsing #127

ChristianGruen · 2018-10-22T16:02:06Z

Here is a summary of the discussion on custom response parsing (#108, #125, and others):

In version 1 of the HTTP Client Module, the override-media-type attribute was available (inspired by the override-content-type option in XProc). It could be used to overwrite the Content-Type header of a response.

In practice, the approach turned out to be fairly flexible, but it had some shortcomings: it did not allow for fine-grained processing of multipart bodies, and it was not intuitive enough for all users.

The following alternatives have been proposed in the scope of version 2 of the spec:

`parse-response` (boolean)

Original draft: https://github.com/expath/expath-cg/blob/1da836628bbdf831fcfc1e4ad9dc487d05e7c663/specs/http-client-2/index.html
Description: Parsing of the response body can be disabled via the parse-response option. All bodies of single and multipart responses will be returned as binary items of type xs:base64Binary, and the values can be processed (stored, parsed, forwarded) in a second step.

Adam pointed out that the name may be misleading, so it’s named parse-response-entity-body in the current draft. My suggestion in #108 was to call it parse-bodies: Only responses are “parsed” (requests are serialized), and the plural form indicates that we may have multiple bodies in a response.

`parse-response` (enum)

Adam made a suggestion for extending the proposal in #108:

- raw: We don't have an equivalent option at the moment, but the idea is that the raw response from the server is returned, i.e. no parsing occurs, no status, no headers. This has applications for debugging and also for logging responses.
- status: This would be equivalent to status-only: true().
- headers: This would be equivalent to parse-response-entity-body : false().
- multipart-raw: This would extract the headers of the response and locate the multipart bodies, but it would present each part in a raw manner, i.e. no multipart headers would be parsed.
- full: This would be the default, and basically the same as the current parse-response-entity-body : true().

`parse-response` (map)

In #125, the proposal was extended to a nested map (for further discussion, see #125 (comment)).

I decided to summarize the proposals as I believe that a plain and simple solution might lead to less confusion and may even be more flexible, because a user can always do post-processing in XQuery.

In my opinion, the major requirement for (non-implicit) response parsing is to be able to retrieve bodies (single part, multiple bodies) in their original representation. In #125, I proposed the following solution:

`parse` / `parse-bodies` (string)

| Option | Description |
| --- | --- |
| `auto` | implicit parsing (default) |
| `string` | return all bodies as strings |
| `binary` | return all bodies as binaries |
| `skip` | ignore response body |

I believe this approach would be sufficient to cover most challenges people will be confronted with (but, honestly, not all that we could envision):

In most cases, people will use the default (auto).
If the requested result cannot be converted to the implicit target format, or if a format other than the one produced by the implicit conversion is required, the string option can be used for textual results. All bodies will be converted to strings, based on the encoding that is (optionally) returned by the server via the original Content-Type header and the charset option.
The binary option is helpful…
- if the result is not text,
- if the string conversion fails,
- if some bodies of a multipart response are textual and some are binary, or
- if the result only needs to be processed as a simple stream.
The skip option is used if only the headers of a response are required.
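
To illustrate the post-processing idea behind the binary option, here is a minimal sketch of a second step that tries the string conversion and keeps the original binary item if decoding fails. It assumes a processor that supports the EXPath Binary Module (bin:decode-string, bin:encode-string). The helper local:decode-or-keep is hypothetical, and the $bodies sequence is only a stand-in for what a response requested with the binary option might contain; none of this is part of the draft.

```xquery
xquery version "3.1";

(: EXPath Binary Module; assumed to be supported by the processor :)
declare namespace bin = "http://expath.org/ns/binary";

(: Hypothetical helper: decode each binary body with the given encoding,
   and return the untouched binary item if the conversion fails. :)
declare function local:decode-or-keep(
  $bodies   as xs:base64Binary*,
  $encoding as xs:string
) as item()* {
  for $body in $bodies
  return
    try {
      bin:decode-string($body, $encoding)
    } catch * {
      $body
    }
};

(: Stand-in for the bodies of a response retrieved as binaries (assumption) :)
let $bodies := (
  bin:encode-string('{ "id": 123 }', 'UTF-8'),
  bin:encode-string('plain text', 'UTF-8')
)
return local:decode-or-keep($bodies, 'UTF-8')
```

With such a helper, mixed multipart responses could be handled entirely in XQuery: textual parts come back as strings, everything else stays binary.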

Some more thoughts on this simplified approach are listed in #125.

Examples for using the approach:

(: return single JSON response as XML :)
http:get('http://json.db/doc123', map { 'parse': 'string' })?body
=> fn:json-to-xml()

(: store returned multipart bodies :)
for $part at $pos in http:get('http://multipart.db/data123', map { 'parse': 'binary' })?body
return file:write-binary($pos || '.bin', $part?body)

(: ignore response bodies :)
http:get('http://json.db/doc123', map { 'parse': 'skip' })

@adamretter: Maybe my thoughts are too plain and simple? Do you have some more use cases in mind that we should consider? Looking forward to feedback!

ChristianGruen added http-client-2 discussion labels Oct 22, 2018

ChristianGruen mentioned this issue Nov 2, 2018

Make control over response parsing more fine-grained #125

Open
