Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Request body serialization #81

Open
adamretter opened this issue Aug 10, 2018 · 12 comments
Open

Request body serialization #81

adamretter opened this issue Aug 10, 2018 · 12 comments

Comments

@adamretter
Copy link
Member

I think we need better controls over Request body serialization. Some concerns I have:

xs:base64Binary, xs:hexBinary
Value will be sent as binary data, the media type will be application/octet-stream.

What happens when you actually want to send base64 encoded or hex encoded data?

xs:string
The string will be serialized as UTF-8 octets, the media type will be text/plain.

What happens when you need an encoding which is not UTF-8? For example the HTTP 1.1 default of ISO-8559-1

node()
The node will be serialized as XML, the media type will be application/xml.

What happens when you want to send HTML5?

map(*)
The map will be serialized as JSON string, the media type will be application/json.

What happens when you try and serialize a map which has something in it which cannot be serialized to JSON like a function() or sequence of more than one item?

Of course we already have a lovely spec that clearly defines such things - XSLT and XQuery Serialization 3.1.

I think the above serialization rules are actually fine as the defaults, but that we need to be able to offer greater control over serialization to the user. I would propose that we add an additional arity set of function signatures which take a $serialization-parameters argument which follows the example of fn:serialize and uses the rules of the XSLT and XQuery Serialization 3.1.

I am not quite sure what to do with regards sending base64 binary data, perhaps it is as simple as $binary cast as xs:string? Perhaps we might want to introduce a binary serialization method and any additional parameters that we need for that, so that we can control base64 and hex binary serialization.

@ChristianGruen
Copy link
Member

ChristianGruen commented Aug 10, 2018

What happens when you actually want to send base64 encoded or hex encoded data?

In the current spec, it would look like this (see the bottom of this comment for a version that includes the embedded serialization):

let $data := file:read-binary(...) (: xs:base64Binary :)
return (
  (: Send Binary :)
  http:post('url', $data)
  (: Send Base64 :)
  http:post('url', fn:serialize($data), map { 'headers': { map: 'Content-Type': 'application/octet-stream' } }
)

What happens when you need an encoding which is not UTF-8? For example the HTTP 1.1 default of ISO-8559-1

When generating a Content-Type header string for textual data (strings, XML, JSON), we should definitely append UTF-8 as encoding (charset=UTF-8; it’s not mentioned in the spec so far).

What happens when you try and serialize a map which has something in it which cannot be serialized to JSON like a function() or sequence of more than one item?

This would lead to an http:serialize error (there is a single sentence on that in the spec; we could add an example).

I would have preferred to keep serialization out of the spec (because it’s something that already exists outside the spec), but the major restriction may be that fn:serialize cannot be used to generate results in other encodings (because the encoding parameter is ignored), and we have no alternative for serializing data to a binary representation.

I am not quite sure what to do with regards sending base64 binary data, perhaps it is as simple as $binary cast as xs:string? Perhaps we might want to introduce a binary serialization method and any additional parameters that we need for that, so that we can control base64 and hex binary serialization.

I think it’s simple like that! The cast should suffice in most scenarios; and in the remaining cases, users can still define their own Content-Type and use custom serialization. The binary flag was always confusing to me, because it’s found in no W3 spec, and because it’s one of the most common cases to send binary data via HTTP.

Here are some suggestions on how function calls with optional serialization could look like:

(: Binary data; generated Content-Type string: "application/octet-stream" :)
http:post('url', $base64)

(: Base64 string; generated Content-Type string: "text/plain;charset=UTF-8" :)
http:post('url', string($base64))

(: String with custom encoding; generated Content-Type string: "text/plain;charset=Shift-JIS" :)
http:post('url', '日本', map { 'serialize': map { 'encoding': 'Shift-JIS' } })

(: html5; generated Content-Type string: "text/html;charset=UTF-8" :)
http:post(
  'url',
  <html/>,
  map { 'serialize': map { 'method': 'html', 'version': '5.0' } }
)

@ChristianGruen
Copy link
Member

@adamretter: Would you like me to create a little PR for this solution?

@adamretter
Copy link
Member Author

adamretter commented Aug 25, 2018

Hmm...

So you are proposing to add the XSLT and XQuery Serialization 3.1 to the $options as a serialization option as opposed to adding another arity function signature?

If so that seems reasonable to me :-)
I might suggest that the type of the serialization value in the map should be item() and that we require either:

  1. element(output:serialization-parameters)
  2. map(*)

This would match what users are used to with the $params arg in functions like fn:serialize.


One moment though... I wonder if perhaps we are going to far in offering convenience? Are there any use-cases where we could not just have the user use fn:serialize() and the http client together. I could see some might argue that executing fn:serialize and then http:send could be inefficient, but I don't see why an implementation could not optimize those calls. That said, optimization becomes much more difficult/impossible when users perform fn:serialize much earlier in their code, do some other stuff, and then try and use the result (from fn:serialize) later with http:send.

So I think I am arguing with myself here, but it is good to think about this. I think overall, it is okay to add the same/similar functionality of fn:serialize into the http client functions, as it allows for much easier implementation of efficient sending of the data over the network, without having to write complex XQuery optimizations. Also for the user, it is convenient.

@fgeorges
Copy link
Member

fgeorges commented Aug 25, 2018

What about adding all XSER 3.1 parameters as possible properties in the $options map? Maybe even a serializer function item if that makes sense to be able to configure the option map that is passed to several calls, instead of serializing the content every time:

map {
   'method':     'json',
   'indent':      true,
   'media-type': 'application/rdf+json',
   'serializer':  function($content) {
      ...
   }
}

The spec is already there, and addresses this very topic (and it is part of the common ground between all implementations.) And it supports JSON and HTML5.

@ChristianGruen
Copy link
Member

It was mostly the unresolved encoding issue (see e.g. the Shift-JIS example above) why I thought about readding the serialization parameters to the spec – in the beginning, I hoped that we could get rid of it. We would need to clarify which Content-Type header value will be generated for the available serialization methods (the rules might be equal to the ones in version 1.0 of the spec).

My initial hope was to keep the new specification as compact as possible. I’m not sure if this might be confusing for some users?

An alternative could be to let users serialize their data and introduce an encoding option (default UTF-8). This value could then be used to send strings in the requested encoding and append the charset value to the media type in the Content-Type header. What do you both think?

@fgeorges
Copy link
Member

encoding is indeed one of the XSER parameters :-)

By simply listing the parameters and their possible values in an appendix, and liking to the XSER spec, that is compact, clear, and comprehensive at the same time.

Because of the specificities of defining a HTTP Client, there might be a few details to deal with besides the XSER, like the impact on Content-Type (on this one, I think that if the user provides it explicitly then we MUST NOT change it, and if not, then media-type, method and encoding are all input that can influence the value to use for that header.)

@ChristianGruen
Copy link
Member

Talking about the impact on Content-Type, we have defined the following rules in our basic BaseX REST API: http://docs.basex.org/wiki/REST#Content_Type.

@adamretter
Copy link
Member Author

adamretter commented Aug 25, 2018

I think we could just have a serialization option in $options that takes the same as fn:serialize.

We should not reproduce XSER 3.1 in our spec or list all the options. Instead we should just link to it. Most of our target audience are already familiar with it

@fgeorges
Copy link
Member

@adamretter I might be mislead as I am not familiar with the current format of $options, but as a user it would bother me to have to say:

map {
   'serialization: {
      'indent': true
   }
}

instead of simply:

map {
   'indent': true
}

@fgeorges
Copy link
Member

@ChristianGruen Yes, something like that. But a bit more sophisticated, e.g. if using method=text to generate plain/text, then add charset if encoding is given as well.

@adamretter
Copy link
Member Author

@fgeorges please examine $options then this will make sense. serialization then is no different to, and fits with things like headers.

Also using a property like $serialization allows serialization options to be reused, it also fits with the design of other XP 3.1 fn: functions

@ChristianGruen
Copy link
Member

ChristianGruen commented Aug 25, 2018

@ChristianGruen Yes, something like that. But a bit more sophisticated, e.g. if using method=text to generate plain/text, then add charset if encoding is given as well.

As Adam pointed out in the past, ISO-8859-1 is the default HTTP encoding, so we should probably always add the charset (by default UTF-8).

map {
'indent': true
}

This syntax may be more compact, but I am not sure if we should simply merge keys from different specs (even now when it is unlikely that we’ll have a future serialization spec with conflicting keys). Instead, I would opt for a dedicated option, in analogy with e.g. file:write) in the EXPath File Module.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants