-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fs2-data-xml streaming parser #25
Conversation
ThisBuild / tlBaseVersion := "0.23" | ||
ThisBuild / tlMimaPreviousVersions ++= (0 to 11).map(y => s"0.23.$y").toSet | ||
ThisBuild / tlBaseVersion := "0.24" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I couldn't decide whether or not to bump the base version. We can make this change binary-compatibly, it's more about semantics.
@deprecated("0.23.12", "This factory is no longer used") | ||
protected def saxFactory: SAXParserFactory |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the (semantically) breaking change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Replacing SAX will also change the types of errors that are raised.
case e: Elem => Right(e) | ||
case _ => Left(MalformedMessageBodyFailure("Invalid XML")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of this charade, maybe this should become an EntityEncoder[F, Document]
instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks compelling as an 0.24. It doesn't emit any partial documents, but it does avoid collecting the entire body into a Chunk[Byte]. It also opens the door to a decoder of streaming events.
I do wonder how the performance compares to SAX. My guess is not favorably, but perhaps similar?
@deprecated("0.23.12", "This factory is no longer used") | ||
protected def saxFactory: SAXParserFactory |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Replacing SAX will also change the types of errors that are raised.
implicit def xmlEvents[F[_]](implicit F: Concurrent[F]): EntityDecoder[F, Stream[F, XmlEvent]] = { | ||
import EntityDecoder._ | ||
decodeBy(MediaType.text.xml, MediaType.text.html, MediaType.application.xml) { msg => | ||
val source = new InputSource() | ||
msg.charset.foreach(cs => source.setEncoding(cs.nioCharset.name)) | ||
DecodeResult.successT(msg.bodyText.through(fs2.data.xml.events())) | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There, now we can decode as a stream of events too 😁
There's a lot to like here:
There are also reasons I hesitate to retire the 0.23 impl:
Both of these problems could be me being ignorant. One way forward is to have a parallel SAX and an fs2-data implementation here and let people pick their pros and cons. Another way forward is to have an http4s-fs2-data repo, which is an alternate way to do XML. An advantage there is that we could implement the other formats that fs2-data supports. I do have a favorable impression of the library! But is there really enough XML demand in http4s to support two XML libraries? (Three, if we count scala-xml-1?) /cc @sjenna, who is maintaining http4s-scala-xml-1 and might be interested in the proceedings... |
scala-xml itself cross-compiles for Scala.js. The problem is that nobody has ported the Java-lib parser to Scala.js. So it's parser-less.
There's also those test failures I mentioned in #25 (comment) and gnieh/fs2-data#332. I don't understand RFCs well enough to appreciate if they are important or fixable.
I don't care enough to maintain it 😂 this was low-hanging fruit, and somebody had wanted to use XML with http4s-dom in the browser. BTW, it's not obvious to me why we need a separate repo for scala-xml-1, how different is it? Since it has a different artifact name but same versioning we could theoretically publish side-by-side from a single set of sources. |
I don't know that scala-xml-1 is going to live to see http4s-1.0. But they're getting mostly the same Steward bumps, and backporting #28 (I'm calling it a bugfix) was a pain. That repo is mainly Jenna preserving a cohesive stack after we jumped out ahead of some other libraries (i.e., Twirl). Note that this PR pulls in a scala-xml-2 artifact, and would need to live in a wholly separate module if scala-xml-1 moved back here. I haven't had time to dig into that encoding failure you hit yet. |
Not just this PR, this repo already pulls in scala-xml-2 anyway :) http4s-scala-xml-1 and http4s-scala-xml are already separate artifacts, but both in the 0.23 series. If their sources are similar enough we can keep one copy and cross-compile it as two separate projects with the separate scala-xml dependencies, all within a single repository. Seems like supporting fs2-data-xml parsing for scala-xml-1 would basically require this file: |
@@ -46,7 +46,7 @@ trait ElemInstances { | |||
implicit def xmlEvents[F[_]](implicit F: Concurrent[F]): EntityDecoder[F, Stream[F, XmlEvent]] = | |||
EntityDecoder.decodeBy(MediaType.text.xml, MediaType.text.html, MediaType.application.xml) { | |||
msg => | |||
DecodeResult.successT(msg.bodyText.through(fs2.data.xml.events())) | |||
DecodeResult.successT(msg.bodyText.through(fs2.data.xml.events(includeComments = true))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is necessary to make the new round-trip test pass. But is it actually what we want?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I conflated normalization in the generator with normalization in the test in a way I probably shouldn't have. I guess the question is whether comments are part of normalized output. I think it belongs, perhaps with an option to turn it off.
|
So thinking about where to put things...
fs2-data has the most upside. If we make an http4s-fs2-data, it's probably the one people will want to use in the long run. If we continue here and make it http4s-scala-xml-0.24, there will be regression in certain facets today, but reason to hope it dominates Xerces in the near future. Finishing Aalto would not be hard and probably the easiest way to dominate Xerces, and real hard to beat as a JVM only solution. fs2-data could eventually eat its lunch too, but it will be a higher performance bar. |
My best argument for a separate http4s-fs2-data is that it would be a natural home for CBOR support. fs2-data is much more than XML, and some of it is relevant. It might also be less discoverable than, like, http4s-cbor. (I don't actively use CBOR... this is mostly hypothetical to see if it makes anyone jump up and raise their hands and yell "ooh!") |
I do love your fantastic tables! ❤️ Agree, an http4s-fs2-data could make good sense. Specifically to XML, it gives a chance to "incubate" while performance and compliance improve, without being a regression as it would here. More generally, all the other fun integrations like cbor, csv, what-have-you :) Short term, it will need a bit more activation energy and bike-shedding to get off the ground 😂 |
I could set up a repo and cherry-pick your commits without much fuss. The slightly more annoying part would be publishing the generators somewhere, as I really don't want multiple copies. I could always move those to a rural Typelevel project. |
Change to base branch will help me rebuild this over in http4s-fs2-data, after I, like, sleep and stuff. |
Migrated to http4s/http4s-fs2-data#3. |
Might close #4 and #3.
Currently failing: