fs2.data.xml.XmlException: character 'ʿ' cannot start a NCName #337

Closed
armanbilge opened this issue Jun 16, 2022 · 6 comments · Fixed by #349

@armanbilge
Contributor

Via http4s/http4s-scala-xml#25 (comment).

//> using scala "3.1.2"
//> using lib "org.gnieh::fs2-data-xml-scala::1.4.1"

import cats.effect.*
import fs2.*
import scala.xml.*

val xml = """<Ẵ줐샃뗧饜孫 悊頃ふ퉞="ꨍ邭䋒↏᲎ừ" 듸괎:ʿक턻뽜="촏"/>"""

object App extends IOApp.Simple {

  def run = for
    _ <- IO(XML.loadString(xml)) *> IO.println("scala-xml works")
    _ <- Stream.emit(xml).covary[IO].through(fs2.data.xml.events()).compile.drain *> IO.println("fs2-data works")
  yield ()

}

scala-xml works
fs2.data.xml.XmlException: character 'ʿ' cannot start a NCName
        at fs2.data.xml.internals.EventParser$.fail$1$$anonfun$1(EventParser.scala:40)
        at fs2.Pull$$anon$2.cont(Pull.scala:183)
        at fs2.Pull$BindBind.cont(Pull.scala:701)
        at fs2.Pull$ContP.apply(Pull.scala:649)
        at fs2.Pull$ContP.apply$(Pull.scala:648)
        at fs2.Pull$Bind.apply(Pull.scala:657)
        at fs2.Pull$Bind.apply(Pull.scala:657)
        at fs2.Pull$.go$1$$anonfun$1(Pull.scala:1207)
        at fs2.Pull$.interruptGuard$1$$anonfun$1(Pull.scala:933)
        at get @ fs2.internal.Scope.openScope(Scope.scala:281)
        at flatMap @ fs2.Compiler$Target.flatMap(Compiler.scala:162)
        at flatMap @ fs2.Pull$.goCloseScope$1$$anonfun$1$$anonfun$3(Pull.scala:1187)
        at update @ fs2.internal.Scope.releaseChildScope(Scope.scala:227)
        at flatMap @ fs2.Compiler$Target.flatMap(Compiler.scala:162)
        at flatMap @ fs2.Compiler$Target.flatMap(Compiler.scala:162)
        at flatMap @ fs2.Compiler$Target.flatMap(Compiler.scala:162)
        at flatMap @ fs2.Compiler$Target.flatMap(Compiler.scala:162)
        at flatMap @ fs2.Compiler$Target.flatMap(Compiler.scala:162)
        at flatMap @ fs2.Compiler$Target.flatMap(Compiler.scala:162)
        at modify @ fs2.internal.Scope.close(Scope.scala:262)
        at flatMap @ fs2.Compiler$Target.flatMap(Compiler.scala:162)
        at flatMap @ fs2.Pull$.goCloseScope$1$$anonfun$1(Pull.scala:1188)
        at handleErrorWith @ fs2.Compiler$Target.handleErrorWith(Compiler.scala:160)
        at flatMap @ fs2.Pull$.goCloseScope$1(Pull.scala:1195)
        at get @ fs2.internal.Scope.openScope(Scope.scala:281)
@armanbilge
Contributor Author

Adding Scalacheck-based tests as proposed in scala/scala-xml#606 would help catch these in fs2-data itself.

@satabin satabin self-assigned this Jun 16, 2022
@satabin satabin added bug Something isn't working xml labels Jun 16, 2022
@satabin
Member

satabin commented Jun 16, 2022

I fear this is a limitation of the current character enumeration method. I need to dig deeper.

@satabin satabin added this to the 1.5.0 milestone Jul 7, 2022
@satabin
Member

satabin commented Jul 18, 2022

After investigating further, I understand what the problem is. The fs2-data XML parser follows the XML namespaces rules, which restrict the range of characters allowed in element identifiers.

The character classes defined here can be derived from the Unicode 2.0 character database as follows:

Name start characters must have one of the categories Ll, Lu, Lo, Lt, Nl.

Name characters other than Name-start characters must have one of the categories Mc, Me, Mn, Lm, or Nd.

I might change this and make NCName validation optional through a parser option (NCName parsing or not).

Would that be acceptable to you?
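
For illustration, the category rule quoted above can be sketched as follows. This is a minimal sketch and not the fs2-data implementation: `CategoryRule` and `isNameStartByCategory` are hypothetical names, and the JVM's Unicode tables are newer than the Unicode 2.0 database the rule refers to, so results are only approximate. It shows why 'ʿ' (U+02BF, category Lm) is rejected as a name start character under that rule.

```scala
object CategoryRule {
  // Categories the quoted rule allows for name start characters.
  private val startCategories: Set[Int] = Set(
    Character.LOWERCASE_LETTER.toInt, // Ll
    Character.UPPERCASE_LETTER.toInt, // Lu
    Character.OTHER_LETTER.toInt,     // Lo
    Character.TITLECASE_LETTER.toInt, // Lt
    Character.LETTER_NUMBER.toInt     // Nl
  )

  // '_' is allowed explicitly; ':' is left out because an NCName has no colon.
  def isNameStartByCategory(codePoint: Int): Boolean =
    codePoint == '_' || startCategories.contains(Character.getType(codePoint))
}

// U+02BF is a modifier letter (Lm), which is not in the allowed set.
println(CategoryRule.isNameStartByCategory(0x02BF)) // false
println(CategoryRule.isNameStartByCategory('a'))    // true
```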

@satabin
Member

satabin commented Jul 18, 2022

It looks like I was referring to an obsolete definition of names; I need to change it, actually…
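
As a point of comparison, here is a hedged sketch of a range-based check, assuming the newer definition meant here is the `NameStartChar` production of XML 1.0 (Fifth Edition) with ':' removed for NCNames. `Xml5Names` and `isNCNameStart` are hypothetical names, not part of fs2-data.

```scala
object Xml5Names {
  // NameStartChar ranges from XML 1.0 (Fifth Edition), without ':'.
  private val ncNameStartRanges: Vector[(Int, Int)] = Vector(
    ('A'.toInt, 'Z'.toInt), ('_'.toInt, '_'.toInt), ('a'.toInt, 'z'.toInt),
    (0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0x2FF), (0x370, 0x37D),
    (0x37F, 0x1FFF), (0x200C, 0x200D), (0x2070, 0x218F), (0x2C00, 0x2FEF),
    (0x3001, 0xD7FF), (0xF900, 0xFDCF), (0xFDF0, 0xFFFD), (0x10000, 0xEFFFF)
  )

  // Linear scan over the ranges; good enough for a sketch.
  def isNCNameStart(codePoint: Int): Boolean =
    ncNameStartRanges.exists { case (lo, hi) => lo <= codePoint && codePoint <= hi }
}

// U+02BF falls inside [#xF8-#x2FF], so it may start an NCName here.
println(Xml5Names.isNCNameStart(0x02BF)) // true
println(Xml5Names.isNCNameStart(':'))    // false
```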

@armanbilge
Contributor Author

Glad you figured it out. I have no clue about this stuff, just reporting the discrepancy I discovered. Appreciate your work!!

@armanbilge
Contributor Author

Btw Ross ended up publishing scalacheck instances for scala-xml:
https://github.com/typelevel/scalacheck-xml

satabin added a commit that referenced this issue Jul 18, 2022
Previous version was based on an obsolete definition of what valid
characters can appear in names.
The new version matches the character ranges described in the latest
version of the specification.

Ranges are represented as a specialized minimal implementation of a
Discrete Interval Encoding Tree.

Fixes #337
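
To give an idea of what such a range structure might look like, here is a minimal, lookup-only sketch in the spirit of a Discrete Interval Encoding Tree. It is not the code from the commit: `RangeTree`, `fromSorted`, and `contains` are hypothetical names, and a full DIET would also support insertion with merging of adjacent intervals, which this sketch leaves out.

```scala
object RangeTree {
  sealed trait Tree
  case object Empty extends Tree
  final case class Node(lo: Int, hi: Int, left: Tree, right: Tree) extends Tree

  // Builds a balanced tree from disjoint ranges sorted by lower bound.
  def fromSorted(ranges: Vector[(Int, Int)]): Tree =
    if (ranges.isEmpty) Empty
    else {
      val mid = ranges.length / 2
      val (lo, hi) = ranges(mid)
      Node(lo, hi, fromSorted(ranges.take(mid)), fromSorted(ranges.drop(mid + 1)))
    }

  // Descends left or right until the value falls inside a node's range.
  @annotation.tailrec
  def contains(tree: Tree, value: Int): Boolean = tree match {
    case Empty => false
    case Node(lo, hi, left, right) =>
      if (value < lo) contains(left, value)
      else if (value > hi) contains(right, value)
      else true
  }
}

val tree = RangeTree.fromSorted(Vector((1, 3), (10, 20), (30, 30)))
println(RangeTree.contains(tree, 15)) // true
println(RangeTree.contains(tree, 5))  // false
```

Each lookup takes time logarithmic in the number of ranges, rather than scanning every range in turn.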