-
Notifications
You must be signed in to change notification settings - Fork 73
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enhance XML Loader to tolerate whitespace #576
Conversation
daffodil-lib/src/test/scala/org/apache/daffodil/xml/test/unit/TestXMLLoader.scala
Show resolved
Hide resolved
daffodil-lib/src/main/scala/org/apache/daffodil/xml/DaffodilConstructingLoader.scala
Show resolved
Hide resolved
daffodil-lib/src/main/scala/org/apache/daffodil/xml/DaffodilConstructingLoader.scala
Show resolved
Hide resolved
daffodil-lib/src/main/scala/org/apache/daffodil/xml/DaffodilConstructingLoader.scala
Outdated
Show resolved
Hide resolved
daffodil-lib/src/main/scala/org/apache/daffodil/xml/DaffodilConstructingLoader.scala
Outdated
Show resolved
Hide resolved
daffodil-lib/src/main/scala/org/apache/daffodil/xml/DaffodilConstructingLoader.scala
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
Would be good to get a fix merged upstream so we don't need to maintain this, but as a temporary work around this seems fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
Would like to see upstream take changes that would simplify our implementation and avoid any duplication of upstream code.
* This does not handle DOCTYPEs (aka DTDs) at all. Hence, is not | ||
* a true replacement (bug fix) on the original ConstructingParser method | ||
* that it overrides. | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suppose upstream takes your document() change to tolerate whitespace before the first <
character and other big fixes, could you also suggest they take a change to make it easier to load XML securely? Maybe the upstream PR could let us override a doctype function or something like that so we don't need to duplicate any upstream code in our constructing parser implementation here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interestingly, the scala xml library now defaults (as of version 2.0.0) to "secure". So they've taken that step.
What I don't know is the exact details of what they consider "secure". Does it apply to the constructing parser as well as the regular load calls, etc. Also, e.g., do they disallow DocTypes entirely, or just external entity usage, or ???
Which is why the earlier PR to specify no doctype to be used, so we know exactly how strict we are being.
I will try to get changes to the constructing parser upstreamed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Scala XML ticket: scala/scala-xml#532
def fatalError(exception: SAXParseException) = { | ||
exceptions += exception | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No newline at end of file (but up to you).
ConstructingParser was intolerant of files that didn't start with "<". Modified to allow whitespace, processing instructions, and comments to precede the document element. Tests are testLoaderToleratesXMLWithLeadingWhitespace testLoaderToleratesXMLWithLeadingComments testLoaderToleratesXMLWithPI Further enhanced to handle PI that starts with <?xml-model ...?> This occurs in test infoset data for the iCalendar DFDL schema. Xerces was tolerating this before, so our constructing loader should also. DAFFODIL-2527
ConstructingParser was intolerant of files that didn't
start with "<".
Modified to allow whitespace, processing instructions, and comments
to precede the document element.
Tests are
testLoaderToleratesXMLWithLeadingWhitespace
testLoaderToleratesXMLWithLeadingComments
testLoaderToleratesXMLWithPI
DAFFODIL-2527