Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enhance XML Loader to tolerate whitespace #576

Merged
merged 1 commit into from
Jun 1, 2021

Conversation

mbeckerle
Copy link
Contributor

ConstructingParser was intolerant of files that didn't
start with "<".

Modified to allow whitespace, processing instructions, and comments
to precede the document element.

Tests are

testLoaderToleratesXMLWithLeadingWhitespace
testLoaderToleratesXMLWithLeadingComments
testLoaderToleratesXMLWithPI

DAFFODIL-2527

Copy link
Member

@stevedlawrence stevedlawrence left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Would be good to get a fix merged upstream so we don't need to maintain this, but as a temporary work around this seems fine.

Copy link
Contributor

@tuxji tuxji left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Would like to see upstream take changes that would simplify our implementation and avoid any duplication of upstream code.

* This does not handle DOCTYPEs (aka DTDs) at all. Hence, is not
* a true replacement (bug fix) on the original ConstructingParser method
* that it overrides.
*/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suppose upstream takes your document() change to tolerate whitespace before the first < character and other big fixes, could you also suggest they take a change to make it easier to load XML securely? Maybe the upstream PR could let us override a doctype function or something like that so we don't need to duplicate any upstream code in our constructing parser implementation here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interestingly, the scala xml library now defaults (as of version 2.0.0) to "secure". So they've taken that step.

What I don't know is the exact details of what they consider "secure". Does it apply to the constructing parser as well as the regular load calls, etc. Also, e.g., do they disallow DocTypes entirely, or just external entity usage, or ???

Which is why the earlier PR to specify no doctype to be used, so we know exactly how strict we are being.

I will try to get changes to the constructing parser upstreamed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Scala XML ticket: scala/scala-xml#532

def fatalError(exception: SAXParseException) = {
exceptions += exception
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No newline at end of file (but up to you).

ConstructingParser was intolerant of files that didn't
start with "<".

Modified to allow whitespace, processing instructions, and comments
to precede the document element.

Tests are

testLoaderToleratesXMLWithLeadingWhitespace
testLoaderToleratesXMLWithLeadingComments
testLoaderToleratesXMLWithPI

Further enhanced to handle PI that starts with <?xml-model ...?>

This occurs in test infoset data for the iCalendar DFDL schema.
Xerces was tolerating this before, so our constructing loader
should also.

DAFFODIL-2527
@mbeckerle mbeckerle merged commit cf8437e into apache:master Jun 1, 2021
@mbeckerle mbeckerle deleted the daf-2527-xml branch June 1, 2021 18:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants