Skip to content
This repository has been archived by the owner on Nov 9, 2020. It is now read-only.

fix(ooxml): regexp does not replace the whole tag in every case #28

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

PandiPanda69
Copy link
Contributor

Hey !

I just figured out that the externalData is not a leaf. Thus, sometimes, the DOM was just fucked up because the replacement was a big #fail.

Tell me if you see anything wrong :-)

@punkeel
Copy link
Contributor

punkeel commented Aug 29, 2017

Thanks for the patch!

Do you believe this regex is "enough", or should we resort to some XML parsing library just to be sure?
& the regex does not match < c:externalData/> because of the extra space 😕

@PandiPanda69
Copy link
Contributor Author

The more I look at the code the more I think it could be a nice option. I'm only afraid about how slow it could be to parse the DOM of every single file. But we should give it a try and see. Probably that disabling the DTD/schema checking would not be that slow.

What do you think?

@punkeel
Copy link
Contributor

punkeel commented Aug 31, 2017

If performance is really an issue, we might read the text looking for "externalData" (the check is already in place).

DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
domFactory.setNamespaceAware(true); // <-
DocumentBuilder builder = domFactory.newDocumentBuilder();
Document doc = builder.parse(inputFile); // Using the stream *might* improve performances, right?

NodeList nodes = doc.getElementsByTagName("externalData"); // <- Without the NS
for (int i = 0; i < nodes.getLength(); i++) {
// Remove? Empty it?
// getParentNode then removeChild?
}

I never used XML APIs in Java (and did not test the above code) so I can't judge the performance impact, but this does not depend on XPath tricks and should be fast enough, right? 😕

And we don't have to parse every files, do we? I don't have samples to check but I can only guess the "externalData" must be in a specific file, and using the relations we could find it. Am I right?

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants