-
Notifications
You must be signed in to change notification settings - Fork 282
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix quadratic complexity performance bug #1657
Fix quadratic complexity performance bug #1657
Conversation
Wow, awesome work done here. I can imagine you had to spend quite some time investigating this issue. These are my thoughts:
Thanks for your efforts! 💪 |
That's a really interesting find, Kev. Let's not pollute the repository with large test files. One of the Topics for v1.00 is "Deep Test Coverage". One day somebody will enhance the test suite to run more extensive tests on lots of big files. The folks at pixels.us have offered to host it. When that's done, your big fat file might find a home and associated test script. Another Topic for v1.00 is "Consider removing XMPsdk from code base". The tree 15 years later, we have Conan available to build dependencies and we may be about to remove XMPsdk. #1466 (comment) |
@kevinbackhouse thanks so much for the fix with a detailed explanation! I ran into the same issue using XMP toolkit, and this pull request was super helpful. Was there any appetite for landing this upstream in https://github.com/adobe/XMP-Toolkit-SDK? I can file a bug there if this hasn't been proposed already. |
@nohle: I'm glad it helped you! I haven't had any success with upstream (example), but you're welcome to try. Exiv2's codebase contains an out-of-date copy of XMP, which we try to do minimal maintenance on to keep it working. We'd prefer to pull XMP in as a dependency but haven't had any success with that so far. |
This fixes GHSA-w8mv-g8qq-36mj
The problem is that
xmlParser->ParseBuffer()
keeps reparsing the entire string from the beginning. On a reasonably large (malicious) input file, containing lots of invalid (not UTF-8) characters, this causes very slow performance due to quadratic complexity. As far as I can see, this is primarily a limitation of libexpat1, which is outside of our control. But we can make it much better by making sure that we only callxmlParser->ParseBuffer()
once during this function, rather than repeatedly during the loop.I have implemented it by copying everything into a
std::string
and then callingxmlParser->ParseBuffer()
at the end.The poc for the bug is a large invalid file. I am not sure if it's worth wasting storage space on that in our repo, so I didn't add it as a test. Let me know if you think I should add it.