FeatureRequest: Migrate to a faster xml parser #10

Open
groceryheist opened this issue Sep 11, 2018 · 1 comment

@groceryheist

Others have reported improved performance when using expat to parse Wikimedia dumps. We currently use ElementTree, which offers a good balance between usability and speed.

Switching to a faster XML parser could probably speed up this library. Candidates include:

  • lxml
  • cElementTree
  • expat

Migrating to lxml or cElementTree might be relatively easy because their APIs are similar to ElementTree's.
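A rough way to compare candidates before committing to a migration is to time them on the same input. The sketch below is only an illustration using the standard library: it counts `<page>` elements with `xml.etree.ElementTree.iterparse` and with a bare `xml.parsers.expat` handler. The synthetic document, the `make_dump` helper, and the other function names are hypothetical and are not part of this library; a real comparison should use an actual dump file and the library's own processing code.

```python
import io
import time
import xml.etree.ElementTree as ET
from xml.parsers import expat


def make_dump(n_pages):
    """Build a small synthetic document (hypothetical layout, not a real dump)."""
    pages = "".join(
        f"<page><title>Page {i}</title><revision><id>{i}</id>"
        f"<text>revision text {i}</text></revision></page>"
        for i in range(n_pages)
    )
    return f"<mediawiki>{pages}</mediawiki>".encode("utf-8")


def count_pages_elementtree(data):
    """Count <page> elements with iterparse, clearing each page once it is seen."""
    count = 0
    for _event, elem in ET.iterparse(io.BytesIO(data), events=("end",)):
        if elem.tag == "page":
            count += 1
            elem.clear()  # drop the page's children to limit memory growth
    return count


def count_pages_expat(data):
    """Count <page> elements with a raw expat start-element callback."""
    count = 0

    def on_start(name, attrs):
        nonlocal count
        if name == "page":
            count += 1

    parser = expat.ParserCreate()
    parser.StartElementHandler = on_start
    parser.Parse(data, True)  # True marks this as the final chunk
    return count


def time_call(fn, data):
    """Return (result, elapsed seconds) for a single call."""
    begin = time.perf_counter()
    result = fn(data)
    return result, time.perf_counter() - begin


if __name__ == "__main__":
    data = make_dump(20_000)
    for fn in (count_pages_elementtree, count_pages_expat):
        n, secs = time_call(fn, data)
        print(f"{fn.__name__}: {n} pages in {secs:.3f}s")
```

The two functions do different amounts of work (ElementTree builds element objects, the expat handler only counts), so the timings indicate an upper bound on what a migration could gain rather than a like-for-like result.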

@halfak
Member

halfak commented Sep 11, 2018

I tested cElementTree a while back and found that I generally got similar or worse performance in Python 3. I'm not sure why. That was over two years ago, so it might be worth testing again.
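One possible explanation for the similar timings, offered as a hedge rather than a diagnosis: since Python 3.3, `xml.etree.ElementTree` uses the C accelerator automatically when it is available, and `cElementTree` was deprecated in 3.3 and removed in 3.9. The sketch below checks whether the accelerated classes are in use on a given interpreter. The function name is hypothetical, and the check relies on the CPython-internal `_elementtree` module, so treat it as a diagnostic aid only.

```python
import xml.etree.ElementTree as ET


def elementtree_is_accelerated():
    """Return True if ElementTree's Element class comes from the C accelerator."""
    try:
        import _elementtree  # CPython-internal module; may be absent elsewhere
    except ImportError:
        return False
    return ET.Element is _elementtree.Element


if __name__ == "__main__":
    print("C-accelerated ElementTree:", elementtree_is_accelerated())
```

If this reports that the accelerator is already active, retesting `cElementTree` is unlikely to show a difference, and lxml or expat would be the candidates worth benchmarking.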
