FeatureRequest: Migrate to a faster xml parser #10

groceryheist · 2018-09-11T18:34:42Z

Others have reported improved performance when using expat to parse Wikimedia dumps. We are currently using ElementTree, which provides a good balance between usability and speed.

There is probably potential to speed up this library by switching to a faster XML parser. Candidates include:

lxml
cElementTree
expat
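Unlike lxml and cElementTree, expat is a callback (SAX-style) parser rather than a tree API, so adopting it would likely mean restructuring the parsing code instead of swapping an import. The sketch below pulls page titles out of a dump-like stream with the standard library's `xml.parsers.expat`; the function name `expat_page_titles` and the tiny `SAMPLE` document (including its export namespace version) are hypothetical, for illustration only.

```python
import io
import xml.parsers.expat


def expat_page_titles(fileobj):
    """Collect the text of every <title> element using expat callbacks.

    Without namespace processing, expat reports raw tag names, so the
    default MediaWiki xmlns does not change the name "title".
    """
    titles = []
    buf = []
    in_title = False

    def start(name, attrs):
        nonlocal in_title
        if name == "title":
            in_title = True
            buf.clear()

    def end(name):
        nonlocal in_title
        if name == "title":
            titles.append("".join(buf))
            in_title = False

    def chars(text):
        # expat may deliver one text node in several chunks, so buffer them
        if in_title:
            buf.append(text)

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = chars
    parser.ParseFile(fileobj)  # reads the stream in chunks
    return titles


# Hypothetical minimal dump fragment (namespace version assumed)
SAMPLE = (
    b'<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">'
    b"<page><title>Foo</title></page><page><title>Bar</title></page>"
    b"</mediawiki>"
)
print(expat_page_titles(io.BytesIO(SAMPLE)))  # ['Foo', 'Bar']
```

The state that a tree API keeps for you (which element you are inside, accumulated text) has to be tracked by hand here, which is the main migration cost.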

Migrating to lxml or cElementTree might be relatively easy because they have similar APIs to ElementTree.
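As a rough illustration of how small the change could be, the sketch below streams `<page>` elements with `iterparse`, preferring lxml when it is installed and falling back to the standard library otherwise. This is not the library's actual parsing code; `iter_page_titles`, `local_name`, and the `SAMPLE` fragment are hypothetical names chosen for the example.

```python
import io

try:
    # lxml implements much of the ElementTree API, including iterparse
    from lxml import etree
except ImportError:
    # Standard-library fallback
    import xml.etree.ElementTree as etree


def local_name(tag):
    """Strip a '{namespace}' prefix, e.g. '{http://...}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def iter_page_titles(source):
    """Stream <page> elements and yield each page's <title> text."""
    for _event, elem in etree.iterparse(source, events=("end",)):
        if not isinstance(elem.tag, str):
            continue  # skip comments / processing instructions (lxml)
        if local_name(elem.tag) == "page":
            title = None
            for child in elem:
                if isinstance(child.tag, str) and local_name(child.tag) == "title":
                    title = child.text
                    break
            yield title
            elem.clear()  # drop the finished page's children to save memory


# Hypothetical minimal dump fragment (namespace version assumed)
SAMPLE = (
    b'<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">'
    b"<page><title>Foo</title></page><page><title>Bar</title></page>"
    b"</mediawiki>"
)
print(list(iter_page_titles(io.BytesIO(SAMPLE))))  # ['Foo', 'Bar']
```

Because both backends accept the same `iterparse(source, events=...)` call and expose `.tag`, `.text`, and `.clear()`, only the import differs between them in this sketch.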

halfak · 2018-09-11T18:35:43Z

I tested cElementTree a while back and found that I generally got similar or worse performance in Python 3. I'm not sure why. That was over two years ago, so it might be worth testing again.
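One possible factor: since Python 3.3, `xml.etree.ElementTree` uses the C accelerator automatically when available, and `xml.etree.cElementTree` was deprecated and later removed in Python 3.9, so in Python 3 the two may largely be the same code. A re-test could compare the remaining options on the same input. The harness below is a minimal sketch under that assumption: it times page counting with ElementTree, expat, and (if installed) lxml on a synthetic document. `make_dump` and the other helpers are hypothetical, and results on real Wikimedia dumps may differ.

```python
import io
import time
import xml.etree.ElementTree as ET
import xml.parsers.expat

try:
    from lxml import etree as lxml_etree  # optional third-party dependency
except ImportError:
    lxml_etree = None


def make_dump(n_pages):
    """Build a synthetic dump-like document (not real Wikimedia data)."""
    pages = "".join(
        f"<page><title>Page {i}</title><revision><text>{'x' * 200}</text></revision></page>"
        for i in range(n_pages)
    )
    return f"<mediawiki>{pages}</mediawiki>".encode("utf-8")


def count_pages_etree(module, data):
    """Count <page> end events with an ElementTree-style iterparse."""
    count = 0
    for _event, elem in module.iterparse(io.BytesIO(data), events=("end",)):
        if elem.tag == "page":
            count += 1
            elem.clear()
    return count


def count_pages_expat(data):
    """Count <page> end tags with expat's callback API."""
    count = 0

    def end(name):
        nonlocal count
        if name == "page":
            count += 1

    parser = xml.parsers.expat.ParserCreate()
    parser.EndElementHandler = end
    parser.Parse(data, True)
    return count


def best_time(fn, repeat=5):
    """Return the fastest wall-clock time in seconds over `repeat` runs."""
    timings = []
    for _ in range(repeat):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return min(timings)


if __name__ == "__main__":
    data = make_dump(20_000)
    candidates = {
        "ElementTree": lambda: count_pages_etree(ET, data),
        "expat": lambda: count_pages_expat(data),
    }
    if lxml_etree is not None:
        candidates["lxml"] = lambda: count_pages_etree(lxml_etree, data)
    for name, fn in candidates.items():
        print(f"{name:12s} {best_time(fn):.3f}s")
```

Taking the minimum of several runs reduces noise from other processes; for a fair comparison, each candidate should also do the same amount of work per page.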

groceryheist self-assigned this Sep 12, 2018

groceryheist added the enhancement label Sep 12, 2018
