This repository has been archived by the owner on Jun 21, 2022. It is now read-only.
Hi folks,
First, thank you for all the hard work that has gone into Uproot, it's pretty amazing!
I'm currently experiencing some performance issues with writing out TTrees and am wondering if I'm just doing something wrong. My event detector combines multiple instrument hits into variable-size events. In memory I'm just storing these as a list of dicts such as
Currently I'm writing out around 1.4 million entries in the list. I use a buffer, whose size I can set at run time, to control how many of these are flushed to the ROOT file at a time. I then do one command to get a file handle:
and write the buffer via:
I currently set the buffer size to 100,000 events, so the write code above is called for each block of 100,000 entries. Ideally I'd like to set this buffer to several million for some multiprocessor performance gains in earlier parts of the code. However, this currently runs quite slowly, and the larger I make the buffer past ~10,000 events, the slower it gets. Writing out 1.4 million events in 20k chunks takes >4 min even with PyPy3.
It seems to do the same with normal Python 3.6 using Anaconda. I've also tried using just the basket method.
with no discernible gain. If I load in the normal ROOT Python interface it takes ~40 seconds; when converting to a pandas DataFrame and using either to_hdf or to_csv it also takes ~40 seconds.
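For readers following along, here is a minimal sketch of the buffering pattern described above: a list of event dicts is split into fixed-size chunks, and each chunk is turned into flat per-branch columns plus a per-event count for the variable-length field. The field names (`event_id`, `hit_energy`), the `n_` prefix, and both helper functions are hypothetical and are not taken from the original post. The sketch deliberately stops before any Uproot call, so it only illustrates the data reshaping that would precede a write.

```python
from itertools import islice


def iter_chunks(events, chunk_size):
    """Yield successive lists of at most chunk_size events."""
    it = iter(events)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk


def to_columns(chunk, scalar_keys, jagged_keys):
    """Turn a list of event dicts into flat per-branch columns.

    Each jagged key becomes a flat list of values plus a list of
    per-event counts stored under "n_<key>" (a hypothetical naming choice).
    """
    columns = {key: [event[key] for event in chunk] for key in scalar_keys}
    for key in jagged_keys:
        columns["n_" + key] = [len(event[key]) for event in chunk]
        columns[key] = [value for event in chunk for value in event[key]]
    return columns


# Hypothetical events; the real structure was not shown in the post.
events = [
    {"event_id": 0, "hit_energy": [1.5, 2.0]},
    {"event_id": 1, "hit_energy": []},
    {"event_id": 2, "hit_energy": [0.7]},
]

# A chunk size of 2 stands in for the run-time buffer size.
chunks = [
    to_columns(chunk, ["event_id"], ["hit_energy"])
    for chunk in iter_chunks(events, 2)
]
```

Building the columns once per chunk keeps the per-event Python work out of the write call itself, which can make it easier to time the reshaping and the file write separately.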
Maybe I'm just doing something silly or is there still quite a bit of work needed on the TTree writing component?
Thank you!
-jon
Without having a chance to look specifically into this, I should point out that performance was a lower priority for writing than it was for reading (because writing is a much more constrained problem). As written data grow beyond initially prescribed boundaries, objects need to be rewritten and all pointers to them need to be updated to keep the file consistent.
That said, I'll be taking a look at the writing code soonish, integrating it into Uproot4. I'm not expecting to find performance bugs (mistakes that should be fixed, "premature optimization" aside, like Shlemiel the painter’s algorithm). However, if there's something fundamental to fix or even just small tweaks, they'll be implemented in the new code, taking the original code as a correctness baseline.
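As a side note on the "Shlemiel the painter's algorithm" reference: the name describes code that redoes work proportional to everything processed so far on every step, so total cost grows quadratically. The sketch below is a generic illustration using a simple byte-counting cost model. It is not a description of Uproot's writing code, and the function names are made up for this example.

```python
def copies_with_repeated_concat(chunks):
    """Rebuild the whole buffer for every chunk and count bytes copied."""
    buffer = b""
    copied = 0
    for chunk in chunks:
        # Concatenation creates a new bytes object, so all earlier
        # content is copied again on every iteration.
        buffer = buffer + chunk
        copied += len(buffer)
    return buffer, copied


def copies_with_single_join(chunks):
    """Collect references first, then copy every byte exactly once."""
    parts = []
    for chunk in chunks:
        parts.append(chunk)  # stores a reference, no data copy
    buffer = b"".join(parts)
    return buffer, len(buffer)


# 100 chunks of 10 bytes each.
chunks = [b"x" * 10] * 100
slow_buffer, slow_copied = copies_with_repeated_concat(chunks)
fast_buffer, fast_copied = copies_with_single_join(chunks)

print(slow_copied, fast_copied)  # 50500 1000
```

Both functions produce the same bytes, but in this model the first copies about 50 times more data for 100 chunks, and the ratio keeps growing with the number of chunks.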
So to answer your question, you shouldn't be thinking of the writing component as a performance-first thing. It exists for compatibility, though I'll be giving it an end-to-end review soon, along with everything else.