Lazy arrays are good for interactive exploration, but the most efficient way to process multiple files using Uproot only is uproot.iterate, because it ensures that only a manageable amount of data is in memory at once.
I say "using Uproot only" because if you have a very large number of files, you'll want to distribute the job and run it in parallel. Uproot doesn't do that (it's strictly an I/O library); Coffea Processors are a convenient way to do it in HEP.
Thank you very much!
I tested this script and got the following results for 47 files (470,000 events):
uproot3 with lazy arrays: 153 s
uproot3 with iterate: 54 s
uproot4 with iterate: 22 s
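A comparison like this can be timed with a small harness; the sketch below is only illustrative and uses placeholder file, tree, and branch names rather than the ones from the script above:

```python
import time
import uproot  # uproot 4.x

# Placeholder inputs; the actual analysis script is not reproduced here.
files = "data_*.root:Events"
branches = ["MET_pt"]

start = time.perf_counter()
n_events = 0
for chunk in uproot.iterate(files, branches, step_size="100 MB"):
    n_events += len(chunk)
elapsed = time.perf_counter() - start

print(f"uproot4 iterate: {n_events} events in {elapsed:.1f} s")
```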
What is the most efficient way to deal with multiple ROOT files (~100 GB) in uproot3 and uproot4?
I cannot find a tutorial about this.
I tried the lazy array, but it takes a lot of time.
I also tried the iterator, but I'm not sure this loop-based method is efficient (https://github.com/JW-corp/J.W_Analysis/blob/main/Uproot/test/big_data.py).
Thanks.
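For reference, the lazy-array approach mentioned above looks roughly like this in uproot4 (a minimal sketch with placeholder names; uproot.lazy is the uproot 4.x API):

```python
import awkward as ak
import uproot  # uproot 4.x (uproot3 used uproot.lazyarrays instead)

# Placeholder file, tree, and branch names.
events = uproot.lazy("data_*.root:Events", filter_name="MET_pt")

# The result acts like one big array; baskets are read (and cached) on demand,
# which is convenient interactively but adds overhead when every event is
# processed exactly once in a single pass.
print(len(events))
print(ak.mean(events["MET_pt"]))
```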