Processing multiple root files #539

ico1036 · 2021-07-09T06:40:30Z

What is the most efficient way to process multiple ROOT files (~100 GB in total) in uproot3 and uproot4?
I cannot find a tutorial about this.

I tried lazy arrays, but they take a long time.

# Imports (assumed)
import glob
import uproot3 as uproot  # the Uproot 3 API is used below
import awkward as ak

# PATH
dir_path = "/x4/cms/dylee/Delphes/data/root/signal/*/*.root"
file_list = glob.glob(dir_path)

# IO
cache = uproot.ArrayCache("2 GB")
events = uproot.lazyarrays(
    file_list, "Delphes", ["Electron*", "Muon*", "Photon*", "MissingET*"], cache=cache
)

# Define Particle arrays
Electron = ak.zip(
    {
        "PT": events["Electron.PT"],
        "Eta": events["Electron.Eta"],
        "Phi": events["Electron.Phi"],
        "T": events["Electron.T"],
        "Charge": events["Electron.Charge"],
    }
)

Also, I tried the iterator, but I'm not sure whether this loop-based method is efficient (https://github.com/JW-corp/J.W_Analysis/blob/main/Uproot/test/big_data.py).

Thanks.

jpivarski · 2021-07-09T11:13:24Z

Lazy arrays are good for interactive exploration, but the most efficient way to process multiple files with Uproot only is uproot.iterate (because it ensures that only a manageable amount of data is in memory at once).

I say "using Uproot only" because if you have a very large number of files, you'll want to distribute the job and run it in parallel. Uproot doesn't do that (as it's strictly an I/O library). Coffea Processors are a convenient way to do it in HEP.
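As a rough illustration of the uproot.iterate pattern, here is a minimal sketch for Uproot 4. The helper names count_events and delphes_batches are hypothetical, the "Delphes" tree name and the Electron branches are taken from the question, and step_size="100 MB" is only an example value. The idea is to reduce each batch to a small result and let the batch itself go out of scope, so only one batch is in memory at a time.

```python
import glob


def count_events(batches, branch="Electron.PT"):
    """Reduce each batch to small numbers instead of keeping the arrays."""
    n_batches = 0
    n_events = 0
    for batch in batches:
        n_batches += 1
        n_events += len(batch[branch])  # one entry per event in this batch
    return n_batches, n_events


def delphes_batches(pattern, step_size="100 MB"):
    """Yield batches of Electron branches from every file matching `pattern`."""
    # Uproot 4 or later; imported lazily so count_events can be tried without it.
    import uproot

    # "path:treename" tells Uproot which TTree to read from each file.
    files = [f"{path}:Delphes" for path in sorted(glob.glob(pattern))]
    branches = ["Electron.PT", "Electron.Eta", "Electron.Phi"]
    # With the default library, each batch is an Awkward Array of records
    # that can be indexed by branch name, e.g. batch["Electron.PT"].
    yield from uproot.iterate(files, branches, step_size=step_size)


# Usage, with the path from the question:
# pattern = "/x4/cms/dylee/Delphes/data/root/signal/*/*.root"
# print(count_events(delphes_batches(pattern)))
```

In a real analysis, the body of the loop would fill histograms or apply selections per batch rather than just counting events. A larger step_size means fewer, bigger batches, at the cost of more memory.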

ico1036 · 2021-07-12T08:22:16Z

Thank you very much!
I tested this script and got the following results:

47 files, 470,000 events
uproot3 with lazy: 153 s
uproot3 with iterate: 54 s
uproot4 with iterate: 22 s
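To put these timings in perspective, the short sketch below converts them into throughput and speedup relative to the lazy-array run. It uses only the numbers reported above. The variable names are illustrative, and the timings are single measurements, so the ratios are indicative only.

```python
# Timings reported above: 47 files, 470,000 events in total.
N_EVENTS = 470_000
timings_s = {
    "uproot3 lazy": 153,
    "uproot3 iterate": 54,
    "uproot4 iterate": 22,
}

baseline = timings_s["uproot3 lazy"]

# For each method: (events per second, speedup relative to uproot3 lazy).
summary = {
    name: (N_EVENTS / seconds, baseline / seconds)
    for name, seconds in timings_s.items()
}

for name, (rate, speedup) in summary.items():
    print(f"{name:16s} {rate:8.0f} events/s  {speedup:4.1f}x vs lazy")
```

By this measure, uproot4 with iterate is roughly 7 times faster than uproot3 with lazy arrays, and about 2.5 times faster than uproot3 with iterate.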

ico1036 changed the title ~~Multiple files~~ Processing multiple root files Jul 9, 2021
