This repository has been archived by the owner on Jun 21, 2022. It is now read-only.

Convert a table into a dict of Jagged arrays #378

Closed
nfoppiani opened this issue Oct 15, 2019 · 5 comments

Comments

@nfoppiani

nfoppiani commented Oct 15, 2019

This issue is mainly a question.
I would like to know how to convert a table
obtained with uproot.open(root_filename)[tree].lazyarrays()
into a dict of jagged arrays,
like the one normally obtained with uproot.open(root_filename)[tree].arrays().
The reason is that I would like to take a table, compute a function that produces a mask (a vector of True or False values with length equal to the number of rows of the table), and then create a dict of jagged arrays containing only the rows where the mask is True.

I need this because I have to run functions that produce new jagged arrays (or new columns, if we think in terms of tables) and that cannot be applied to all rows, only to those for which the mask is True.
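A minimal sketch of that row filtering in plain NumPy, assuming a jagged column is modeled as a flat content array plus row offsets (conceptually how awkward's JaggedArray is laid out). The helper filter_jagged is hypothetical and not part of uproot or awkward; with awkward itself, indexing a JaggedArray with a boolean array whose length equals the number of rows should perform the same selection.

```python
import numpy as np


def filter_jagged(content, offsets, row_mask):
    """Keep only the rows of a jagged array where row_mask is True.

    Hypothetical layout: row i spans content[offsets[i]:offsets[i + 1]].
    """
    counts = np.diff(offsets)                   # number of elements in each row
    element_mask = np.repeat(row_mask, counts)  # one boolean per content element
    new_content = content[element_mask]
    # Rebuild offsets from the lengths of the surviving rows.
    new_offsets = np.concatenate(([0], np.cumsum(counts[row_mask])))
    return new_content, new_offsets


# Three rows of lengths 2, 1 and 3: [1.1, 2.2], [3.3], [4.4, 5.5, 6.6]
content = np.array([1.1, 2.2, 3.3, 4.4, 5.5, 6.6])
offsets = np.array([0, 2, 3, 6])
row_mask = np.array([True, False, True])  # e.g. the output of a selection function

new_content, new_offsets = filter_jagged(content, offsets, row_mask)
# new_content -> [1.1, 2.2, 4.4, 5.5, 6.6], new_offsets -> [0, 2, 5]
```

Applying the same row_mask to every jagged column keeps the columns aligned row by row.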

Additionally, I realised that operations on the dict of arrays run faster than the same operations on the table. Is that expected?

@jpivarski
Member

You can do this:

{n: a[n] for n in a.columns}

for some jagged array a containing a Table. Each a[n] is a zero-cost projection through the structure (one of the advantages of a columnar format).
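To illustrate the projection-then-mask pattern without uproot, here is a sketch that uses a NumPy structured array as an assumed stand-in for an awkward Table; the column names and values are made up. Field access on a structured array returns a view, which is loosely analogous to the zero-cost projection described above.

```python
import numpy as np

# Hypothetical stand-in for a Table: a structured array with named columns.
table = np.array(
    [(1, 0.5), (2, -1.0), (3, 2.5), (4, -0.3)],
    dtype=[("run", "i8"), ("energy", "f8")],
)

# Same pattern as {n: a[n] for n in a.columns}: one dict entry per column (views, no copy).
columns = {name: table[name] for name in table.dtype.names}

# One boolean per row, then the same selection applied to every column.
mask = columns["energy"] > 0
selected = {name: col[mask] for name, col in columns.items()}
# selected["run"] -> [1, 3], selected["energy"] -> [0.5, 2.5]
```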

There are also MaskedArray techniques to produce filtered arrays that have the same length as the original, but maybe try this first.
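The difference between filtering and masking can be sketched with numpy.ma, used here as an assumed analogue of awkward's MaskedArray (numpy.ma also treats True in the mask as "hidden"): boolean indexing shortens the array, while masking keeps the original length.

```python
import numpy as np
import numpy.ma as ma

x = np.array([3.0, -1.0, 4.0, -1.5, 5.0])
keep = x > 0

filtered = x[keep]                       # only the kept rows, so the length shrinks to 3
masked = ma.masked_array(x, mask=~keep)  # same length as x; True in mask hides an entry
# filtered -> [3.0, 4.0, 5.0]; masked.tolist() -> [3.0, None, 4.0, None, 5.0]
```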

@nfoppiani
Author

It worked, thanks!
But I would love to know how to use MaskedArrays to add a new column to a table when that column exists only for a subset of the rows.
For example, the function that computes the values of the new column can only be evaluated on a subset of the rows.

@jpivarski
Member

If I remember right, ufuncs applied to a MaskedArray are only computed for the unmasked elements. They're less often used than jagged arrays and tables, but @nsmith- has had some success with them.
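A sketch of adding a column that exists only for a subset of rows, again using numpy.ma as an assumed analogue of awkward's MaskedArray: evaluate the function on the valid subset only, scatter the results back into a full-length array, and mask the rest. The function new_column_values is hypothetical. This approach copies the valid subset, which is simple but not free for large arrays.

```python
import numpy as np
import numpy.ma as ma


def new_column_values(x):
    """Hypothetical per-row function that is only valid for x >= 0."""
    return np.sqrt(x)


energy = np.array([4.0, -1.0, 9.0, -2.5, 16.0])
valid = energy >= 0

# Evaluate only on the valid rows (no invalid-value warnings), then scatter back.
values = np.zeros_like(energy)
values[valid] = new_column_values(energy[valid])

# Same length as the table; rows where the function was not evaluated are masked.
new_column = ma.masked_array(values, mask=~valid)
# new_column.tolist() -> [2.0, None, 3.0, None, 4.0]
```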

@nsmith-
Contributor

nsmith- commented Oct 16, 2019

For example, this emits no RuntimeWarnings:

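import awkward  # awkward 0.x API
import numpy as np
a = np.random.normal(size=100)
# Entries where the mask is True (here a < 0) are masked out (maskedwhen=True is the default)
ma = awkward.MaskedArray(a < 0, a)
# sqrt is only computed for the unmasked (non-negative) entries, so no RuntimeWarning
np.sqrt(ma)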

However, it copies the valid subset of the array under the hood. An older issue proposes using NumPy's (somewhat new) ufunc where argument instead: scikit-hep/awkward-0.x#110
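For reference, a minimal sketch of the NumPy ufunc where argument on a plain array (not awkward's implementation): the ufunc is evaluated only where the condition is True, and the other entries keep whatever was already in out, so out should be preallocated. Because negative inputs are never passed to sqrt, this should not emit the invalid-value RuntimeWarning.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.normal(size=100)
valid = a >= 0  # complement of the a < 0 mask used above

# Entries where `valid` is False are left untouched, so initialize them explicitly.
out = np.full_like(a, np.nan)
np.sqrt(a, out=out, where=valid)
```

Unlike the copy-the-subset approach, this writes results in place into an array of the original length.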

@jpivarski
Member

I think this is done and can be closed. Let me know if I'm wrong!
