Reduction with a "2D" category #47

sjperkins · 2020-04-28T13:29:43Z

In the context of making images for each antenna and baseline, would it not be possible to treat these dimensions as an extra category dimension, along with the other categories you like to group the data by?

The intermediate and final image cubes will be large to huge, but it may be possible to render them in one go by:

- Warning the user about the danger of what they're attempting.
- Having sufficient memory on the box for this tricky case.
- Flattening the categories, then reshaping after the reduction to recover the antenna/baseline dimension.

o-smirnov · 2020-04-28T15:16:47Z

Do you mean like a colour-by field AND baseline? An outer product of sorts?

sjperkins · 2020-04-28T15:24:18Z

I'm thinking of 2D (cat, ant) categories, which can be flattened into a (cat*ant,) array when provided to datashader. Hopefully, datashader will then produce a (x, y, cat*ant) array which can be reshaped into (x, y, cat, ant) before the final rendering step.

I was thinking back to the reduction code I wrote last year and realising that I could do it as a 3D cube reduction. So, this is predicated on the datashader reduction internals producing a 3D cube (x, y, cat). I may be incorrect as I don't think I've looked at a categorical reduction internally yet.
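The flatten-then-reshape idea can be sketched with plain NumPy standing in for the datashader reduction. This is only an illustration: the sizes, the variable names, and the `(y, x, category)` axis order are assumptions, and the cube that datashader actually returns should be checked before relying on the reshape.

```python
import numpy as np

rng = np.random.default_rng(0)
n_rows, n_cat, n_ant = 10_000, 3, 4      # hypothetical sizes
ny, nx = 32, 48                          # hypothetical canvas size in pixels

# Synthetic per-row data: pixel coordinates plus two integer labels.
ix = rng.integers(0, nx, n_rows)
iy = rng.integers(0, ny, n_rows)
cat = rng.integers(0, n_cat, n_rows)
ant = rng.integers(0, n_ant, n_rows)

# Flatten the 2D (cat, ant) label into one category index: cat * n_ant + ant.
flat = np.ravel_multi_index((cat, ant), (n_cat, n_ant))

# Stand-in for a categorical count reduction: a (y, x, cat*ant) cube.
cube = np.zeros((ny, nx, n_cat * n_ant), dtype=np.uint32)
np.add.at(cube, (iy, ix, flat), 1)

# Recover the antenna dimension; a C-order reshape matches ravel_multi_index.
cube4d = cube.reshape(ny, nx, n_cat, n_ant)
```

The reshape only works because the flattened index was built in the same (row-major) order that `reshape` assumes; flattening as `ant * n_cat + cat` would need `reshape(ny, nx, n_ant, n_cat)` instead.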

sjperkins · 2020-04-28T15:33:22Z

> Do you mean like a colour-by field AND baseline? An outer product of sorts?

So not colouring by both field and baseline. Grouping by field and baseline, but colouring by field only.

Sorry, I don't think I'm explaining very clearly. I'm thinking of this in code which might explain it better.

Disregarding colouring for the sake of argument, could you specify antenna as a category when doing a datashader reduction? Would canvas.points return it as a 3D cube which would then be rendered into a 2D image by tf.shade?

o-smirnov · 2020-04-28T15:39:51Z

Yep, you can categorize by any axis (including data axes, in which case it discretizes them into bins). Try --axis freq --yaxis DATA:amp --caxis ANTENNA1, or --axis freq --yaxis DATA:amp --caxis DATA:phase

> Would canvas.points return it as a 3D cube which would then be rendered into a 2D image by tf.shade?

Exactly.
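To illustrate what discretizing a data axis into category bins could look like, here is a minimal NumPy sketch. It is not taken from the tool's code: the bin count, the equal-width binning over [-π, π], and the sample phases are all assumptions.

```python
import numpy as np

phase = np.array([-3.0, -1.6, -0.1, 0.2, 1.5, 3.1])   # radians, made-up values
n_bins = 4                                             # hypothetical bin count

# n_bins equal-width bins over [-pi, pi]. Digitizing against the interior
# edges maps every value to an integer category in 0..n_bins-1.
edges = np.linspace(-np.pi, np.pi, n_bins + 1)
phase_cat = np.digitize(phase, edges[1:-1])
```

The resulting integer array can then be treated like any other low-cardinality category column.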

o-smirnov · 2020-04-28T15:42:02Z

What is useful is colouring by e.g. field and correlation. Both "low-valued" categories, so it's feasible.

We haven't done grouping (i.e. iterating) by baseline, yet, but it could be useful (as long as the user kept to a subset of baselines... nobody looks at 2016 plots!)

sjperkins · 2020-04-28T15:48:14Z

> Would canvas.points return it as a 3D cube which would then be rendered into a 2D image by tf.shade?

> Exactly.

> We haven't done grouping (i.e. iterating) by baseline, yet, but it could be useful (as long as the user kept to a subset of baselines... nobody looks at 2016 plots!)

OK, then technically I think it's possible to ~~render~~ reduce all ants (or baselines) in one datashader call, but it'll probably be expensive in terms of memory.

One may have to drop the number of workers and let them chew on large numpy arrays.
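A rough size estimate helps judge how expensive this is. The sketch below assumes a dense 1024×1024 canvas, four colour categories, and 32-bit counts; all three are illustrative assumptions rather than values from the tool. The 64 antennas correspond to the 2016 baselines mentioned above (64 × 63 / 2).

```python
import numpy as np

def cube_nbytes(height, width, n_cat, n_iter, dtype=np.uint32):
    """Bytes needed for a dense (height, width, n_cat * n_iter) cube."""
    return height * width * n_cat * n_iter * np.dtype(dtype).itemsize

GiB = 2**30
per_antenna = cube_nbytes(1024, 1024, 4, 64)      # one slice per antenna
per_baseline = cube_nbytes(1024, 1024, 4, 2016)   # one slice per baseline

print(f"per antenna:  {per_antenna / GiB:.1f} GiB")
print(f"per baseline: {per_baseline / GiB:.1f} GiB")
```

Under these assumptions the per-antenna cube is 1 GiB and the per-baseline cube is 31.5 GiB. If each worker holds its own partial cube, the total scales with the number of workers, which is why reducing the worker count may be necessary.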

IanHeywood · 2020-04-28T16:00:27Z

> We haven't done grouping (i.e. iterating) by baseline, yet, but it could be useful (as long as the user kept to a subset of baselines... nobody looks at 2016 plots!)

I always intended to have an iterate-by-baseline option, with the idea that the user would use it in conjunction with an antenna selection. I use surfvis.py for this quite regularly (plotting time/frequency plots for all baselines to a specific antenna).

sjperkins · 2020-04-28T16:08:37Z

> Disregarding colouring for the sake of argument, could you specify antenna as a category when doing a datashader reduction? Would canvas.points return it as a 3D cube which would then be rendered into a 2D image by tf.shade?

> Exactly.

Apologies again because I feel I'm being fuzzy. For my own clarity around my suggestion, it should be possible to use categories to group antennas and baselines and reduce all of them in a single, memory-hungry canvas.points call, rather than setting up separate canvas.points calls for each antenna or baseline.

Once canvas.points has finished, the cube could then be sliced and it may be possible to send the individual slices to tf.shade for rendering.
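The slicing step can be sketched as follows. The cube here is random data and `render` is a hypothetical placeholder for the shading step (it is not tf.shade); the `(y, x, cat, ant)` layout is also an assumption.

```python
import numpy as np

rng = np.random.default_rng(1)
ny, nx, n_cat, n_ant = 16, 24, 3, 4                 # hypothetical sizes
cube4d = rng.poisson(2.0, size=(ny, nx, n_cat, n_ant))

def render(cube3d):
    """Placeholder for the shading step: total counts scaled to [0, 1]."""
    total = cube3d.sum(axis=-1).astype(float)
    peak = total.max()
    return total / peak if peak > 0 else total

# One (y, x, cat) slice per antenna, each rendered independently.
images = {ant: render(cube4d[..., ant]) for ant in range(n_ant)}
```

Each slice is a view into the large cube, so carving it up does not copy the reduction result; only the rendered images allocate new memory.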

It should additionally be possible with some flattening/reshaping trickery to combine grouping by antenna/baseline with actual categories.

Here endeth the lesson.

o-smirnov · 2020-04-28T16:28:30Z

Ah, now I understand what you're getting at. Now that is very clever!

So instead of creating a separate dataframe per iterable, as we do now, we concatenate everything into one big dataframe, with a category column that's an outer product of iterable times what we actually categorize by... and then carve up the canvas cube into slices to be rendered.

Indeed, I don't see why that shouldn't work... and the only overhead is keeping a larger canvas in memory -- which is not that much -- and happens in parallel mode anyway.

Grouping is then actually unnecessary, and even counterproductive. You want the rows of the dataframe to follow the natural MS order, for best performance. You can then indeed render all plots with a single pass through the MS.

I can see one limitation to this approach in terms of functionality. Because we reduce to a cube, the X and Y plot limits become fixed across all iterables (whereas now they can be auto-sized on a per-iterable basis). For iterating by scan, this is actually good. For something like --iter-field, this may be undesirable. So I think we'd still need to keep the old slow way as an option.
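The difference between shared and per-iterable limits can be seen in a small example. The values and the two-field layout are made up for illustration.

```python
import numpy as np

# Hypothetical x values for two fields that occupy different ranges.
x = np.array([0.0, 1.0, 2.0, 10.0, 11.0, 12.0])
field = np.array([0, 0, 0, 1, 1, 1])

# Single-cube reduction: one extent shared by every iterable.
shared = (x.min(), x.max())

# Per-iterable reduction: each field gets its own auto-sized extent.
per_field = {int(f): (x[field == f].min(), x[field == f].max())
             for f in np.unique(field)}
```

With the shared extent of 0 to 12, each field's data spans only 2/12 of the plot width, whereas per-field extents let each plot fill its own canvas.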

sjperkins · 2020-04-29T08:55:34Z

> So instead of creating a separate dataframe per iterable, as we do now, we concatenate everything into one big dataframe, with a category column that's an outer product of iterable times what we actually categorize by... and then carve up the canvas cube into slices to be rendered.

Exactly, although the iterables I was considering here were antenna and baseline.

> Grouping is then actually unnecessary, and even counterproductive. You want the rows of the dataframe to follow the natural MS order, for best performance. You can then indeed render all plots with a single pass through the MS.

I think it depends on whether the grouping follows the natural MS ordering which, in the MeerKAT case, is monotonically increasing TIME. Grouping by columns such as FIELD_ID and SCAN_NUMBER, which tend to preserve this ordering and, by implication, row contiguity, is fine.

ANTENNA1 and ANTENNA2 are striped throughout a TIME-ordered MS, so grouping by them will produce non-contiguous access. The single-call approach suggested here should do better in terms of disk access, at the cost of extra memory.
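The contiguity argument can be checked on a synthetic row layout. This sketch assumes every timestamp lists all cross-correlation baselines in the same order and that scans cover consecutive timestamps; the sizes are hypothetical and real MS layouts may differ.

```python
import numpy as np

n_ant, n_time, times_per_scan = 4, 6, 3             # hypothetical sizes
ant1, _ = np.triu_indices(n_ant, k=1)               # cross-correlations only
n_bl = ant1.size

# TIME-ordered rows: every timestamp lists all baselines in turn.
ANTENNA1 = np.tile(ant1, n_time)
SCAN_NUMBER = np.repeat(np.arange(n_time) // times_per_scan, n_bl)

def contiguous_runs(labels, value):
    """Number of separate contiguous row ranges in which labels == value."""
    mask = (labels == value).astype(np.int8)
    return int(np.count_nonzero(np.diff(mask, prepend=0) == 1))

scan_runs = contiguous_runs(SCAN_NUMBER, 0)   # one block of rows
ant_runs = contiguous_runs(ANTENNA1, 0)       # one block per timestamp
```

Here the rows of scan 0 form a single contiguous range, while the rows with ANTENNA1 equal to 0 are split into one range per timestamp, so the number of separate reads grows with the number of timestamps.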

> I can see one limitation to this approach in terms of functionality. Because we reduce to a cube, the X and Y plot limits become fixed across all iterables (whereas now they can be auto-sized on a per-iterable basis). For iterating by scan, this is actually good. For something like --iter-field, this may be undesirable. So I think we'd still need to keep the old slow way as an option.

I see. In terms of an extended stretch goal, note that it's perfectly possible to structure the dask reduction to produce a dict of numpy arrays, or just about anything. Whether datashader will accept such unorthodoxy in its API is another matter...
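As a rough picture of a reduction that yields a dict of arrays, here is a sketch that uses `functools.reduce` as a stand-in for the dask reduction. The per-field extents, the bin counts, and the function names `reduce_chunk` and `combine` are hypothetical, and nothing here implies that datashader accepts such a structure.

```python
from functools import reduce
import numpy as np

# Hypothetical per-field plot extents ([[xmin, xmax], [ymin, ymax]]) and bins.
extents = {0: [[0.0, 3.0], [0.0, 1.0]], 1: [[10.0, 13.0], [0.0, 1.0]]}
bins = (4, 4)

def reduce_chunk(x, y, field):
    """Bin one chunk of rows into a separate 2D histogram per field."""
    return {f: np.histogram2d(x[field == f], y[field == f],
                              bins=bins, range=ext)[0]
            for f, ext in extents.items()}

def combine(a, b):
    """Merge two partial results by summing the histograms key by key."""
    return {f: a[f] + b[f] for f in a}

rng = np.random.default_rng(2)
chunks = []
for _ in range(3):
    field = rng.integers(0, 2, 100)
    x = rng.uniform(0.0, 3.0, 100) + 10.0 * field   # field 1 is offset in x
    y = rng.uniform(0.0, 1.0, 100)
    chunks.append((x, y, field))

result = reduce(combine, (reduce_chunk(*c) for c in chunks))
```

Because each key carries its own extent, the per-iterable plot limits survive the reduction. The catch is that the extents must be known before binning, which would take either a prior pass over the data or user-supplied limits.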

sjperkins added the "question" label on Apr 28, 2020
