Reduction with a "2D" category #47

Open
sjperkins opened this issue Apr 28, 2020 · 10 comments
Labels
question Further information is requested

Comments

@sjperkins
Member

sjperkins commented Apr 28, 2020

In the context of making images for each antenna and baseline, would it not be possible to treat these dimensions as an extra category dimension, along with the other categories you like to group the data by?

The intermediate and final image cubes will be large to huge, but it may be possible to render them in one go by:

  1. Warning the user about the danger of what they're attempting
  2. Having sufficient memory on the box for this tricky case.
  3. Flattening the categories and reshaping after the reduction to recover the antenna/baseline dimension
@sjperkins sjperkins added the question Further information is requested label Apr 28, 2020
@o-smirnov
Collaborator

Do you mean like a colour-by field AND baseline? An outer product of sorts?

@sjperkins
Member Author

I'm thinking of 2D (cat, ant) categories, which can be flattened into a (cat*ant,) array when provided to datashader. Hopefully, datashader will then produce a (x, y, cat*ant) array which can be reshaped into (x, y, cat, ant) before the final rendering step.
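
The flatten-and-reshape idea can be sketched with NumPy alone. The sizes (`ncat`, `nant`, `nx`, `ny`) and the random `cube` below are illustrative assumptions; the array only stands in for whatever datashader would return.

```python
import numpy as np

ncat, nant = 3, 4          # hypothetical sizes: 3 categories, 4 antennas
nx, ny = 5, 6              # hypothetical canvas size

# Per-row category and antenna indices (made-up data)
rng = np.random.default_rng(0)
cat = rng.integers(0, ncat, size=100)
ant = rng.integers(0, nant, size=100)

# Flatten the 2D (cat, ant) index into a single category index
flat = np.ravel_multi_index((cat, ant), (ncat, nant))   # same as cat * nant + ant

# Stand-in for a reduction output of shape (x, y, cat*ant)
cube = rng.random((nx, ny, ncat * nant))

# Recover the separate category and antenna axes
cube4d = cube.reshape(nx, ny, ncat, nant)

# Slice [c, a] of the 4D cube is flat slice c * nant + a of the 3D cube
c, a = 2, 1
assert np.array_equal(cube4d[:, :, c, a], cube[:, :, c * nant + a])
```

The reshape is a view in C order, so recovering the extra axis costs no additional memory.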

I was thinking back to the reduction code I wrote last year and realising that I could do it as a 3D cube reduction. So, this is predicated on the datashader reduction internals producing a 3D cube (x, y, cat). I may be incorrect as I don't think I've looked at a categorical reduction internally yet.

@sjperkins
Member Author

sjperkins commented Apr 28, 2020

> Do you mean like a colour-by field AND baseline? An outer product of sorts?

So not colouring by both field and baseline. Grouping by field and baseline, but colouring by field only.

Sorry, I don't think I'm explaining very clearly. I'm thinking of this in code which might explain it better.

Disregarding colouring for the sake of argument, could you specify antenna as a category when doing a datashader reduction? Would canvas.points return it as a 3D cube which would then be rendered into a 2D image by tf.shade?

@o-smirnov
Collaborator

Yep, you can categorize by any axis (including data axes, in which case it discretizes them into bins). Try --axis freq --yaxis DATA:amp --caxis ANTENNA1, or --axis freq --yaxis DATA:amp --caxis DATA:phase

> Would canvas.points return it as a 3D cube which would then be rendered into a 2D image by tf.shade?

Exactly.

@o-smirnov
Collaborator

What is useful is colouring by e.g. field and correlation. Both "low-valued" categories, so it's feasible.

We haven't done grouping (i.e. iterating) by baseline, yet, but it could be useful (as long as the user kept to a subset of baselines... nobody looks at 2016 plots!)

@sjperkins
Member Author

sjperkins commented Apr 28, 2020

> Would canvas.points return it as a 3D cube which would then be rendered into a 2D image by tf.shade?

> Exactly.

> We haven't done grouping (i.e. iterating) by baseline, yet, but it could be useful (as long as the user kept to a subset of baselines... nobody looks at 2016 plots!)

OK, then technically I think it's possible to reduce all ants (or baselines) in one datashader call, but it'll probably be expensive in terms of memory.

One may have to drop the number of workers and let them chew on large numpy arrays.
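
A back-of-the-envelope estimate of that memory cost. The 1024 x 1024 canvas and the 32-bit count dtype are assumptions; 2016 is the baseline count quoted earlier in the thread. It also assumes that each concurrently running worker holds its own partial cube, which depends on how the reduction is scheduled.

```python
def cube_bytes(width, height, ncategories, itemsize=4):
    """Bytes needed for a dense width x height x ncategories cube."""
    return width * height * ncategories * itemsize

nbaselines = 2016                                  # one category per baseline
per_worker = cube_bytes(1024, 1024, nbaselines)    # assumed 32-bit counts
gib = per_worker / 2**30
print(f"{gib:.1f} GiB per worker")                 # prints "7.9 GiB per worker"
```

Under these assumptions, eight workers would need roughly 63 GiB for the partial cubes alone, which is why reducing the worker count may be necessary.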

@IanHeywood
Collaborator

> We haven't done grouping (i.e. iterating) by baseline, yet, but it could be useful (as long as the user kept to a subset of baselines... nobody looks at 2016 plots!)

I always intended to have an iterate-by-baseline option, with the idea that the user would use it in conjunction with an antenna selection. I use surfvis.py for this quite regularly (plotting time/frequency plots for all baselines to a specific antenna).

@sjperkins
Member Author

sjperkins commented Apr 28, 2020

> Disregarding colouring for the sake of argument, could you specify antenna as a category when doing a datashader reduction? Would canvas.points return it as a 3D cube which would then be rendered into a 2D image by tf.shade?

> Exactly.

Apologies again because I feel I'm being fuzzy. To clarify my suggestion: it should be possible to use categories to group antennas and baselines and reduce all of them in a single, memory-hungry canvas.points call, rather than setting up separate canvas.points calls for each antenna or baseline.

Once canvas.points has finished, the cube could then be sliced and it may be possible to send the individual slices to tf.shade for rendering.

It should additionally be possible with some flattening/reshaping trickery to combine grouping by antenna/baseline with actual categories.

Here endeth the lesson.
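
The single-pass reduction followed by slicing can be illustrated with a NumPy stand-in. This is not datashader's implementation: `np.add.at` merely plays the role of the count reduction, and all sizes and data are made up.

```python
import numpy as np

rng = np.random.default_rng(1)
nrow, nant, ncat = 500, 3, 2      # hypothetical sizes
nx, ny = 8, 8                     # hypothetical canvas size

# Made-up rows: pixel coordinates, antenna index, category index
ix = rng.integers(0, nx, size=nrow)
iy = rng.integers(0, ny, size=nrow)
ant = rng.integers(0, nant, size=nrow)
cat = rng.integers(0, ncat, size=nrow)

# One pass over all rows into a cube with a flattened (ant, cat) axis
flat = ant * ncat + cat
cube = np.zeros((ny, nx, nant * ncat), dtype=np.uint32)
np.add.at(cube, (iy, ix, flat), 1)

# Split the flattened axis, then carve out one (y, x, cat) cube per antenna
cube = cube.reshape(ny, nx, nant, ncat)
per_antenna = [cube[:, :, a, :] for a in range(nant)]

# Each slice matches a separate reduction over only that antenna's rows
for a, sliced in enumerate(per_antenna):
    sel = ant == a
    expected = np.zeros((ny, nx, ncat), dtype=np.uint32)
    np.add.at(expected, (iy[sel], ix[sel], cat[sel]), 1)
    assert np.array_equal(sliced, expected)
```

With a real datashader aggregate, the slicing would presumably be done on the xarray DataArray itself so that each slice can still be handed to tf.shade.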

@o-smirnov
Collaborator

Ah, now I understand what you're getting at. Now that is very clever!

So instead of creating a separate dataframe per iterable, as we do now, we concatenate everything into one big dataframe, with a category column that's an outer product of iterable times what we actually categorize by... and then carve up the canvas cube into slices to be rendered.

Indeed, I don't see why that shouldn't work... and the only overhead is keeping a larger canvas in memory -- which is not that much -- and happens in parallel mode anyway.

Grouping is then actually unnecessary, and even counterproductive. You want the rows of the dataframe to follow the natural MS order, for best performance. You can then indeed render all plots with a single pass through the MS.
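
A small pandas sketch of the outer-product category column. The column names (`FIELD_ID` as the iterable, `CORR` as the colour category, `iter_cat` as the combined column) and the data are hypothetical.

```python
import pandas as pd

# Made-up rows in their "natural" order
df = pd.DataFrame({
    "FIELD_ID": [0, 0, 1, 1, 0, 2],
    "CORR":     [0, 1, 0, 1, 1, 0],
})
nfield, ncorr = 3, 2

# Outer-product category: one code per (iterable, category) pair
codes = df["FIELD_ID"] * ncorr + df["CORR"]

# Declare every possible pair as a category, even those absent from the data,
# so the reduced cube always has nfield * ncorr slots and can be reshaped
df["iter_cat"] = pd.Categorical(codes, categories=list(range(nfield * ncorr)))

# Rows are never sorted or grouped, so the original order is preserved
print(df["iter_cat"].cat.codes.tolist())   # prints [0, 1, 2, 3, 1, 4]
```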

I can see one limitation to this approach in terms of functionality. Because we reduce to a cube, the X and Y plot limits become fixed across all iterables (whereas now they can be auto-sized on a per-iterable basis). For iterating by scan, this is actually good. For something like --iter-field, this may be undesirable. So I think we'd still need to keep the old slow way as an option.

@sjperkins
Member Author

> So instead of creating a separate dataframe per iterable, as we do now, we concatenate everything into one big dataframe, with a category column that's an outer product of iterable times what we actually categorize by... and then carve up the canvas cube into slices to be rendered.

Exactly, although the iterables I was considering here were antenna and baseline.

> Grouping is then actually unnecessary, and even counterproductive. You want the rows of the dataframe to follow the natural MS order, for best performance. You can then indeed render all plots with a single pass through the MS.

I think it depends on whether the grouping follows the natural MS ordering which, in the MeerKAT case, is monotonically increasing TIME. Grouping by columns such as FIELD_ID and SCAN_NUMBER, which tend to preserve this ordering and, by implication, row contiguity, is fine.

ANTENNA1 and ANTENNA2 are striped throughout a TIME-ordered MS, so grouping by them will produce non-contiguous access. This approach should do better in terms of disk access, at the cost of extra memory.
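
The contiguity argument can be checked on a toy row layout. The sizes are hypothetical and the arrays only mimic a TIME-ordered MS in which every baseline repeats at each timestamp.

```python
import numpy as np

ntime, nant = 4, 3                         # hypothetical sizes
a1, a2 = np.triu_indices(nant, k=1)        # baselines (0,1), (0,2), (1,2)
nbl = a1.size

# Time-ordered rows: every baseline repeats at each timestamp
antenna1 = np.tile(a1, ntime)
scan = np.repeat(np.arange(ntime) // 2, nbl)   # two timestamps per scan

def is_contiguous(rows):
    """True if the row indices form a single unbroken run."""
    return bool(np.all(np.diff(rows) == 1))

scan_rows = np.flatnonzero(scan == 0)
ant_rows = np.flatnonzero(antenna1 == 1)

print(is_contiguous(scan_rows))   # True: one scan is a single block of rows
print(is_contiguous(ant_rows))    # False: one antenna is spread over all timestamps
```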

> I can see one limitation to this approach in terms of functionality. Because we reduce to a cube, the X and Y plot limits become fixed across all iterables (whereas now they can be auto-sized on a per-iterable basis). For iterating by scan, this is actually good. For something like --iter-field, this may be undesirable. So I think we'd still need to keep the old slow way as an option.

I see. As an extended stretch goal, note that it's perfectly possible to structure the dask reduction to produce a dict of numpy arrays, or just about anything. Whether datashader will accept such unorthodoxy in its API is another matter...
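
A rough sketch of what a dict-valued reduction could look like, with per-iterable plot limits. It uses `functools.reduce` as a stand-in for dask's tree reduction and `np.histogram2d` as a stand-in for the count aggregation; the `limits` dict, the `chunk`/`combine` helpers and the data are all hypothetical, and it says nothing about what datashader's API would accept.

```python
from functools import reduce
import numpy as np

# Hypothetical per-field plot limits, assumed known before the reduction
limits = {0: ((0.0, 1.0), (0.0, 1.0)),
          1: ((0.0, 10.0), (-5.0, 5.0))}
bins = (4, 4)

def chunk(x, y, field):
    """Reduce one chunk of rows to a dict of per-field 2D histograms."""
    out = {}
    for f, (xr, yr) in limits.items():
        sel = field == f
        out[f], _, _ = np.histogram2d(x[sel], y[sel], bins=bins, range=(xr, yr))
    return out

def combine(a, b):
    """Merge two partial results by summing matching histograms."""
    return {f: a[f] + b[f] for f in a}

rng = np.random.default_rng(2)
chunks = []
for _ in range(3):
    field = rng.integers(0, 2, size=50)
    # Scale the made-up data so each field falls inside its own limits
    x = rng.random(50) * np.where(field == 0, 1.0, 10.0)
    y = rng.random(50) * np.where(field == 0, 1.0, 10.0) - np.where(field == 0, 0.0, 5.0)
    chunks.append(chunk(x, y, field))

# functools.reduce stands in here for dask's tree reduction
result = reduce(combine, chunks)
```

Because each dict entry carries its own bin edges, the plot limits are no longer forced to be shared across iterables, at the cost of needing those limits up front.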


3 participants