Reduction with a "2D" category #47
Do you mean like a colour-by field AND baseline? An outer product of sorts?
I'm thinking of 2D. I was thinking back to the reduction code I wrote last year and realising that I could do it as a 3D cube reduction. So this is predicated on the datashader reduction internals producing a 3D cube.
So not colouring by both field and baseline. Grouping by field and baseline, but colouring by field only. Sorry, I don't think I'm explaining very clearly. I'm thinking of this in code, which might explain it better. Disregarding colouring for the sake of argument, could you specify antenna as a category when doing a datashader reduction? Would canvas.points return it as a 3D cube which would then be rendered into a 2D image by tf.shade?
Yep, you can categorize by any axis (including data axes, in which case it discretizes them into bins). Try
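A minimal numpy emulation of what such a categorical reduction produces may make the "3D cube" idea concrete. This is not datashader's actual internals (which would be something like `canvas.points(df, 'x', 'y', agg=ds.count_cat('antenna'))` on a categorical column); it is a hand-rolled sketch with hypothetical column names, showing one 2D count histogram per category value, stacked along a leading category axis:

```python
import numpy as np

def categorical_count_cube(x, y, cat, n_cat, xbins=4, ybins=4):
    """Emulate a per-category 2D count reduction: one (ybins, xbins)
    histogram per category value, stacked into a (n_cat, ybins, xbins) cube."""
    xrange = (x.min(), x.max())
    yrange = (y.min(), y.max())
    cube = np.zeros((n_cat, ybins, xbins))
    for c in range(n_cat):
        m = cat == c
        # histogram2d(y, x) so rows of each slice correspond to y bins
        cube[c], _, _ = np.histogram2d(y[m], x[m], bins=(ybins, xbins),
                                       range=(yrange, xrange))
    return cube

rng = np.random.default_rng(0)
x = rng.uniform(size=1000)
y = rng.uniform(size=1000)
antenna = rng.integers(0, 3, size=1000)  # hypothetical 3-antenna column
cube = categorical_count_cube(x, y, antenna, n_cat=3)
print(cube.shape)       # (3, 4, 4): one 2D image per antenna
print(int(cube.sum()))  # 1000: every row lands in exactly one bin
```

Each `cube[c]` slice is then a candidate input for a renderer such as tf.shade.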
Exactly.
What is useful is colouring by e.g. field and correlation. Both "low-valued" categories, so it's feasible. We haven't done grouping (i.e. iterating) by baseline yet, but it could be useful (as long as the user kept to a subset of baselines... nobody looks at 2016 plots!)
OK, then technically I think it's possible. One may have to drop the number of workers and let them chew on large numpy arrays.
I always intended to have an iterate-by-baseline option, with the idea that the user would use it in conjunction with an antenna selection. I use surfvis.py for this quite regularly (plotting time/frequency plots for all baselines to a specific antenna). |
Apologies again because I feel I'm being fuzzy. For my own clarity around my suggestion: it should be possible to use categories to group antennas and baselines and reduce all of them in a single, memory-hungry canvas.points call, rather than setting up separate canvas.points calls for each antenna or baseline. Once canvas.points has finished, the cube could then be sliced and it may be possible to send the individual slices to tf.shade for rendering. It should additionally be possible, with some flattening/reshaping trickery, to combine grouping by antenna/baseline with actual categories. Here endeth the lesson.
Ah, now I understand what you're getting at. Now that is very clever!

So instead of creating a separate dataframe per iterable, as we do now, we concatenate everything into one big dataframe, with a category column that's an outer product of the iterable times what we actually categorize by... and then carve up the canvas cube into slices to be rendered. Indeed, I don't see why that shouldn't work... and the only overhead is keeping a larger canvas in memory -- which is not that much -- and happens in parallel mode anyway.

Grouping is then actually unnecessary, and even counterproductive. You want the rows of the dataframe to follow the natural MS order, for best performance. You can then indeed render all plots with a single pass through the MS.

I can see one limitation to this approach in terms of functionality. Because we reduce to a cube, the X and Y plot limits become fixed across all iterables (whereas now they can be auto-sized on a per-iterable basis). For iterating by scan, this is actually good. For something like
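The outer-product trick can be sketched with plain numpy and pandas (all column names and bin counts here are hypothetical, and the reduction is a stand-in for canvas.points): encode each (iterable, category) pair as a single combined category index, reduce the whole dataframe in one pass, then reshape the resulting cube so that each iterable's slices can be rendered separately:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 2000
df = pd.DataFrame({
    "x": rng.uniform(size=n),
    "y": rng.uniform(size=n),
    "scan": rng.integers(0, 4, size=n),    # the iterable (4 scans)
    "field": rng.integers(0, 2, size=n),   # what we actually colour by
})

n_scan, n_field = 4, 2
# Outer product: one combined category per (scan, field) pair.
df["combined"] = df["scan"] * n_field + df["field"]

# One pass over the data: per-combined-category 2D count reduction.
xbins = ybins = 8
cube = np.zeros((n_scan * n_field, ybins, xbins))
xi = np.clip((df["x"].to_numpy() * xbins).astype(int), 0, xbins - 1)
yi = np.clip((df["y"].to_numpy() * ybins).astype(int), 0, ybins - 1)
np.add.at(cube, (df["combined"].to_numpy(), yi, xi), 1)

# Carve the cube up: axis 0 becomes the iterable, and each slice is a
# (field, y, x) sub-cube ready to be handed to a renderer like tf.shade.
per_scan = cube.reshape(n_scan, n_field, ybins, xbins)
print(per_scan.shape)  # (4, 2, 8, 8)
```

Note that the reduction itself never needs to group the rows: they can stay in natural MS order, and only the bin indices route each row to the right slice.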
Exactly, although the iterables I was considering here were antenna and baseline.
I think it depends on whether the grouping follows the natural MS ordering which, in the MeerKAT case, is monotonically increasing TIME. Grouping by columns such as FIELD_ID and SCAN_NUMBER, which tend to preserve this ordering (and by implication row contiguity), is fine. ANTENNA1 and ANTENNA2 are striped throughout a TIME-ordered MS, so grouping by them will produce non-contiguous access. This approach should do better in terms of disk access, at the cost of extra memory.
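The contiguity point can be illustrated with a toy TIME-ordered table (hypothetical shape: one row per antenna per timestamp): rows belonging to a given SCAN_NUMBER form one unbroken run, while rows for a given ANTENNA1 are striped across the whole table:

```python
import numpy as np
import pandas as pd

n_time, n_ant = 5, 3
# A toy TIME-ordered MS: for each timestamp, one row per antenna.
df = pd.DataFrame({
    "TIME": np.repeat(np.arange(n_time), n_ant),
    "ANTENNA1": np.tile(np.arange(n_ant), n_time),
})
df["SCAN_NUMBER"] = df["TIME"] // 3  # scans follow the TIME ordering

def contiguous(rows):
    """True if a group's row indices form one unbroken run."""
    rows = np.sort(rows)
    return bool(np.all(np.diff(rows) == 1)) if len(rows) > 1 else True

scan_ok = all(contiguous(g.index.to_numpy())
              for _, g in df.groupby("SCAN_NUMBER"))
ant_ok = all(contiguous(g.index.to_numpy())
             for _, g in df.groupby("ANTENNA1"))
print(scan_ok)  # True: scan groups are contiguous row ranges
print(ant_ok)   # False: antenna groups are striped through the table
```

Non-contiguous groups translate into scattered disk reads when iterating per antenna, which is exactly what the single-pass cube approach avoids.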
I see. In terms of an extended stretch goal, note that it's perfectly possible to structure the dask reduction to produce a dict of numpy arrays, or just about anything. Whether datashader will accept such unorthodoxy in its API is another matter...
In the context of making images for each antenna and baseline, would it not be possible to treat these dimensions as an extra category dimension, along with the other categories you would like to group the data by?
The intermediate and final image cubes will be large to huge, but it may be possible to render them in one go by:
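One possible sketch of that rendering step, assuming the reduction has already produced a (category, y, x) count cube. The normalisation here is only a stand-in for what tf.shade actually does (tf.shade applies colormaps and histogram equalisation); the point is just that each slice becomes an independent image:

```python
import numpy as np

def shade_slices(cube):
    """Turn each (y, x) slice of a (cat, y, x) count cube into an
    8-bit greyscale image, using a shared scale across slices so the
    intensity mapping is comparable between categories."""
    peak = cube.max()
    if peak == 0:
        return np.zeros(cube.shape, dtype=np.uint8)
    return (cube / peak * 255).astype(np.uint8)

cube = np.zeros((2, 4, 4))
cube[0, 1, 1] = 10.0  # bright pixel in the first category's image
cube[1, 2, 3] = 5.0   # dimmer pixel in the second
images = shade_slices(cube)
print(images.shape)     # (2, 4, 4)
print(images[0, 1, 1])  # 255
print(images[1, 2, 3])  # 127
```

Sharing the peak across slices mirrors the fixed-limits trade-off mentioned earlier: all images are on a common scale, at the cost of per-slice auto-scaling.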