Chunked w-gridder with dask-ms and ducc #204
Conversation
…ace for the out of memory gridder and the normal gridder
@landmanbester I'll wait for you to implement the degridding functionality before taking a closer look. Regarding the nested parallelism, I can see this is going to be a topic that needs to be addressed in a paradigm where we're running code using OpenMP/TBB parallelism inside dask threads. Playing devil's advocate: too many dask + OpenMP threads lead to oversubscription, while if we only use a single dask thread, then what's the point of using dask? The answer may lie in the middle. In past conversations with @JSKenyon, he mentioned that a hybrid strategy of some dask threads and some OpenMP/TBB threads works well. Perhaps the same might apply to the NIFTY gridder? Some support for this kind of parallelism was implemented for the current NIFTY dask wrappers (below the new code you've added), where the user can ask for N gridding streams. This is not quite the same thing, though. If Task Annotations make it into dask, we'll have finer control over the scheduling of tasks: for example, creating a dedicated gridding thread or dedicated solver thread per dask worker on which only these operations may run.
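For concreteness, a rough sketch of such a hybrid split, using the same `dask.config.set(pool=...)` mechanism that appears later in this thread. The core count and the 2-way split below are purely illustrative, not a recommendation:

```python
# Illustrative sketch of a hybrid dask / gridder thread split; not part of the PR.
from multiprocessing.pool import ThreadPool
import dask

ncpu = 8                 # assumed core count
dask_streams = 2         # a few dask threads for I/O and graph execution
gridder_threads = ncpu // dask_streams

# Limit the threaded scheduler to a small pool of dask threads...
dask.config.set(pool=ThreadPool(dask_streams))

# ...and hand the remaining cores to each gridding call via its nthreads
# argument, so that the total thread count stays close to ncpu.
```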
It most likely makes no difference, but I'd still like to point this out: the threads used within DUCC are plain POSIX threads, not OpenMP or TBB. Just so you are not surprised in case you are analysing the code with OpenMP-specific tools and don't find any parallelism ;-)
Ah, I see you've implemented your own ThreadPools and Schedulers in ducc0.
I was using OpenMP originally and was quite happy with it. However, later on I saw Andre Offringa's approach to multi-threading in …
Well, this is why I want to discuss the implementation here. Currently, we are mainly chunking up the data because it doesn't fit into memory. Doesn't a parallel reduction always come with a linear increase in memory? That is exactly what I want to avoid. My reasons for using dask are twofold: Anyway, the wrapper around the degridder is doing something, but it's not doing it correctly. I see threads spinning up, and data does end up in the measurement set, but it also grows by about a factor of 6 after running the predict. I suspect the logic here is at fault. @sjperkins, I would appreciate it if you could have a look when you get a chance.
It seems to be linear in the number of threads, at most. In fact, if a parallel reduction were used to sum the dirties produced by each chunk, I'd expect to see a memory profile similar to that of shadems, which also uses a parallel reduction: holoviz/datashader#926. In particular, see these comments: ratt-ru/shadeMS#34 (comment) and ratt-ru/shadeMS#34 (comment). shadems plots are 1024**2, whereas it sounds like MeerKAT-sized grids are 8192**2 (but what about the w-planes ducc uses internally?). The shadems images were part of a 3D cube with an image for each category, so I'm not even sure the data sizes are that different from each other. In any case, the visibility data dominated in my admittedly pen-and-paper exercises. I also want to emphasise very strongly that shadems and parallel gridding are doing exactly the same thing in terms of the distributed algorithm, except that shadems plots visibility data, while here it is gridded. Additionally, shadems must support more reduction cases because it supports plotting every axis against most other axes (correct me here @o-smirnov).
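For reference, the parallel-reduction pattern being compared here can be sketched with dask as below. `grid_chunk` is a hypothetical stand-in for a call into the ducc0 wgridder, and all shapes and sizes are illustrative:

```python
import numpy as np
import dask.array as da
from dask import delayed

@delayed
def grid_chunk(uvw, vis, freq):
    # Stand-in for a ducc0 wgridder call (nthreads > 1): returns a per-chunk dirty image.
    return np.zeros((1024, 1024), dtype=np.float32)

# Illustrative per-chunk inputs; in practice these would come from dask-ms.
chunks = [(np.zeros((100, 3)),
           np.zeros((100, 8), np.complex64),
           np.linspace(0.856e9, 1.712e9, 8))
          for _ in range(16)]

dirties = [da.from_delayed(grid_chunk(u, v, f), (1024, 1024), np.float32)
           for u, v, f in chunks]

# dask executes this sum as a tree reduction; each concurrently executing
# chunk holds a full-size dirty image, so memory grows with the number of
# threads rather than with the number of chunks.
dirty = da.stack(dirties).sum(axis=0).compute()
```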
Cool, the intention has been for dask-ms to be an easy interface for the algorithm writer. I think that dask Arrays are also easy for the algorithm writer to work with.
I agree. This is currently doable with the distributed Client interface, but it requires the user to manually place tasks on workers, rather than depending on the distributed scheduler to "do the right thing". As discussed in dask/dask#6217 (comment) I believe the more general approach is to annotate individual dask tasks with information that scheduler plugins (written by us) use to decide where to place tasks.
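For illustration, the annotation mechanism discussed in that issue (which later landed in dask as `dask.annotate`) could be combined with worker resources roughly as follows; the `GRIDDER` resource name is made up for this sketch:

```python
# Hypothetical sketch: restrict gridding tasks to workers that advertise a
# "GRIDDER" resource, e.g. workers started with:
#   dask-worker <scheduler-address> --resources "GRIDDER=1"
import dask
import dask.array as da

x = da.random.random((10_000, 10_000), chunks=(1_000, 1_000))

with dask.annotate(resources={"GRIDDER": 1}):
    y = x.sum()   # tasks created in this block carry the annotation

# On a distributed cluster, y.compute() would then only place these tasks
# on workers that provide the GRIDDER resource.
```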
Sure, will take a look.
As far as I know, the gridder's memory requirements do not scale with the number of w-stacks (please correct me if I am wrong, @mreineck). @sjperkins, if you have suggestions on how to improve the wrapper I am all ears. My basic design principles are: i) the disk is unavoidable when it comes to the predict, since the data don't fit into memory, but we can assume that the image does.
That's correct.
@landmanbester Just to recap our zoom conversation:
I am not sure what is going wrong here. Some tests fail when I run them through pytest but not when I run them manually. Will keep digging.
Do you use the current ducc0 master branch, or a released version?
Unfortunately GitHub refuses to show me the test logs ...
@landmanbester The test cases are failing on the base install (without optional packages) test suite. You need to guard your ducc0 imports so that those tests are skipped when the optional packages aren't installed; see the base test suite configuration: https://github.com/ska-sa/codex-africanus/blob/master/.github/workflows/ci.yml#L65-L72
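One generic way to do this (a plain pytest sketch, not necessarily the convention codex-africanus itself uses) is to skip the module when the optional dependency is missing:

```python
import pytest

# Skip, rather than fail, when ducc0 is not installed (the base-install case).
wgridder = pytest.importorskip("ducc0.wgridder")

def test_wgridder_import():
    # Only runs when the optional ducc0 dependency is available.
    assert wgridder is not None
```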
@mreineck Out of interest, can you see https://github.com/ska-sa/codex-africanus/pull/204/checks?check_run_id=971810643? And the following? https://github.com/ska-sa/codex-africanus/runs/971809780 I'm never sure what GitHub shows to maintainers compared to those with public access...
Yes, sorry. I wasn't expecting the automated PR tests to pass. I mainly pushed it so I can discuss it with @sjperkins during the meeting tomorrow. Is there a way to push changes without triggering the tests, or would I have to create a separate branch? @mreineck I am using the master branch for now. I don't think it's anything in the gridder that's causing the tests to fail. They seem to fail on certain parametrisations under pytest, but when I run the tests manually with the same parameters they pass, so I'm probably using @pmp wrong or something.
Hi @sjperkins, I can open the pages you have linked, but as soon as I try to open the sub-sections (e.g. by clicking on "run base test suite") I get "We are currently unable to download the log. Please try again later.". So it looks more like a technical problem.
Ahh, my fault ... my uMatrix was too paranoid. Sorry for the noise!
Ah, don't worry if they're failing: it's only important to get the green ticks before we press the merge button, and the runs are short enough. You can put "[skip ci]" in your commits, but that will only skip the "push" event. There's another "pull_request" event, but for various reasons it just runs on every commit.
Hmmm, not sure what's going wrong. I just ran the tests locally with ducc0 from PyPI and they all pass. The failures are all due to a lack of expected relative precision (1.5e-5 as opposed to 1e-5 at single precision). Not a big deal, I'll just lower the precision, but it would be good to know what's going on.
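For reference, the kind of tolerance adjustment being described looks like this (values are illustrative):

```python
import numpy as np

# Illustrative single-precision values: the relative error is roughly 1.2e-5.
result = np.float32(1.0) + np.float32(1.2e-5)
expected = np.float32(1.0)

# rtol=1e-5 would fail here; loosening to 1.5e-5 accepts the result.
np.testing.assert_allclose(result, expected, rtol=1.5e-5)
```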
My efforts to allow for odd pixel sizes were unsuccessful, but this is low priority; I'll get to it eventually. I set some sensible defaults and changed the naming convention as discussed above. I removed the nu, nv options since these won't be available for long anyway. I think this is ready for a review when you have a sec @sjperkins.
Looks very good. I've some minor comments and suggested some typo corrections.
Perhaps this can be a birthday PR ;-)
Not completely, but I'm not aware of a bug fix since the last release.
Co-authored-by: Simon Perkins <[email protected]>
I don't recall making the changes to africanus/util or gridding/nifty, but I've implemented the suggestions. I see numba is still pinned to <= 0.49 in setup.py, maybe a result of merging in master when the perleypolyhedron gridder went in. I take it we don't want to keep it as is?
FYI: I just released |
I am trying to build a chunked up interface for the wgridder using dask-ms. I am mainly thinking about the functionality I need for pfb-clean but let's discuss here @sjperkins @o-smirnov @mreineck.
My thinking is to traverse the chunks serially and leave all the parallelisation to the ducc.wgridder (derived from the nifty_gridder). This should be better for memory consumption and performance, but let's see (interleaved IO could be very cool here indeed). Anyway, I seem to have a working gridder (the degridder is tomorrow's task). I really expected this to be harder, as there is a certain amount of nested parallelism involved. However, to my surprise, setting (so that the chunks are traversed serially)
dask.config.set(pool=ThreadPool(1))
and passing an nthreads > 1 parameter to the wgridder magically spins up all threads to 100%, with all of them looking beautifully green. The dirty image looks like what I expect, but I have yet to test that it agrees with the non-dask version. I would love some feedback on this. Please check out the docstrings in this class
https://github.com/landmanbester/codex-africanus/blob/f973bb149b03cf5652f6a6cb838d9327b6d2b4c9/africanus/gridding/nifty/dask.py#L94
where I have tried to describe the intended behaviour.
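To make that configuration concrete, here is a minimal sketch of the intended usage; the `vis2im` call is a hypothetical stand-in for the dask wrapper in this PR, and the thread counts are illustrative:

```python
# Sketch of the serial-chunk, threaded-gridder configuration described above.
from multiprocessing.pool import ThreadPool
import dask

# One dask thread: chunks are traversed serially, avoiding oversubscription.
dask.config.set(pool=ThreadPool(1))

# The gridder then fans out internally on each chunk, e.g. with nthreads=8:
# dirty = vis2im(uvw, freq, vis, weights, nx, ny, cellx, celly, nthreads=8)
# dirty.compute()
```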