
Chunked w-gridder with dask-ms and ducc #204

Merged (43 commits) on Sep 6, 2020
Conversation

@landmanbester (Collaborator)

I am trying to build a chunked interface for the wgridder using dask-ms. I am mainly thinking about the functionality I need for pfb-clean, but let's discuss here @sjperkins @o-smirnov @mreineck.

My thinking is to traverse the chunks serially and leave all the parallelisation to the ducc.wgridder (derived from the nifty_gridder). This should be better for memory consumption and performance, but let's see (interleaved IO could be very cool here indeed). Anyway, I seem to have a working gridder (the degridder is tomorrow's task). I really expected this to be harder, as there is a certain amount of nested parallelism involved. However, to my surprise, setting (so that the chunks are traversed serially)

dask.config.set(pool=ThreadPool(1))

and passing an nthreads > 1 parameter to the wgridder magically spins up all threads to 100%, with all of them looking beautifully green. The dirty image looks like what I expect, but I have yet to test that it agrees with the non-dask version. I would love some feedback on this. Please check out the docstrings in this class

https://github.com/landmanbester/codex-africanus/blob/f973bb149b03cf5652f6a6cb838d9327b6d2b4c9/africanus/gridding/nifty/dask.py#L94

where I have tried to describe the intended behaviour.
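For concreteness, a minimal sketch of the setup I mean (the image size, cell size, accuracy and toy data below are placeholders, and the exact ms2dirty keywords have shifted a little between ducc0 releases):

```python
from multiprocessing.pool import ThreadPool

import dask
import numpy as np
from ducc0.wgridder import ms2dirty

# Traverse the chunks serially; all parallelism comes from the
# wgridder's own thread pool via nthreads.
dask.config.set(pool=ThreadPool(1))

def grid_chunk(uvw, freq, vis, nthreads=8):
    # Placeholder image size, cell size and accuracy.
    return ms2dirty(uvw=uvw, freq=freq, ms=vis,
                    npix_x=1024, npix_y=1024,
                    pixsize_x=5e-6, pixsize_y=5e-6,
                    epsilon=1e-5, do_wstacking=True,
                    nthreads=nthreads)

# Toy chunk: 1000 rows, 8 channels.
uvw = 100.0 * np.random.randn(1000, 3)
freq = np.linspace(0.856e9, 1.712e9, 8)
vis = (np.random.randn(1000, 8) + 1j * np.random.randn(1000, 8)).astype(np.complex64)
dirty = grid_chunk(uvw, freq, vis)
```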

@sjperkins (Member)

@landmanbester I'll wait for you to implement the degridding functionality before taking a closer look.

Regarding the nested parallelism, I can see this is going to be a topic that needs to be addressed in a paradigm where we're running code with OpenMP/TBB parallelism inside dask threads.

Playing the devil's advocate: too many dask + OpenMP threads lead to oversubscription, while if we only use a single dask thread, then what's the point of using dask?

The answer may lie in the middle. In past conversations with @JSKenyon, he mentioned a hybrid strategy of some dask threads and some OpenMP/TBB threads working well. Perhaps the same might apply to the NIFTY gridder? Some support for this kind of parallelism was implemented for the current NIFTY dask wrappers (below the new code you've added) where the user can ask for N gridding streams.

This is not quite ThreadPool(N): it creates the graph in such a way that there are N serial gridding chains, rather than gridding each visibility chunk separately and summing the results in a parallel reduction.

If task annotations make it into dask, we'll have finer control over the scheduling of tasks, for example creating a dedicated gridding thread or a dedicated solver thread per dask worker on which only these operations may run.
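Roughly, the difference in graph shape looks like this (pure illustration with dask.delayed and a stubbed-out gridding call, not the actual wrapper code):

```python
import numpy as np
from dask import delayed

NX = NY = 512  # placeholder grid size

def grid_chunk(uvw, vis):
    # stand-in for gridding one visibility chunk onto its own dirty image
    return np.zeros((NX, NY))

def serial_stream(chunks):
    # one gridding chain: chunk contributions are accumulated in sequence,
    # so only one partial grid is alive per stream
    acc = delayed(np.zeros)((NX, NY))
    for uvw, vis in chunks:
        acc = delayed(np.add)(acc, delayed(grid_chunk)(uvw, vis))
    return acc

# toy chunks and N=2 streams; the streams run concurrently and their
# partial grids are summed once at the end, instead of producing one
# dirty image per chunk and reducing them all pairwise
chunks = [(np.random.randn(100, 3), np.random.randn(100, 4)) for _ in range(8)]
streams = [serial_stream(chunks[i::2]) for i in range(2)]
dirty = delayed(sum)(streams).compute()
```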

@mreineck

It most likely makes no difference, but I'd still like to point this out: the threads used within DUCC are plain POSIX threads, not OpenMP or TBB. Just so you are not surprised in case you are analysing the code with OpenMP-specific tools and don't find any parallelism ;-)

@sjperkins (Member)

It most likely makes no difference, but I'd still like to point this out: the threads used within DUCC are plain POSIX threads, not OpenMP or TBB. Just so you are not surprised in case you are analysing the code with OpenMP-specific tools and don't find any parallelism ;-)

Ah, I see you've implemented your own thread pools and schedulers in ducc0.

@mreineck

I was using OpenMP originally and was quite happy with it. However, when pypocketfft was integrated into scipy, we needed to ditch OpenMP, since it does not always interact well with Python's own multithreading (no idea which side is to blame here). Peter Bell, who has helped me a lot with pypocketfft, provided a pthread-based implementation which worked quite well, but originally only had "static" scheduling.

Later on I saw Andre Offringa's approach to multi-threading in wsclean, and from those two starting points I managed to put something together which is almost as powerful as OpenMP but relies only on standard C++ features. I'm now using it everywhere and so far haven't encountered any problems.

@landmanbester (Collaborator, Author)

Well, this is why I want to discuss the implementation here. Currently, we are mainly chunking up the data because it doesn't fit into memory. Doesn't a parallel reduction always come with a linear increase in memory? This is exactly what I want to avoid.

My reasons for using dask are twofold:
i) dask-ms is a very neat and useful interface to the measurement set that handles chunking in a way that makes the code far more transparent than it would be with clunky hand-written chunking logic. Truth be told, I don't even know how to achieve the same thing using pyrap directly (a small example of what I mean is sketched below).
ii) I really hope we are moving towards a distributed gridding implementation that completely avoids the disk (as would be required for resolve-type algorithms), and I think dask distributed is the way to go.
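For context, this is roughly all it takes on the dask-ms side (the MS name, columns and chunk sizes are placeholders; I think the default grouping is one dataset per FIELD_ID/DATA_DESC_ID):

```python
from daskms import xds_from_ms

# Each dataset holds lazily-loaded dask arrays, chunked along rows.
datasets = xds_from_ms("observation.ms",
                       columns=["UVW", "DATA", "FLAG"],
                       chunks={"row": 100_000})

for ds in datasets:
    uvw = ds.UVW.data    # dask array of shape (row, 3)
    vis = ds.DATA.data   # dask array of shape (row, chan, corr)
```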

Anyway, the wrapper around the degridder is doing something, but it's not doing it correctly. I see threads spinning up and data does end up in the measurement set, but the MS also grows by about a factor of six after running the predict. I suspect the logic here:

https://github.com/landmanbester/codex-africanus/blob/b67a29229f1dd36bcf4a8f25cf8489ef8efabd7c/africanus/gridding/nifty/dask.py#L400

@sjperkins, I'd appreciate it if you could have a look when you get a chance.

@sjperkins (Member) commented Jun 25, 2020

Well, this is why I want to discuss the implementation here. Currently, we are mainly chunking up the data because it doesn't fit into memory. Doesn't a parallel reduction always come with a linear increase in memory? This is exactly what I want to avoid.

It seems to be linear in the number of threads, at maximum. In fact, if a parallel reduction were used to sum the dirties produced by each chunk, I'd expect to see a memory profile similar to that of shadems, which also uses a parallel reduction: holoviz/datashader#926. In particular see this comment: ratt-ru/shadeMS#34 (comment) and ratt-ru/shadeMS#34 (comment). shadems plots are 1024**2, whereas it sounds like MeerKAT-sized grids are 8192**2 (but what about the w-planes ducc uses internally?).
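Back-of-the-envelope, the image footprints themselves are modest:

```python
# double-precision image footprints in MiB
shadems_plot = 1024**2 * 8 / 2**20   # 8 MiB
meerkat_grid = 8192**2 * 8 / 2**20   # 512 MiB

# a parallel reduction over T dask threads holds at most ~T such grids
# in flight, plus whatever padded internal grid ducc allocates per call
```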

The shadems images were part of a 3D cube with an image for each category, so I'm not even sure the data sizes are that different from each other. In any case, the visibility data dominated in my admittedly pen-and-paper exercises.

I also want to emphasise very strongly that shadems and parallel gridding are doing exactly the same thing in terms of the distributed algorithm, except that shadems plots the visibility data, while here it is gridded. Additionally, shadems must support more reduction cases because it supports plotting every axis against most other axes (correct me here, @o-smirnov).

My reasons for using dask are twofold:
i) dask-ms is a very neat and useful interface to the measurement set that handles chunking in a way that makes the code far more transparent than it would be with clunky hand-written chunking logic. Truth be told, I don't even know how to achieve the same thing using pyrap directly.

Cool, the intention has been for dask-ms to be an easy interface for the algorithm writer. I think that dask Arrays are also easy for the algorithm writer to work with.

ii) I really hope we are moving towards a distributed gridding implementation that completely avoids the disk (as would be required for resolve-type algorithms), and I think dask distributed is the way to go.

I agree. This is currently doable with the distributed Client interface, but it requires the user to manually place tasks on workers, rather than depending on the distributed scheduler to "do the right thing". As discussed in dask/dask#6217 (comment), I believe the more general approach is to annotate individual dask tasks with information that scheduler plugins (written by us) use to decide where to place tasks.
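Manual placement with the Client looks something like this (the scheduler and worker addresses, the grid_chunk function and the chunks list are placeholders):

```python
from dask.distributed import Client

client = Client("tcp://scheduler:8786")  # placeholder scheduler address

# pin the gridding of each chunk to a chosen worker by hand
futures = [client.submit(grid_chunk, uvw, vis,
                         workers=["tcp://worker-1:40000"])  # placeholder worker
           for uvw, vis in chunks]
dirty = client.submit(sum, futures).result()
```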

Anyway, the wrapper around the degridder is doing something, but it's not doing it correctly. I see threads spinning up and data does end up in the measurement set, but the MS also grows by about a factor of six after running the predict. I suspect the logic here:

https://github.com/landmanbester/codex-africanus/blob/b67a29229f1dd36bcf4a8f25cf8489ef8efabd7c/africanus/gridding/nifty/dask.py#L400

@sjperkins, I'd appreciate it if you could have a look when you get a chance.

Sure, will take a look.

@landmanbester (Collaborator, Author)

As far as I know, the gridder's memory requirements do not scale with the number of w-stacks (please correct me if I am wrong, @mreineck). @sjperkins, if you have suggestions on how to improve the wrapper I am all ears. My basic design principles are:

i) the disk is unavoidable when it comes to the predict, since the data don't fit into memory, but we can assume that the image does.
ii) I need an operator that produces a residual image and a chi-square value while touching the disk as little as possible. I don't need to save the intermediate visibilities to disk (in-place subtraction, read but no write), and I need to apply this operator at least O(10) times (roughly sketched below).
iii) I also need a way to write MODEL_DATA and RESIDUAL_VIS to the MS, but I only need to do that once, so speed is less of an issue.
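For (ii), this is the per-chunk computation I have in mind, give or take the exact ducc0 keywords (the model image, cell size and accuracy are placeholders); the residual images and chi-square values then simply accumulate over chunks:

```python
import numpy as np
from ducc0.wgridder import dirty2ms, ms2dirty

def residual_and_chisq(uvw, freq, vis, wgt, model, cell,
                       epsilon=1e-5, nthreads=8):
    # predict model visibilities for this chunk (nothing written to disk)
    model_vis = dirty2ms(uvw=uvw, freq=freq, dirty=model,
                         pixsize_x=cell, pixsize_y=cell,
                         epsilon=epsilon, do_wstacking=True,
                         nthreads=nthreads)
    # read the data, subtract, never write the result back
    resid_vis = vis - model_vis
    chisq = np.vdot(resid_vis, wgt * resid_vis).real
    # grid the weighted residual visibilities back onto the image
    nx, ny = model.shape
    resid_im = ms2dirty(uvw=uvw, freq=freq, ms=resid_vis, wgt=wgt,
                        npix_x=nx, npix_y=ny,
                        pixsize_x=cell, pixsize_y=cell,
                        epsilon=epsilon, do_wstacking=True,
                        nthreads=nthreads)
    return resid_im, chisq
```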

@mreineck

As far as I know, the gridder's memory requirements do not scale with the number of w-stacks (please correct me if I am wrong, @mreineck).

That's correct.

@sjperkins (Member)

@landmanbester Just to recap our zoom conversation:

  1. The paradigm we're working with is that dask Arrays are the interface between the data source/sink and the algorithm.
  2. For this reason, we shouldn't explicitly include dask-ms logic within codex-africanus algorithms.
  3. I'm happy to explore the idea of a wgridder object with dot and hdot methods that express a linear operator; pylops-distributed, for example, seems to be taking this approach. However, the linear operator methods should take dask Arrays as arguments and return dask Arrays.
  4. Otherwise, the ms2dirty and dirty2ms ducc methods can be wrapped with dask.array.blockwise calls representing dot and hdot for the gridder/degridder (sketched below).
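For example, the degrid direction might look something like this, assuming the image lives in a single chunk (keyword names follow ducc0's dirty2ms as I understand it, and cell/epsilon/nthreads are placeholders):

```python
import dask.array as da
import numpy as np
from ducc0.wgridder import dirty2ms

def _degrid_block(uvw, freq, image, cell, epsilon, nthreads):
    return dirty2ms(uvw=uvw, freq=freq, dirty=image,
                    pixsize_x=cell, pixsize_y=cell,
                    epsilon=epsilon, do_wstacking=True,
                    nthreads=nthreads)

def degrid(uvw, freq, image, cell, epsilon=1e-5, nthreads=8):
    # uvw: (row, 3), freq: (chan,), image: (nx, ny) dask arrays.
    # The image axes are contracted away, so each (row, chan) output
    # block sees the full image (hence concatenate=True).
    return da.blockwise(_degrid_block, ("row", "chan"),
                        uvw, ("row", "uvw"),
                        freq, ("chan",),
                        image, ("nx", "ny"),
                        cell, None,
                        epsilon, None,
                        nthreads, None,
                        concatenate=True,
                        dtype=np.complex128)
```

The gridding direction would similarly wrap ms2dirty and sum the per-block dirty images, which is where the serial-chain versus parallel-reduction trade-off above comes in.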

@landmanbester (Collaborator, Author)

I am not sure what is going wrong here. Some tests fail when I run them through pytest but not when I run them manually. Will keep digging

@mreineck

Do you use the auto_choice branch in CI? The ducc0 branch will fail if you feed it nu=nv=0.

@mreineck

Unfortunately GitHub refuses to show me the test logs...

@sjperkins (Member)

@landmanbester The test cases are failing on the base install-and-test (the one without optional packages). You need to guard your ducc0 imports and use the @requires_optional decorator (a sketch follows below). I'd imagine you'd also need to configure the GitHub workflow to install ducc0 for the complete install and test.

https://github.com/ska-sa/codex-africanus/blob/master/.github/workflows/ci.yml#L65-L72

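i.e. the usual codex-africanus pattern, roughly (decorator location and keyword usage are from memory, so double-check against the existing modules; the grid function and its arguments are placeholders):

```python
try:
    from ducc0.wgridder import ms2dirty
except ImportError as e:
    ducc_import_error = e
else:
    ducc_import_error = None

from africanus.util.requirements import requires_optional

@requires_optional("ducc0", ducc_import_error)
def grid(uvw, freq, vis, npix, cell):
    # only reachable when ducc0 imported successfully
    return ms2dirty(uvw=uvw, freq=freq, ms=vis,
                    npix_x=npix, npix_y=npix,
                    pixsize_x=cell, pixsize_y=cell,
                    epsilon=1e-5, do_wstacking=True, nthreads=1)
```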

@sjperkins (Member)

Unfortunately GitHub refuses to show me the test logs...

@mreineck Out of interest, can you see https://github.com/ska-sa/codex-africanus/pull/204/checks?check_run_id=971810643?

And the following?

https://github.com/ska-sa/codex-africanus/runs/971809780

I'm never sure what GitHub shows to maintainers compared to those with public access...

@landmanbester (Collaborator, Author)

Yes, sorry. I wasn't expecting the automated PR tests to pass. I mainly pushed it so I can discuss it with @sjperkins during the meeting tomorrow. Is there a way to push changes without triggering the tests or would I have to create a separate branch?

@mreineck I am using the master branch for now. I don't think it's anything in the gridder that's causing the tests to fail. They seem to fail on certain parametrisations under pytest, but when I run the tests manually with the same parameters they pass, so I'm probably using @pmp wrong or something.

@mreineck

Hi @sjperkins, I can open the pages you have linked, but as soon as I try to open the sub-sections (e.g. by clicking on "run base test suite") I get "We are currently unable to download the log. Please try again later.". So it looks more like a technical problem.

@mreineck

Ahh, my fault ... my uMatrix was too paranoid. Sorry for the noise!

@sjperkins (Member)

Yes, sorry. I wasn't expecting the automated PR tests to pass. I mainly pushed it so I can discuss it with @sjperkins during the meeting tomorrow. Is there a way to push changes without triggering the tests or would I have to create a separate branch?

Ah, don't worry if they're failing: it's only important to get the green ticks before we press the merge button, and the runs are short enough.

You can put "[skip ci]" in your commit messages, but that will only skip the "push" event. There's also a "pull_request" event, but for various reasons it just runs on every commit.

@landmanbester (Collaborator, Author)

Hmmm, not sure what's going wrong. I just ran the tests locally with ducc0 from PyPI and they all pass. The failures are all due to a lack of expected relative precision (1.5e-5 as opposed to 1e-5 at single precision). Not a big deal, I'll just loosen the tolerance, but it would be good to know what's going on.
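i.e. just loosening the tolerance in the comparisons, something like (the array names here are placeholders):

```python
from numpy.testing import assert_allclose

# single precision: allow a little more than 1e-5 relative error
assert_allclose(dask_dirty, reference_dirty, rtol=2e-5)
```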

@landmanbester (Collaborator, Author)

My efforts to allow for odd pixel sizes were unsuccessful, but this is low priority and I'll get to it eventually. I set some sensible defaults and changed the naming convention as discussed above. I removed the nu, nv options since these won't be available for long anyway. I think this is ready for a review when you have a sec, @sjperkins.

@sjperkins (Member) left a comment


Looks very good. I've some minor comments and suggested some typo corrections.

Review comments (all since resolved) on: africanus/gridding/wgridder/dask.py, africanus/gridding/wgridder/im2vis.py, africanus/gridding/wgridder/im2residim.py, africanus/gridding/nifty/dask.py, africanus/util/dask_util.py, africanus/util/testing.py
@sjperkins (Member)

Perhaps this can be a birthday PR ;-)

@mreineck commented Sep 4, 2020

That is a bit strange. These tests all pass locally on the ducc0 branch. Is the PyPI release up to date, @mreineck?

Not completely, but I'm not aware of a bug fix since the last release. I'll try to make another release soon, but before that I have to apply another patch that fixes a rare source of deadlocks with the new thread pool implementation. Might take a few days.

@landmanbester (Collaborator, Author)

I don't recall making the changes to africanus/util or gridding/nifty, but I've implemented the suggestions. I see numba is still pinned to <= 0.49 in setup.py, maybe a result of merging in master when the perleypolyhedron gridder went in. I take it we don't want to keep it as is?

@sjperkins changed the title from "WIP: chunked w-gridder with dask-ms and ducc" to "Chunked w-gridder with dask-ms and ducc" on Sep 6, 2020
@mreineck
FYI: I just released ducc0 0.6, which contains the promised memory reductions; it's a bit faster as well!

@sjperkins (Member)

FYI: I just released ducc0 0.6, which contains the promised memory reductions; it's a bit faster as well!

Thanks @mreineck. I've opened #221 to track this.
