Wrapping labels over array boundaries #344

Holmgren825 · 2023-10-11T15:40:48Z

I'm opening this to get the discussion started on my hacky implementation of wrapping labels over array boundaries, as mentioned in #189. Currently, it is possible to wrap labels across one or both boundaries of a 2d array. This is relatively simple, since one can treat the “wrap” as another face in the label_adjacency_graph. However, there are some obvious things that need to be solved before this is merged:

How should wrapping be done in higher dimensions? I guess this could work in the same way for 3d, but nd?
Tests are crude. Since there is no reference implementation in SciPy, I've basically just hard coded some arrays now.

There are probably more things to figure out.

GPUtester · 2023-10-11T15:40:51Z

Can one of the admins verify this patch?

Admins can comment `ok to test` to allow this one PR to run, or `add to allowlist` to allow all future PRs from the same author to run.

jni · 2023-10-12T02:06:12Z

Hello!

Converting 2D to nD is my bread and butter. 😃

From an API perspective, rather than a string, we should do (imho), wrap_axes=None, and then you can pass in an optional tuple of int for the axes along which you want to wrap.

Then, instead of:

    if wrap in ["0", "both"]:
        faces.append(da.hstack([labels[:, [-1]], labels[:, [0]]]))

You would do:

# lil' helper function
def set_tup_value(tup, idx, value):
    """Return a copy of `tup` with `value` at `idx`."""
    return tuple((elem if i != idx else value) for i, elem in enumerate(tup))

...

for ax in wrap_axes:
    none_slice = (slice(None),) * labels.ndim
    sl_back = set_tup_value(none_slice, ax, [-1])
    sl_front = set_tup_value(none_slice, ax, [0])
    faces.append(da.stack([labels[sl_back], labels[sl_front]], axis=ax))

The only downside is that this won't wrap around corners, ie I think the code above suffers from #320 at the corners. That seems like a super tricky fix so I'd personally be happy to punt on that to a later PR, as it seems pretty niche.

As a clue to a fix, I think that if you have multiple axes and you have a structure with connectivity>1, you would need to examine the wrapped faces from earlier axes and add wrapped faces to those faces. It gets confusing. 😅 But again, I'd do this in a later PR, and perhaps add a known failing test with pytest.mark.xfail for:

dask_image.ndmeasure.label(
    np.array([[1, 0, 0], [0, 0, 0], [0, 0, 1]]),
    structure=np.ones((3, 3)),
    wrap_axes=(0, 1),
)

Holmgren825 · 2023-10-12T07:45:35Z

Awesome, thanks @jni! I threw a .squeeze on da.stack since this ended up with an extra dimension, which won't be compatible with the structure.
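The extra dimension mentioned here can be reproduced with plain NumPy. The sketch below is only an illustration: it assumes NumPy arrays behave like dask arrays for this indexing and stacking (the PR itself works on dask arrays), and it compares `stack(...).squeeze()` with `concatenate`, which keeps the original number of dimensions without a squeeze.

```python
import numpy as np


def set_tup_value(tup, idx, value):
    """Return a copy of `tup` with `value` at `idx`."""
    return tuple(value if i == idx else elem for i, elem in enumerate(tup))


labels = np.arange(20).reshape(4, 5)
ax = 1
none_slice = (slice(None),) * labels.ndim

# Indexing with a one-element list keeps the axis, so each face is (4, 1).
back = labels[set_tup_value(none_slice, ax, [-1])]
front = labels[set_tup_value(none_slice, ax, [0])]

stacked = np.stack([back, front], axis=ax)       # new axis: shape (4, 2, 1)
squeezed = stacked.squeeze()                     # back to 2-d: shape (4, 2)
joined = np.concatenate([back, front], axis=ax)  # shape (4, 2) directly
```

One thing to keep in mind: `squeeze()` without arguments drops every length-1 axis, so it would also remove an unrelated axis of size 1; joining along the existing axis avoids that.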

Holmgren825 · 2023-10-12T07:49:18Z

dask_image/ndmeasure/_utils/_label.py

-def label_adjacency_graph(labels, structure, nlabels, wrap=None):
+def set_tup_value(tup, idx, value):
+    """Return a copy of `tup` with `value` at `idx`."""
+    return tuple((elem if i == idx else value) for i, elem in enumerate(tup))


Note that I changed this to ==. I guess this comes down to how one thinks about wrapping. As it is now, setting wrap_axes=(0) wraps labels across the boundary of the 0th axis, whereas != would wrap the array over the 0th axis and wrap labels across the boundary of the 1st axis. Could use either one really, depends on which is more intuitive I guess.

m-albert · 2023-10-16

Hmm I'm a bit confused here. To me, indicating the axes over which dask_image.ndmeasure.label should consider the input array to wrap over is most intuitive :) (which is what I think this code does)

Holmgren825 · 2023-10-17

Yes, I agree. Just wanted to comment on it since it differs from the suggestion of @jni.

jni · 2023-10-18

The suggestion from @jni was pretty abstract and untested and he well could have gotten a sign wrong somewhere 😅 Thanks @Holmgren825 and @m-albert! Sorry the next few weeks are very busy for me so I may not be able to do an in-depth review, but I'll try. Please ping me if there is a rush/you specifically want an extra pair of eyes on something.

m-albert · 2023-10-19

> To me, indicating the axes over which dask_image.ndmeasure.label should consider the input array to wrap over is most intuitive :) (which is what I think this code does)

I was wrong earlier: the code above should use != as jni originally suggested, because the idea is to replace only the element at index idx and leave the other elements (those with i != idx) unchanged.

> Sorry the next few weeks are very busy for me so I may not be able to do an in-depth review, but I'll try. Please ping me if there is a rush/you specifically want an extra pair of eyes on something.

Thanks for your availability @jni :)

Holmgren825 · 2023-10-19

Yes, same here. I guess I got confused thinking about array axes and geographical axes for some reason. Fixed in 3aaeaca.
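The difference between the two comparisons discussed in this thread is easy to check on a 3-d slice tuple. In the sketch below, the two function names are hypothetical and exist only to put both variants side by side.

```python
def set_tup_value_ne(tup, idx, value):
    """Variant with `!=`: replaces only the element at `idx`."""
    return tuple(elem if i != idx else value for i, elem in enumerate(tup))


def set_tup_value_eq(tup, idx, value):
    """Variant with `==`: keeps the element at `idx`, replaces all others."""
    return tuple(elem if i == idx else value for i, elem in enumerate(tup))


none_slice = (slice(None),) * 3

# Selects the last index along axis 0 only.
replaced_one = set_tup_value_ne(none_slice, 0, [-1])

# Selects the last index along axes 1 and 2 instead, leaving axis 0 whole.
replaced_rest = set_tup_value_eq(none_slice, 0, [-1])
```

In 2-d the `==` variant happens to swap which axis is indexed, which is why it looked like a matter of convention; in 3-d it indexes two axes at once, which is not a face of the array.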

m-albert · 2023-10-16T12:11:21Z

Thanks for submitting this PR @Holmgren825.

> The only downside is that this won't wrap around corners, ie I think the code above suffers from #320 at the corners. That seems like a super tricky fix so I'd personally be happy to punt on that to a later PR, as it seems pretty niche.

@jni I agree that it's a bit of a niche use-case, but I think the logic in dask_image.ndmeasure._utils._label._chunk_faces implemented by #321 should already work for the wrapping case (with small modifications).

Here's what I mean: _chunk_faces determines the pairs of chunks to consider by applying the given structuring element to an aligned grid of chunks (containing linear chunk indices).
I.e. it creates a list of chunk faces by iterating over each position in this grid and pairing the current block with the block associated to each entry in the structuring element (in the forward direction).

To support wrap mode, this iteration could simply start at indices -1 instead of 0 for each axis indicated by wrap_axes.

As a comment that is independent of support for connectivity>1: probably it makes sense that all code for determining relevant chunk faces lives in the same place, i.e. currently within _chunk_faces.
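To make the idea of pairing chunks across a wrapped boundary concrete, here is a deliberately simplified sketch. It is not the actual `_chunk_faces`: it ignores the structuring element and only pairs each block with its next neighbour along each axis (connectivity 1), and the name `chunk_pairs` is hypothetical. It uses a modulo-style wrap of the neighbour index rather than starting the iteration at -1, which yields the same set of pairs in this simplified setting.

```python
import itertools


def chunk_pairs(numblocks, wrap_axes=()):
    """Yield (block, neighbor) index pairs of face-adjacent chunks.

    Along axes listed in `wrap_axes`, the last block is additionally
    paired with the first block of that axis.
    """
    for block in itertools.product(*(range(n) for n in numblocks)):
        for ax, n in enumerate(numblocks):
            nxt = block[ax] + 1
            if nxt == n:
                if ax not in wrap_axes:
                    continue  # no neighbour beyond the array boundary
                nxt = 0  # wrap around to the first block along this axis
            yield block, block[:ax] + (nxt,) + block[ax + 1:]


plain = list(chunk_pairs((2, 2)))
wrapped = list(chunk_pairs((2, 2), wrap_axes=(0,)))
```

For a 2 x 2 grid of chunks, wrapping axis 0 adds the pairs `((1, 0), (0, 0))` and `((1, 1), (0, 1))` to the four interior pairs.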

m-albert

@Holmgren825 I left some comments next to the code :)

dask_image/ndmeasure/__init__.py

dask_image/ndmeasure/_utils/_label.py

m-albert · 2023-10-16T12:34:53Z

dask_image/ndmeasure/_utils/_label.py

+    faces = []
+
+    for face_slice in face_slices:
+        faces.append(labels[face_slice])
+
+    if wrap_axes is not None:
+        for ax in wrap_axes:
+            none_slice = (slice(None),) * labels.ndim
+            sl_back = set_tup_value(none_slice, ax, [-1])
+            sl_front = set_tup_value(none_slice, ax, [0])
+            faces.append(
+                da.stack([labels[sl_back], labels[sl_front]], axis=ax).squeeze()
+            )
+
+    for face in faces:


I think this logic could live inside of _chunk_faces by extending the existing implementation (tried to explain what I mean here #344 (comment)). Also, in this way all code determining the chunk faces to consider would live together.

Holmgren825 · 2023-10-17

Thanks @m-albert! I agree that it would be nicer to move this to _chunk_faces, although I've struggled a bit to understand what's going on in it. One idea I had was that, in the main loop over the blocks, you could stack the bottom block on top when neigh_block[dim] >= numblocks[dim], but these slices do not wrap. So I went with a simpler approach and just moved the loop over the wrap_axes to the end of _chunk_faces, and added a slice that covers the corners of the array. This makes it pass the corner feature test case that previously failed. Lowering the connectivity to one for this case returns two features despite wrapping both axes, which I think is correct.
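The expected feature counts for the corner case can be cross-checked independently of dask-image. The following is a small in-memory reference sketch, not code from this PR: the helper name `count_wrapped_features` is hypothetical, and it assumes the array fits in memory. It pads the array by one element with `np.pad(..., mode="wrap")` along the wrapped axes, labels the padded array with `scipy.ndimage.label`, and then merges each padded copy with the pixel it was copied from.

```python
import numpy as np
from scipy import ndimage as ndi


def count_wrapped_features(a, structure, wrap_axes):
    """Count connected features when `a` wraps around along `wrap_axes`."""
    a = np.asarray(a)
    pad = [(1, 1) if ax in wrap_axes else (0, 0) for ax in range(a.ndim)]
    padded = np.pad(a, pad, mode="wrap")
    lab, n = ndi.label(padded, structure=structure)

    # Minimal union-find over the labels of the padded array.
    parent = list(range(n + 1))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    offsets = np.array([p[0] for p in pad])
    shape = np.array(a.shape)
    for idx in np.argwhere(lab > 0):
        # Position inside the unpadded core that this pixel is a copy of.
        src = (idx - offsets) % shape + offsets
        root_a = find(int(lab[tuple(idx)]))
        root_b = find(int(lab[tuple(src)]))
        parent[root_a] = root_b

    core = lab[tuple(slice(p[0], p[0] + s) for p, s in zip(pad, a.shape))]
    return len({find(int(v)) for v in core[core > 0]})


corner = np.array([[1, 0, 0], [0, 0, 0], [0, 0, 1]])
n_full = count_wrapped_features(corner, np.ones((3, 3)), wrap_axes=(0, 1))
n_cross = count_wrapped_features(
    corner, ndi.generate_binary_structure(2, 1), wrap_axes=(0, 1)
)
```

With the full 3 x 3 structuring element the two corner pixels touch diagonally across the wrapped corner and form one feature; with connectivity 1 they stay separate, matching the behaviour described above.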

m-albert · 2023-10-16T12:43:10Z

tests/test_dask_image/test_ndmeasure/test_core.py

+def test_label_wrap(a, a_res, wrap_axes):
+    d = da.from_array(a, chunks=(5, 5))
+
+    s = np.ones((3, 3))


As jni commented, using structuring elements with connectivity > 1 would lead to problems in the corners. scipy.ndimage.generate_binary_structure is a nice convenience function for creating structuring elements.
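As a quick illustration of that convenience function, the snippet below builds the two 2-d structuring elements relevant to this discussion: connectivity 1 (faces only) and connectivity 2 (faces and diagonals, equivalent to `np.ones((3, 3))` as a boolean array).

```python
import numpy as np
from scipy import ndimage as ndi

# Connectivity 1: only the four face neighbours (a cross shape).
cross = ndi.generate_binary_structure(2, 1)

# Connectivity 2: face and diagonal neighbours (a full 3 x 3 block).
full = ndi.generate_binary_structure(2, 2)
```

Parametrizing a test over both elements would exercise the corner case as well as the simpler face-only case.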

m-albert · 2023-10-16T12:49:06Z

dask_image/ndmeasure/_utils/_label.py

-def label_adjacency_graph(labels, structure, nlabels, wrap=None):
+def set_tup_value(tup, idx, value):
+    """Return a copy of `tup` with `value` at `idx`."""
+    return tuple((elem if i == idx else value) for i, elem in enumerate(tup))


Hmm I'm a bit confused here. To me, indicating the axes over which dask_image.ndmeasure.label should consider the input array to wrap over is most intuitive :) (which is what I think this code does)

GenevieveBuckley · 2023-10-16T23:39:56Z

add to allowlist

Holmgren825 · 2023-10-19T10:11:13Z

After some “field” testing on real data, I've further reworked how the wrap_slices are generated (which now takes place in _chunk_faces). Since dask arrays don't allow for nd indexing (e.g. dask/dask#4157), the wrap_slices are now tuples consisting of slices only. I added a few more tests that should cover some different 3-d use cases:

Wrapping a single dimension, corners are not connected.
Wrapping dim (1, 2), corners are connected.
Wrapping dim (1, 2), corners are connected, and first and last entry along dim 0 are connected since connectivity > 1.

Maybe these tests are a bit overkill, they add a lot of hard coded arrays, but it is tricky to test the wrapping without it.

m-albert · 2023-10-19T10:36:41Z

Hey @Holmgren825, thank you for addressing the comments above in your follow-up commits. And for adding more tests, which is super useful (also including a 3D example).

> After some “field” testing on real data, I've further reworked how the wrap_slices are generated (which now takes place in _chunk_faces). Since dask arrays don't allow for nd indexing (e.g. dask/dask#4157), the wrap_slices are now tuples consisting of slices only.

Exactly, I was wondering how to best do this since dask doesn't support fancy indexing yet. I really like your elegant solution of defining slices with a step of shape[dim]-1. Super cool :)
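The stepped-slice trick can be shown with NumPy. This is a minimal sketch under the assumption that the wrap slices are built roughly like this; the helper name `boundary_slices` is hypothetical. A slice with step `shape[ax] - 1` picks exactly the first and last index along that axis, so no list-based (fancy) indexing is needed.

```python
import numpy as np


def boundary_slices(shape, ax):
    """Return a slice tuple selecting the first and last index along `ax`."""
    if shape[ax] < 2:
        # A step of 0 is not a valid slice, so a length-1 axis cannot be used.
        raise ValueError("axis must have length >= 2")
    sl = [slice(None)] * len(shape)
    sl[ax] = slice(None, None, shape[ax] - 1)
    return tuple(sl)


labels = np.arange(20).reshape(4, 5)
face_rows = labels[boundary_slices(labels.shape, 0)]  # rows 0 and 3
face_cols = labels[boundary_slices(labels.shape, 1)]  # columns 0 and 4
```

Note that the resulting face is ordered front then back (index 0 first), and that because it is plain slicing the same tuple should be usable on a dask array.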

> Thanks @m-albert! I agree that it would be nicer to move this to _chunk_faces, although I've struggled a bit to understand what's going on in it. One idea I had was that, in the main loop over the blocks, you could stack the bottom block on top when neigh_block[dim] >= numblocks[dim], but these slices do not wrap. So I went with a simpler approach and just moved the loop over the wrap_axes to the end of _chunk_faces, and added a slice that covers the corners of the array. This makes it pass the corner feature test case that previously failed. Lowering the connectivity to one for this case returns two features despite wrapping both axes, which I think is correct.

I completely understand how the code in _chunk_faces is a bit tricky to decipher. I tried to comment it a bit better and merged your approach of defining slices with the grid iteration logic in _chunk_faces along the lines of what I described here. Would you agree if I pushed this to this PR branch (label_wrap) and we collaborate on it together? Alternatively, we could try to first merge this and I'd open a follow-up PR.

Holmgren825 · 2023-10-19T11:26:29Z

> I completely understand how the code in _chunk_faces is a bit tricky to decipher. I tried to comment it a bit better and merged your approach of defining slices with the grid iteration logic in _chunk_faces along the lines of what I described here. Would you agree if I pushed this to this PR branch (label_wrap) and we collaborate on it together? Alternatively, we could try to first merge this and I'd open a follow-up PR.

Great! Sounds good to me.

m-albert · 2023-10-19T12:13:53Z

@Holmgren825 Cool, thank you. I pushed the changes. All the tests you had added are passing :)

In addition to the logic discussed for _chunk_faces, I joined two for loops and turned _chunk_faces into a generator (yield instead of return) to avoid keeping objects in memory that are consumed right away anyway.
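The memory argument for `yield` can be sketched generically. The two functions below are hypothetical stand-ins, not the PR's code: both produce the same items, but the list version materializes everything before the caller sees the first item, whereas the generator hands items over one at a time.

```python
def faces_as_list(n):
    """Build every item up front; the whole list lives in memory."""
    faces = []
    for i in range(n):
        faces.append((i, i + 1))
    return faces


def faces_as_generator(n):
    """Yield one item at a time; nothing is stored between iterations."""
    for i in range(n):
        yield (i, i + 1)


# The consumer looks the same in both cases; only peak memory differs.
total = sum(b - a for a, b in faces_as_generator(1000))
```

The trade-off is that a generator can be iterated only once, which is fine when the faces are consumed immediately by a single loop.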

Please feel free to raise things you don't agree with or that should be commented more clearly.

jni · 2023-10-19T12:22:21Z

Amazing! This is a great turnaround speed for this feature — which is quite unique in the Python world??? Like, I don't think you can do this with straight up ndimage??? How cool is that??? 👏

m-albert · 2023-10-19T21:11:03Z

Very exciting indeed 😁

I think further reviews are welcome at this point.

Generally speaking, I guess for most of its functionality dask-image opens up scipy.ndimage for dask arrays and tries to stick to the reference API as closely as possible (sometimes limiting its support to a subset of the API of a function in scipy.ndimage).

However here there's an additional feature introduced, which in this case might actually not only be relevant in the context of dask arrays, but also beyond as discussed here and here.

Therefore IMHO there's a tiny bit of scope creep. But the feature is useful to the community and has a straightforward implementation thanks to the chunked labelling functionality already living in dask-image, which probably justifies just adding it.

m-albert · 2023-11-13T17:17:02Z

Gentle ping @jakirkham @GenevieveBuckley. Would you have opinions or review comments on this PR?

jni · 2023-11-14T13:06:44Z

Just writing to say I have a strong opinion that this should go in. 😂

m-albert · 2024-02-20T14:21:47Z

Time to take some action here :)

I think this PR is ready. In case there are no additional comments or objections in the next few days, I'd merge this later this week.

m-albert · 2024-02-23T10:12:11Z

Merged 🎉

Thank you @Holmgren825 for your contribution to dask-image and implementing this new feature that extends functionality in scipy.ndimage!

Thanks to @jni for the groundwork, reviewing and cheerful comments!

Thanks also to @rabernat for starting the conversation on this.

jakirkham · 2024-02-23T10:45:34Z

Thanks Marvin for shepherding this in!

Also thanks Juan for reviewing!

And of course thanks Erik for reaching out and sharing this contribution with us

Hopefully it helps others with similar needs 🙂

jakirkham · 2024-02-23T10:41:33Z

dask_image/ndmeasure/_utils/_label.py

            # get neighbor slice index
-            ind_neigh_block = block_summary[tuple(neigh_block)]
+            ind_curr_block = block_summary[tuple(curr_block)]


Added a small change to this comment in PR: #353

Holmgren825 added 2 commits October 11, 2023 15:43

First simple implementation of wrap. Currently only works on 2d data and has no axis selection.

818bbf2

Simplify and make it possible to select which axis to wrap over.

76f3928

Holmgren825 mentioned this pull request Oct 11, 2023

"wrap" mode for label #189

Closed

Allow n-dimensional wrapping.

3c17abd

Holmgren825 commented Oct 12, 2023

m-albert reviewed Oct 16, 2023

Holmgren825 added 3 commits October 17, 2023 16:06

Moving wrap_axes logic to _chunk_faces and adding corner slice(s).

231a076

Better comment

b57b7d8

More fixes and tests.

3aaeaca

- Renaming `set_tup_value` to `get_slice_tuple`.
- Correcting `get_slice_tuple`.
- Fixing corners for higher-dimensional arrays.
- Added test cases for 3d.

Made face slice finding logic for wrap mode aware of structuring element.

56ac712

m-albert added 2 commits October 19, 2023 13:48

Joined subsequent for loops to avoid keeping faces in memory.

43efb6a

Turned _chunk_faces into a generator to avoid keeping lists of chunk faces in memory

19dac60

m-albert marked this pull request as ready for review October 19, 2023 20:29

Clarified docstring

60a209e

m-albert mentioned this pull request Feb 21, 2024

readthedocs build started failing in PRs and complains about wrong sphinx version #351

Closed

m-albert merged commit 5170b9c into dask:main Feb 23, 2024
14 of 15 checks passed

m-albert mentioned this pull request Feb 23, 2024

New release #352

Closed

jakirkham reviewed Feb 23, 2024
