Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wrapping labels over array boundaries #344

Merged
merged 10 commits into from
Feb 23, 2024
Merged

Conversation

Holmgren825
Copy link
Contributor

@Holmgren825 Holmgren825 commented Oct 11, 2023

I open this to get the discussion started on my hacky implementation of wrapping labels over array boundaries, as mentioned in #189. Currently, it is possible to wrap labels across one, or both, boundaries of a 2d array. This is relatively simple, since one can treat the “wrap” as another face in the label_adjacency_graph. However, there are some obvious things that need to be solved before merged:

  • How should wrapping be done in higher dimensions? I guess this could work in the same way for 3d, but nd?
  • Tests are crude. Since there is no reference implementation in SciPy, I've basically just hard coded some arrays now.

There are probably more things to figure out.

@GPUtester
Copy link
Collaborator

Can one of the admins verify this patch?

Admins can comment ok to test to allow this one PR to run or add to allowlist to allow all future PRs from the same author to run.

@jni
Copy link
Contributor

jni commented Oct 12, 2023

Hello!

Converting 2D to nD is my bread and butter. 😃

From an API perspective, rather than a string, we should do (imho), wrap_axes=None, and then you can pass in an optional tuple of int for the axes along which you want to wrap.

Then, instead of:

    if wrap in ["0", "both"]:
        faces.append(da.hstack([labels[:, [-1]], labels[:, [0]]]))

You would do:

# lil' helper function
def set_tup_value(tup, idx, value):
    """Return a copy of `tup` with `value` at `idx`."""
    return tuple((elem if i != idx else value) for i, elem in enumerate(tup))

...

for ax in wrap_axes:
    none_slice = (slice(None),) * labels.ndim
    sl_back = set_tup_value(none_slice, ax, [-1])
    sl_front = set_tup_value(none_slice, ax, [0])
    faces.append(da.stack([labels[sl_back], labels[sl_front]], axis=ax)

The only downside is that this won't wrap around corners, ie I think the code above suffers from #320 at the corners. That seems like a super tricky fix so I'd personally be happy to punt on that to a later PR, as it seems pretty niche.

As a clue to a fix, I think that if you have multiple axes and you have a structure with connectivity>1, you would need to examine the wrapped faces from earlier axes and add wrapped faces to those faces. It gets confusing. 😅 But again, I'd do this in a later PR, and perhaps add a known failing test with pytest.mark.xfail for:

dask_image.label(
        np.array([[1, 0, 0], [0, 0, 0], [0, 0, 1]]),
        structure=np.ones((3, 3)),
        wrap_axes=(0, 1),
        )

@Holmgren825
Copy link
Contributor Author

Awesome, thanks @jni! I threw a .squeeze on da.stack since this ended up with an extra dimension, which won't be compatible with the structure.

def label_adjacency_graph(labels, structure, nlabels, wrap=None):
def set_tup_value(tup, idx, value):
"""Return a copy of `tup` with `value` at `idx`."""
return tuple((elem if i == idx else value) for i, elem in enumerate(tup))
Copy link
Contributor Author

@Holmgren825 Holmgren825 Oct 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that I changed this to ==. I guess this comes down to how one thinks about wrapping. As it is now, setting wrap_axes=(0) wraps labels across the boundary of the 0th axis, whereas != would wrap the array over the 0th axis and wrap labels across the boundary of the 1st axis. Could use either one really, depends on which is more intuitive I guess.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm I'm a bit confused here. To me, indicating the axes over which dask_image.ndmeasure.label should consider the input array to wrap over is most intuitive :) (which is what I think this code does)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I agree. Just wanted to comment it since it differs from the suggestion of @jni.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The suggestion from @jni was pretty abstract and untested and he well could have gotten a sign wrong somewhere 😅 Thanks @Holmgren825 and @m-albert! Sorry the next few weeks are very busy for me so I may not be able to do an in-depth review, but I'll try. Please ping me if there is a rush/you specifically want an extra pair of eyes on something.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me, indicating the axes over which dask_image.ndmeasure.label should consider the input array to wrap over is most intuitive :) (which is what I think this code does)

I had been wrong earlier in the sense that the code above should be saying != as jni suggested earlier. Because the idea is to only replace the element at index idx and leave other elements unchanged (those with i != idx).

Sorry the next few weeks are very busy for me so I may not be able to do an in-depth review, but I'll try. Please ping me if there is a rush/you specifically want an extra pair of eyes on something.

Thanks for your availability @jni :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, same here. I guess I got confused thinking about array axes and geographical axes for some reason. Fixed in 3aaeaca.

@m-albert
Copy link
Collaborator

m-albert commented Oct 16, 2023

Thanks for submitting this PR @Holmgren825.

The only downside is that this won't wrap around corners, ie I think the code above suffers from #320 at the corners. That seems like a super tricky fix so I'd personally be happy to punt on that to a later PR, as it seems pretty niche.

@jni I agree that it's a bit of a niche use-case, but I think the logic in dask_image.ndmeasure._label._chunk_faces implemented by #321 should already work for the wrapping case (with small modifications).

Here's what I mean: _chunk_faces determines the pairs of chunks to consider by applying the given structuring element to an aligned grid of chunks (containing linear chunk indices).
I.e. it creates a list of chunk faces by iterating over each position in this grid and pairing the current block with the block associated to each entry in the structuring element (in the forward direction).

To support wrap mode, this iteration could simply start at indices -1 instead of 0 for each axis indicated by wrap_axes.

As a comment that is independent of support for connectivity>1: probably it makes sense that all code for determining relevant chunk faces lives in the same place, i.e. currently within _chunk_faces.

Copy link
Collaborator

@m-albert m-albert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Holmgren825 I left some comments next to the code :)

dask_image/ndmeasure/__init__.py Outdated Show resolved Hide resolved
dask_image/ndmeasure/_utils/_label.py Outdated Show resolved Hide resolved
Comment on lines 170 to 184
faces = []

for face_slice in face_slices:
faces.append(labels[face_slice])

if wrap_axes is not None:
for ax in wrap_axes:
none_slice = (slice(None),) * labels.ndim
sl_back = set_tup_value(none_slice, ax, [-1])
sl_front = set_tup_value(none_slice, ax, [0])
faces.append(
da.stack([labels[sl_back], labels[sl_front]], axis=ax).squeeze()
)

for face in faces:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this logic could live inside of _chunk_faces by extending the existing implementation (tried to explain what I mean here #344 (comment)). Also, in this way all code determining the chunk faces to consider would live together.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @m-albert! I agree that it would nicer to move this to _chunk_faces, although I've struggled a bit to understand what's going on in it. One idea I had was that, in the main loop over the blocks, you could stack the bottom block on top when neigh_block[dim] >= numblocks[dim], but these slices do not wrap. So I went with a simpler approach and just moved the loop over the wrap_axes to the end of _chunk_faces, and added a slice that covers the corners of the array. This makes it pass the corner feature test case that previously failed. Lowering the connectivity to one for this case returns two features despite wrapping both axes, which I think is correct.

def test_label_wrap(a, a_res, wrap_axes):
d = da.from_array(a, chunks=(5, 5))

s = np.ones((3, 3))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As jni commented, using structuring elements with connectivity > 1 would lead to problems in the corners. scipy.ndimage.morphology.generate_binary_structure is a nice convenience function for creating structuring elements.

def label_adjacency_graph(labels, structure, nlabels, wrap=None):
def set_tup_value(tup, idx, value):
"""Return a copy of `tup` with `value` at `idx`."""
return tuple((elem if i == idx else value) for i, elem in enumerate(tup))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm I'm a bit confused here. To me, indicating the axes over which dask_image.ndmeasure.label should consider the input array to wrap over is most intuitive :) (which is what I think this code does)

@GenevieveBuckley
Copy link
Collaborator

add to allowlist

- Renaming `set_tup_value` to `get_slice_tuple`
- Correcting `get_slice_tuple`.
- Fixing corners higher dimension arrays.
- Added test cases for 3d.
@Holmgren825
Copy link
Contributor Author

After some “field” testing on real data, I've further reworked how the wrap_slices are generated (which is now takes place in _chunk_faces). Since dask arrays don't allow for nd indexing (e.g. dask/dask#4157), the wrap_slices are now tuples consisting of slices only. I added a few more tests that should cover some different 3-d use cases:

  • Wrapping a single dimension, corners are not connected.
  • Wrapping dim (1, 2), corners are connected.
  • Wrapping dim (1, 2), corners are connected, and first and last entry along dim 0 are connected since connectivity > 1.

Maybe these tests are a bit overkill, they add a lot of hard coded arrays, but it is tricky to test the wrapping without it.

@m-albert
Copy link
Collaborator

Hey @Holmgren825, thank you for addressing the comments above in your follow-up commits. And for adding more tests, which is super useful (also including a 3D example).

After some “field” testing on real data, I've further reworked how the wrap_slices are generated (which is now takes place in _chunk_faces). Since dask arrays don't allow for nd indexing (e.g. dask/dask#4157), the wrap_slices are now tuples consisting of slices only.

Exactly, I was wondering how to best do this since dask doesn't support fancy indexing yet. I really like your elegant solution of defining slices with a step of shape[dim]-1. Super cool :)

Thanks @m-albert! I agree that it would nicer to move this to _chunk_faces, although I've struggled a bit to understand what's going on in it. One idea I had was that, in the main loop over the blocks, you could stack the bottom block on top when neigh_block[dim] >= numblocks[dim], but these slices do not wrap. So I went with a simpler approach and just moved the loop over the wrap_axes to the end of _chunk_faces, and added a slice that covers the corners of the array. This makes it pass the corner feature test case that previously failed. Lowering the connectivity to one for this case returns two features despite wrapping both axes, which I think is correct.

I completely understand how the code in _chunk_faces is a bit tricky to decipher. I tried to comment it a bit better and merged your approach of defining slices with the grid iteration logic in _chunk_faces along the lines of what I described here. Would you agree if I pushed this to this PR branch (label_wrap) and we collaborate on it together? Alternatively, we could try to first merge this and I'd open a follow-up PR.

@Holmgren825
Copy link
Contributor Author

I completely understand how the code in _chunk_faces is a bit tricky to decipher. I tried to comment it a bit better and merged your approach of defining slices with the grid iteration logic in _chunk_faces along the lines of what I described here. Would you agree if I pushed this to this PR branch (label_wrap) and we collaborate on it together? Alternatively, we could try to first merge this and I'd open a follow-up PR.

Great! Sounds good to me.

@m-albert
Copy link
Collaborator

@Holmgren825 Cool, thank you. I pushed the changes. All the tests you had added are passing :)

Next to the discussed logic inside of _chunk_faces I joined two for loops and used yield instead of return in _chunk_faces to avoid keeping objects in memory that anyways are consumed right away.

Please feel free to raise things you don't agree with or that should be commented more clearly.

@jni
Copy link
Contributor

jni commented Oct 19, 2023

Amazing! This is a great turnaround speed for this feature — which is quite unique in the Python world??? Like, I don't think you can do this with straight up ndimage??? How cool is that??? 👏

@m-albert m-albert marked this pull request as ready for review October 19, 2023 20:29
@m-albert
Copy link
Collaborator

m-albert commented Oct 19, 2023

Very exciting indeed 😁

I think further reviews are welcome at this point.

Generally speaking, I guess for most of its functionality dask-image opens up scipy.ndimage for dask arrays and tries to stick to the reference API as closely as possible (sometimes limiting its support to a subset of the API of a function in scipy.ndimage).

However here there's an additional feature introduced, which in this case might actually not only be relevant in the context of dask arrays, but also beyond as discussed here and here.

Therefore IMHO there's a tiny bit of scope creep. But the feature is useful to the community and has a straight forward implementation thanks to the chunked labelling functionality already living in dask-image. Which probably justifies just adding it.

@m-albert
Copy link
Collaborator

Gentle ping @jakirkham @GenevieveBuckley. Would you have opinions or review comments on this PR?

@jni
Copy link
Contributor

jni commented Nov 14, 2023

Just writing to say I have a strong opinion that this should go in. 😂

@m-albert
Copy link
Collaborator

Time to take some action here :)

I think this PR is ready. In case there are no additional comments or objections in the next days, I'd merge this later this week.

@m-albert m-albert merged commit 5170b9c into dask:main Feb 23, 2024
14 of 15 checks passed
@m-albert
Copy link
Collaborator

Merged 🎉

Thank you @Holmgren825 for your contribution to dask-image and implementing this new feature that extends functionality in scipy.ndimage!

Thanks to @jni for the groundwork, reviewing and cheerful comments!

Thanks also to @rabernat for starting the conversation on this.

@m-albert m-albert mentioned this pull request Feb 23, 2024
@jakirkham
Copy link
Member

Thanks Marvin for shepherding this in!

Also thanks Juan for reviewing!

And of course thanks Erik for reaching out and sharing this contribution with us

Hopefully it helps others with similar needs 🙂

Comment on lines 255 to +256
# get neighbor slice index
ind_neigh_block = block_summary[tuple(neigh_block)]
ind_curr_block = block_summary[tuple(curr_block)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a small change to this comment in PR: #353

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants