Add find_objects #96

Closed
jakirkham opened this issue Feb 3, 2019 · 8 comments · Fixed by #240

Comments

@jakirkham
Member

Would be useful to have an implementation of find_objects for dask-image. Based on a conversation with @jni, we should be able to do this by performing find_objects with map_blocks and then, in a subsequent step, resolving any objects that span multiple chunks. Would also need to handle the cases where a label is missing from a chunk (i.e. find_objects returns None instead of a bounding box).
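A minimal sketch of the second step described above (merging per-chunk bounding boxes into global ones), assuming each chunk has already produced a list in the format SciPy's find_objects returns: entry i is a tuple of slices for label i + 1, or None when that label is absent from the chunk. The helper names (shift_bbox, union_bbox, merge_chunk_results) and the hard-coded chunk offsets are hypothetical and for illustration only; this is not the dask-image implementation.

```python
from typing import Dict, Iterable, List, Optional, Sequence, Tuple

BBox = Tuple[slice, ...]


def shift_bbox(bbox: BBox, offset: Sequence[int]) -> BBox:
    """Translate a chunk-local bounding box into global array coordinates."""
    return tuple(slice(s.start + o, s.stop + o) for s, o in zip(bbox, offset))


def union_bbox(a: BBox, b: BBox) -> BBox:
    """Return the smallest box that contains both a and b."""
    return tuple(
        slice(min(sa.start, sb.start), max(sa.stop, sb.stop))
        for sa, sb in zip(a, b)
    )


def merge_chunk_results(
    chunk_results: Iterable[Tuple[Sequence[int], List[Optional[BBox]]]]
) -> Dict[int, BBox]:
    """Combine (chunk offset, per-chunk box list) pairs into global boxes per label."""
    merged: Dict[int, BBox] = {}
    for offset, boxes in chunk_results:
        for i, bbox in enumerate(boxes):
            if bbox is None:  # label i + 1 does not occur in this chunk
                continue
            label = i + 1
            global_bbox = shift_bbox(bbox, offset)
            if label in merged:
                merged[label] = union_bbox(merged[label], global_bbox)
            else:
                merged[label] = global_bbox
    return merged


# Hypothetical 4x8 label image split into two 4x4 chunks along the last axis.
# Label 1 straddles the chunk boundary; label 2 is missing from the first chunk.
chunk_results = [
    ((0, 0), [(slice(1, 3), slice(2, 4)), None, (slice(0, 1), slice(0, 1))]),
    ((0, 4), [(slice(1, 3), slice(0, 2)), (slice(3, 4), slice(2, 4))]),
]
merged = merge_chunk_results(chunk_results)
```

Returning a dict keyed by label, rather than a list with None placeholders, is a design choice of this sketch: labels that appear in no chunk are simply absent.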

@jakirkham
Member Author

Have done a little bit of work in PR ( #97 ) and have some more work locally. Though I am struggling a bit, as I'm unclear on the use cases people intend to apply this to.

For instance, the find_objects function in SciPy returns a list of tuples of slices. This is useful if we want to select out regions with a specific label. However, with a label image in Dask, we run into some issues. Namely, we wind up returning Delayed objects instead of tuples of slices, or in some cases instead of the whole list. The Delayed objects cannot really be used to slice a Dask Array.
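For reference, here is a small example of the SciPy behaviour described above, using an in-memory NumPy array rather than a Dask Array. The label image is made up for illustration.

```python
import numpy as np
from scipy import ndimage

# Made-up label image: labels 1 and 3 are present, label 2 is not.
labels = np.array([
    [1, 1, 0, 0],
    [0, 0, 0, 3],
    [0, 0, 0, 3],
])

# boxes[i] is the bounding box of label i + 1 as a tuple of slices,
# or None when that label does not occur in the array.
boxes = ndimage.find_objects(labels)

# A tuple of slices can index the array directly to crop one region.
region_3 = labels[boxes[2]]
```

With eager arrays the slices are concrete, so the crop has a known shape straight away; that is the property that is lost when the bounding boxes are Delayed objects.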

This raises the question. What are we intending to do with the result from find_objects? Are we wanting to slice Dask Arrays, learn the maximum extent of a label, and/or something else? Once we know this a bit better, we should discuss what return value makes sense to facilitate those use cases.

@jni
Contributor

jni commented Feb 11, 2019

@jakirkham ideally we want the dask_image implementation of regionprops to be similar to this:

https://github.com/scikit-image/scikit-image/blob/8f9d7855bad0b5c51a25d7a3c4d7ce66d27ddcb1/skimage/measure/_regionprops.py#L361-L581

i.e. we do indeed want to slice into a dask array. I think.

@jakirkham
Member Author

Thanks for the pointer. So that is within the regionprops function, right? Would the goal be to use this dask_image implementation of find_objects within regionprops? Or am I missing something else you have in mind?

@jni
Contributor

jni commented Feb 12, 2019

Yes, the goal is to use dask-image's find_objects within dask-image's regionprops.

@jni
Contributor

jni commented Feb 12, 2019

btw, you mentioned,

the Delayed objects cannot really be used to slice a Dask Array

Is this a hard constraint? Or a not-yet-implemented constraint?

@jakirkham
Member Author

What would it mean to slice a Dask Array by a Delayed object? What should happen to the shape of the Dask Array or its chunks? How should we differentiate between different objects usable for slicing that could be in a Delayed object (e.g. an int, a slice, a list of ints or bools, a tuple of some combination of these, etc.)?

At least for me it's difficult to think what the right answer would be for all of these cases. That doesn't mean there isn't one though.

@jni
Contributor

jni commented Feb 12, 2019

Mmm, it's true that for a generic Delayed object, you don't know whether you're returning an array or a scalar. When you know it's slices, you know it'll be an array, but you don't know the shape. What I don't know is how bad it is to not know the shape of an output array. In this case, we at least know the dimensionality.

So, perhaps it could be the same trick as we used elsewhere: slicing with a Delayed gives a Delayed, which we can manually convert to an array. But I guess this requires rechunking to a single chunk? Which would be a massive problem.

@jakirkham
Member Author

FWICT we don't actually need a clone of find_objects for Dask Arrays. We just need something that will help select out each label for computation with regionprops. There's already some functionality like this with labeled_comprehension currently. If we are to use scikit-image's regionprops, we would probably need to coerce the format into something it will like, but I don't think that would be very difficult.
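To illustrate the kind of per-label selection meant here, a minimal sketch using SciPy's labeled_comprehension on in-memory arrays. dask-image's ndmeasure module provides a function of the same name for Dask Arrays; its exact signature should be checked against the dask-image documentation, and the arrays below are made up for illustration.

```python
import numpy as np
from scipy import ndimage

# Made-up intensity image and matching label image.
image = np.array([
    [1.0, 2.0, 0.0],
    [0.0, 0.0, 5.0],
    [0.0, 0.0, 7.0],
])
labels = np.array([
    [1, 1, 0],
    [0, 0, 2],
    [0, 0, 2],
])

# Labels to measure; label 3 is deliberately absent from the label image.
index = np.array([1, 2, 3])

# Apply np.mean to the pixels of each requested label. Labels that do not
# occur get the default value (NaN here) instead of raising an error.
means = ndimage.labeled_comprehension(image, labels, index, np.mean, float, np.nan)
```

No bounding boxes are needed for this kind of measurement, which is why it sidesteps the question of slicing a Dask Array with a Delayed object.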

2 participants