Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow callers to pass in output buffers #685

Open
tnibler opened this issue Sep 10, 2024 · 33 comments
Open

Allow callers to pass in output buffers #685

tnibler opened this issue Sep 10, 2024 · 33 comments

Comments

@tnibler
Copy link

tnibler commented Sep 10, 2024

I'm finding myself wanting to scale/blur/whatever images into an ndarray Array3. But imageproc functions all allocate their own output, which is suboptimal in many cases, for example when:

  • the user wants to get outputs into some multidimensional array
  • the same operation is performed repeatedly and output buffers could be reused instead of de- and reallocating in a loop
  • the user needs the output in some special buffer handed to them by the kernel or a device like a GPU

I'm not sure adding do_thing_into functions for every function in the library is the best way to do this and using inout parameters is cumbersome compared to the current default, so some more generic way to make operations (somewhat) agnostic to where they shove their output might be appropriate.

I'm willing to put some time into this, but given that this would break literally every API in the crate I wanted to ask for some opinions before maybe investigating how this could be done.

@ripytide
Copy link
Member

ripytide commented Sep 10, 2024

Have you seen the _mut() variants of many of the functions in the crate since they don't allocate as they take &mut Image argument? (Maybe not in the latest published version though since we've had a lot of changes since the last release)

As for processing on data from external sources I think you should be able to construct an ImageBuffer struct from any &[u8] though.

@tnibler
Copy link
Author

tnibler commented Sep 11, 2024

Those are in place though (at least the ones I found, like rotate and flip), which is different again so that makes three versions of every function: the current one returning a new image, _mut in-place and _into with the output buffer passed in. I'm not sure what the standard way of simulating overloaded functions is in Rust, but surely there is a nicer way than polluting the namespace and autocomplete with 3+ versions of every function, what do you think?

@ripytide
Copy link
Member

ripytide commented Sep 11, 2024

See #622 where I had similar thoughts.
There are a few tradeoffs involved in the current method but the new doc macro has helped somewhat in preventing duplicated doc-comments and since the non-mut variant usually calls the mut variant that also prevents code duplication.

@tnibler
Copy link
Author

tnibler commented Sep 11, 2024

What do you think about something like this?

You would only have to implement op_into (and in-place versions if possible) and get the rest generated automatically.

@ripytide
Copy link
Member

I think that could work, but some downsides I can think of might be:

  • What about methods like replace which takes two images, would that require another trait or just references into the op struct which could get messy due to the lifetime generics (would that clash with the trait declarations lack of generics?)?
  • Extra level of abstraction required in order to build the op struct in order to call the operation may be confusing or less ergonomic for one-off calls but might be more ergonomic for repeated calls, as opposed to making a closure for repeated regular function calls using the same args.
  • Docs might be worse since now rust-analyzer might show the traits doc comment rather than the op structs implementation of the op traits doc comment. Also docs.rs might be worse for this since it hides trait impl docs for structs way down the page of structs.

@ripytide
Copy link
Member

ripytide commented Sep 11, 2024

On the other hand standardizing image ops into traits does make it possible to do more advanced usage in generic scenarios such as taking any image ops as an input itself like higher kinded programming does with functions:

fn apply_op_twice<O>(op : O, image: Image) where O: ImageOp {
     Op.call(image);
     Op.call(image);
}

Maybe we could even do both methods and write a proc macro which would generate our current non-trait free-standing functions which internally then construct the op struct and then call the structs op impl. This way we get the best of both worlds and give the users the flexibility to choose which method they prefer. At the cost of increased beginner difficulty.

@tnibler
Copy link
Author

tnibler commented Sep 12, 2024

Okay, so since there is some interest I'll try and come up with some possible design ideas that can take into account some of the more complex cases and the points you mentioned.

Docs might be worse...

I think you'd just put the docs for how a filter works on the struct for it, so they appear at the very top and not in the trait docs.

but might be more ergonomic for repeated calls

More than ergonomics actually, since SomeFilter::new(...) can precompute stuff like kernels that can then be reused for mutiple operations afterwards. So depending on the application performance could be a tiny bit better (if a kernel is particularly expensive to compute and you're operating on tiny images I guess, it's probably negligible in most normal cases).
Or more realistically: an operation needs some other temporary buffer of known size, which can now be allocated once an then reused instead of reallocating a new one every iteration.

@tnibler
Copy link
Author

tnibler commented Sep 27, 2024

Ok, I have thought about some of these points and hope to flesh them out in a master's thesis, so I'll have some better insights after that hopefully.

The ndarray redesign (rust-ndarray/ndarray#1272) is also something to watch w.r.t how they handle shapes, types and internal buffer representations (alignment, padding, padding between rows to keep them aligned etc). Being able to finely control memory layouts, doing operations in place where possible, avoiding copying etc. while composing image processing pipelines would really improve upon what OpenCV has to offer here in terms of potential for optimizing locality, fusing kernels, offloading to accelerators and all that.

Curious what others here think, because I think basically reimplementing OpenCV in the age of projects like pytorch is a little bit of a wasted effort. Declarative approaches where operations are completely abstracted away from the actual representation in memory such that optimizations like loop blocking, kernel fusion can be applied over the whole pipeline are the future IMO. I definitely understand if this isn't meant to be a research project tho so excuse my ramblings in that case :D

@cospectrum
Copy link
Contributor

cospectrum commented Oct 19, 2024

into is no better or worse than mut in performance.
For historical reasons, image and imageproc use mut

@cospectrum
Copy link
Contributor

If there is no mut function, then either it is impossible to reuse the buffer, or it is simply not implemented, but will be implemented, as soon as someone has a little time to do it (its not difficult).

@tnibler
Copy link
Author

tnibler commented Oct 19, 2024

mut functions are in-place though right? That's different than putting the result into a buffer passed in by the caller while keeping the input image unchanged.

@cospectrum
Copy link
Contributor

cospectrum commented Oct 19, 2024

Reading and writing to the same memory violates the borrow checker, lets not forget that

@cospectrum
Copy link
Contributor

cospectrum commented Oct 19, 2024

And a function with the mut postfix can be either in place with 1 mutable argument or have a second immutable reference input.

@cospectrum
Copy link
Contributor

Vote for custom allocators in the next rust survey, maybe someone will finally stabilize it.

@ripytide
Copy link
Member

ripytide commented Oct 19, 2024

I'm not sure I understand your last three comments @cospectrum. Reading and writing to the same memory does not violate the borrow checker?

From what I gather this issue is about the lack of the third ownership variant on image processing functions in this library:

  1. Owned Input, Owned Output: fn (image: Image) -> Image
  2. Combined Mutable Input + Output (In-Place Operation): fn (image: &mut Image)
  3. Immutable Input, Mutable Output fn(input: &Image, output: &mut Image)
  4. Immutable Input, Owned Output: fn(image: &Image) -> Image

Interestingly though, 4. is always computationally equivalently reducible to 3.:

fn op(image: &Image) -> Image {
    let mut output = Image::new(image.width, image.height);
    op(image, &mut output);
    output
}

And similarly 1. is reducible to 2.:

fn op(mut image: Image) -> Image {
    op(&mut image);
    image
}

So we only really need to implement 2. and 3. to unlock all four variants (even generate them with macros somehow).

Our current naming conventions using blur() as an example:

  1. No Convention
  2. blur_mut()
  3. blur_into()
  4. blur().

So maybe we just need to offer variants 2. and 3. if we are interested in being flexible regarding parameter ownership.

Unfortunately, not all image processing operations only operate on one image at a time so ownership variants exponentially grow with the number of arguments, but we can probably just follow the variant 2. and 3. patterns for those too.

@cospectrum
Copy link
Contributor

cospectrum commented Oct 20, 2024

  1. "ownership variant" variant not always possible from existing implementations, for example,
    match_histogram_mut(&mut img, &img); doesn't compile and doesn't make any sense.

  2. It's not about ownership variant, I don't really care about mut vs owned convention.
    Its about memallocs: "buffers could be reused instead of de- and reallocating in a loop";
    And most of the time memallocs are spent on intermediate buffers. OpenCV hides these intermediate buffers as well. "4 signature diagram" will not help you with that, _in(allocator) will

@ripytide
Copy link
Member

Well if it's about reducing allocations by reusing buffers then variant 2. and 3. are exactly what you'd want as they let you supply your own already allocated re-usable memory buffer.

match_histogram_mut(&mut img, &img) is variant 2, if it were variant 3 then it would look like: match_histogram_into(input: &Image, output: &mut Image, target: &Image).

@cospectrum
Copy link
Contributor

cospectrum commented Oct 20, 2024

Owned variant is possible only from inplace.

@cospectrum
Copy link
Contributor

This is just an example. We already have a convention to implement _mut functions (in place or with an output buffer). So I don't see how owned brings anything to the table, despite polluting the codebase more.

@cospectrum
Copy link
Contributor

cospectrum commented Oct 20, 2024

let b = match_histogram(b, &a) looks stupid. We don't want that

@ripytide
Copy link
Member

This is just an example. We already have a convention to implement _mut functions (in place or with an output buffer). So I don't see how owned brings anything to the table, despite polluting the codebase more.

Oh I see, yeah I agree it doesn't bring anything to the table, but when I discussed this in #622 theotherphil mentioned that the owned variant (1.) is useful for ergonomic reasons.

I wasn't suggesting we should generate them with macros only that we could since they are easily reducible.

@cospectrum
Copy link
Contributor

cospectrum commented Oct 20, 2024

Lets just find functions that have only this kind of the signature f (&I) -> I, and will see if it makes sense to add _mut implementation.
They should allocate output in a simple way:
let out = Image::new(input.width(), input.height())

@ripytide
Copy link
Member

ripytide commented Oct 20, 2024

If you grep for let mut out = image.clone() there are about 15 places where we forward the implementation to the _mut() variants which is pretty sub-optimal when we could instead use _into() non-inplace variants instead.

@cospectrum
Copy link
Contributor

cospectrum commented Oct 20, 2024

Do you want to replace func(&I) -> I, func_mut(&mut I) pairs with single func(I) -> I?

@cospectrum
Copy link
Contributor

cospectrum commented Oct 20, 2024

Do you want to replace func(&I) -> I, func_mut(&I, &mut I) pairs with single func(I) -> I?

It's a question of style. Ask the owner.

@ripytide
Copy link
Member

ripytide commented Oct 20, 2024

No I think we should replace func(&I) -> I with func(&I, &mut I) which is a matter of performance not style since non-in-place functions can be much faster than in-place versions and so there is no need to use the slower version when we could use the faster version.

I also think we should keep func_mut(&mut I) as it is.

@ripytide
Copy link
Member

ripytide commented Oct 20, 2024

I also think we should get rid of shallow functions like this:

pub fn open(image: &GrayImage, norm: Norm, k: u8) -> GrayImage {
    let mut out = image.clone();
    open_mut(&mut out, norm, k);
    out
}

Since they are potentially misleading to users since they could think that it is implemented using a non-inplace algorithm when in reality it is just forwarding it's implementation to the inplace implementation.

@cospectrum
Copy link
Contributor

cospectrum commented Oct 20, 2024

Should we remove match_histogram then?
There is match_histogram_mut

@cospectrum
Copy link
Contributor

We should follow image codestyle in my opinion

@cospectrum
Copy link
Contributor

cospectrum commented Oct 20, 2024

We should follow image codestyle in my opinion

They have imageops::blur(&img, sigma), but no imageops::blur_mut(&mut out, &img, sigma). So the same questions araise for them. imageproc is some kind of extension

@ripytide
Copy link
Member

image seems to prioritize non-breaking changes over most other things so I'm dubious to follow any of their conventions blindly. Although they do have some _in() variants like flip_vertical_in().

@cospectrum
Copy link
Contributor

Introducing new function is not breaking change.

@cospectrum
Copy link
Contributor

I would save _in for allocators.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants