Removed allocation from filter_pixel()
#656
Still 4 times slower for me
Can you push a branch with the code you are running to get that result so I can investigate your use-case?
Okay now that #654 was merged this comparison becomes simpler.
The same bench from #654 (comment) using your branch
@cospectrum can you share your implementation of
It's not "my" implementation. It's from 0.25
Okay I think I've tracked down the discrepancy between the I get the same results as you by default. But if you add the same build config as
Compile Settings
[profile.release]
opt-level = 3
debug = true
[profile.bench]
opt-level = 3
debug = true
rpath = false
lto = false
debug-assertions = false
codegen-units = 1
Benchmarked Code
#![feature(test)]
extern crate test;

use std::hint::black_box;
use test::Bencher;

const SIZE: u32 = 300;

fn main() {}

#[bench]
fn bench_025(b: &mut Bencher) {
    let image = black_box(imageproc_025::image::RgbImage::new(SIZE, SIZE));
    let kernel: Vec<i32> = (0..3 * 3).collect();
    b.iter(|| {
        let filtered = imageproc_025::filter::filter3x3::<_, _, i16>(&image, &kernel);
        black_box(filtered);
    });
}

#[bench]
fn bench_master(b: &mut Bencher) {
    let image = black_box(imageproc_master::image::RgbImage::new(SIZE, SIZE));
    let kernel: Vec<i32> = (0..3 * 3).collect();
    let kernel = imageproc_master::kernel::Kernel::new(&kernel, 3, 3);
    b.iter(|| {
        let filtered = imageproc_master::filter::filter_clamped::<_, _, i16>(&image, kernel);
        black_box(filtered);
    });
}
Results
Master No Settings
PR No Settings
Master With Settings
PR With Settings
Now I suppose I just need to track down which specific build setting is making such a big difference, and why in the
When using
If I
I'll try this PR vs the current master on my MacBook when I have time within the next few days.
@cospectrum what machine are you benchmarking on? I appear to be seeing much larger regressions on an Apple ARM chip than @ripytide is on their Ryzen x86.
M1
I think the setting which changes which implementation is faster (on my machine) is Here I've benchmarked the default compile setting:
[profile.release]
codegen-units = 16
vs
[profile.release]
codegen-units = 1
Results
I’ve not had much time in the last couple of weeks but I still intend to benchmark this branch against master on my MacBook.
Results on master:
Results for this PR:
This still has large regressions, at least on an Apple ARM chip, so I'll close this PR. I'll need to look at filter performance a bit anyway before the next release of this crate, to resolve #664.
Achieved by always computing with 4 channels, using an array of length 4, since const generic expressions are unstable.
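To illustrate the idea in the description, here is a minimal sketch of the two accumulator strategies. It is not the code from this PR: `filter_pixel_alloc`, `filter_pixel_fixed`, `MAX_CHANNELS` and the flat `window` layout are all hypothetical names and assumptions made for the example. The fixed-size version sidesteps the need for an array length derived from a generic pixel type (which would require unstable const generic expressions) by always reserving 4 lanes and leaving the unused ones at zero.

```rust
/// Assumed upper bound on channels per pixel (e.g. RGBA). Hypothetical constant.
const MAX_CHANNELS: usize = 4;

/// Hypothetical baseline: heap-allocates an accumulator for every pixel.
/// `window` holds the neighbourhood as interleaved channel values,
/// one pixel per kernel weight.
fn filter_pixel_alloc(window: &[u8], channels: usize, kernel: &[i32]) -> Vec<i32> {
    assert_eq!(window.len(), kernel.len() * channels);
    let mut acc = vec![0i32; channels]; // allocation on every call
    for (px, &k) in window.chunks_exact(channels).zip(kernel) {
        for (a, &c) in acc.iter_mut().zip(px) {
            *a += k * i32::from(c);
        }
    }
    acc
}

/// Hypothetical allocation-free variant: always accumulates into a
/// stack array of length 4; lanes beyond `channels` stay at zero.
fn filter_pixel_fixed(window: &[u8], channels: usize, kernel: &[i32]) -> [i32; MAX_CHANNELS] {
    assert!(channels >= 1 && channels <= MAX_CHANNELS);
    assert_eq!(window.len(), kernel.len() * channels);
    let mut acc = [0i32; MAX_CHANNELS]; // lives on the stack, no allocation
    for (px, &k) in window.chunks_exact(channels).zip(kernel) {
        // `zip` stops after `channels` lanes, so extra lanes are untouched.
        for (a, &c) in acc.iter_mut().zip(px) {
            *a += k * i32::from(c);
        }
    }
    acc
}
```

Both functions produce the same values for the first `channels` lanes; the caller of the fixed variant is expected to ignore the remaining ones. Whether removing the allocation is actually faster depends on the target and build settings, as the benchmarks in this thread show.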