
Try using new_zeroed_slice to optimize f.mvs with zeroed allocations #1360

Open · wants to merge 2 commits into base: main
Conversation

@kkysen (Collaborator) commented Sep 23, 2024

@ivanloz, could you see if this fixes/changes the performance issue you found in #1358?

This uses `#![feature(new_zeroed_alloc)]` (rust-lang/rust#129396), by the way, which will hopefully be stabilized soon, now that `#![feature(new_uninit)]` was just stabilized in 1.82.
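For reference, a minimal sketch of the idea (not rav1d's actual code): the nightly `Box::<[T]>::new_zeroed_slice(len)` requests zeroed memory straight from the allocator, so the kernel can back it with copy-on-write zero pages instead of the program writing the zeros itself. On stable Rust the same `alloc_zeroed` fast path is reachable through `vec![0; len]`, which is specialized for all-zero bit patterns:

```rust
/// Stable stand-in for the nightly `Box::<[u64]>::new_zeroed_slice(len)`:
/// `vec![0; len]` is specialized to call `alloc_zeroed` for types whose
/// zero value is the all-zero bit pattern, so no explicit memset runs and
/// untouched pages can stay as shared zero pages until first write.
fn zeroed_slice(len: usize) -> Box<[u64]> {
    vec![0u64; len].into_boxed_slice()
}
```

The trade-off being probed in this PR is that the zero pages are still faulted in lazily on first touch, which is where the page-fault counts discussed below come in.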

@kkysen force-pushed the kkysen/f-mvs-new_zeroed_slice branch 2 times, most recently from d6a6749 to 513891b on September 23, 2024 09:39
Review thread on src/disjoint_mut.rs (outdated, resolved)
@ivanloz commented Sep 24, 2024

Thanks! I'll try this out and report back.

… zeroed allocations to optimize out initialization, and use this on `f.mvs`.
@kkysen force-pushed the kkysen/f-mvs-new_zeroed_slice branch from 513891b to 7f8c74a on September 24, 2024 07:39
@kkysen changed the base branch from main to kkysen/Arc-from_raw-into_raw-instead-of-transmute on September 24, 2024 07:39
kkysen added a commit that referenced this pull request Sep 25, 2024
……) as *const _)` (#1361)

As @Darksonn pointed out in a comment on #1360, using `mem::transmute` on an `Arc` is unsound since `Arc` is not `#[repr(transparent)]`; we need to go through `Arc::into_raw` and `Arc::from_raw` with a pointer cast instead.
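As a hedged illustration of the sound pattern (the wrapper type here is hypothetical, not rav1d's actual type): round-trip through the raw pointer, where casting the pointee type is well-defined, rather than transmuting the `Arc` itself:

```rust
use std::sync::Arc;

/// Hypothetical wrapper standing in for the real type; what matters is
/// that the *pointee* cast is sound because the wrapper is
/// `#[repr(transparent)]` over `u32`.
#[repr(transparent)]
struct Wrapped(u32);

/// Sound conversion from `Arc<u32>` to `Arc<Wrapped>`. `Arc`'s own layout
/// is not `#[repr(transparent)]`, so transmuting the `Arc` is UB territory;
/// instead we extract the raw pointer, cast it, and rebuild the `Arc`.
/// The refcount is untouched because `into_raw`/`from_raw` balance out.
fn cast_arc(a: Arc<u32>) -> Arc<Wrapped> {
    let raw = Arc::into_raw(a);
    unsafe { Arc::from_raw(raw as *const Wrapped) }
}
```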
Base automatically changed from kkysen/Arc-from_raw-into_raw-instead-of-transmute to main September 25, 2024 16:07
@ivanloz commented Sep 30, 2024

I've been investigating this and unfortunately I'm still seeing roughly the same number of page-faults -- maybe a marginal drop, but hard to tell if it's just noise. It's still an order of magnitude more page faults than seen in dav1d. Similarly, the number of context switches and overall performance seem to be largely unaffected.

I'm trying to isolate where these are now occurring. I haven't had luck narrowing it down yet, but wanted to at least provide an update here.

@kkysen (Collaborator, Author) commented Oct 1, 2024

Thanks for the update! Let me try one more small thing first (when I have some time).

@ivanloz commented Oct 3, 2024

The page-faults are now originating in `<rav1d::src::refmvs::save_tmvs::Fn>::call` -- unfortunately I haven't been able to get more precise than that at the moment. If it's helpful, the call stack is:

    <rav1d::src::refmvs::save_tmvs::Fn>::call
    rav1d::src::decode::rav1d_decode_frame_main
    rav1d::src::decode::rav1d_decode_frame
    rav1d::src::decode::rav1d_submit_frame
Since I was already collecting performance numbers, I confirmed that the number of page-faults versus frames processed in rav1d grows much faster when compared to dav1d:

|           | 100 frames | 500 frames | 1000 frames | 2000 frames |
|-----------|-----------:|-----------:|------------:|------------:|
| rav1d-cli |     15,731 |     37,831 |      60,071 |     105,349 |
| dav1d     |     11,848 |     16,553 |      17,122 |      18,800 |

These numbers are noisy of course, but the trend seems consistent.

@kkysen (Collaborator, Author) commented Oct 21, 2024

Whoops, sorry I didn't see this reply/update for a while.

> The page-faults are now originating in `<rav1d::src::refmvs::save_tmvs::Fn>::call` -- unfortunately I haven't been able to get more precise than that at the moment. If it's helpful, the call stack is:
>
>     <rav1d::src::refmvs::save_tmvs::Fn>::call
>     rav1d::src::decode::rav1d_decode_frame_main
>     rav1d::src::decode::rav1d_decode_frame
>     rav1d::src::decode::rav1d_submit_frame

`<rav1d::src::refmvs::save_tmvs::Fn>::call` is one of the wrappers to an asm call, so that probably means the page faults are occurring in the asm.

> Since I was already collecting performance numbers, I confirmed that the number of page-faults versus frames processed in rav1d grows much faster when compared to dav1d:
>
> |           | 100 frames | 500 frames | 1000 frames | 2000 frames |
> |-----------|-----------:|-----------:|------------:|------------:|
> | rav1d-cli |     15,731 |     37,831 |      60,071 |     105,349 |
> | dav1d     |     11,848 |     16,553 |      17,122 |      18,800 |
>
> These numbers are noisy of course, but the trend seems consistent.

Thanks, this is useful. We should really fix this.

@kkysen (Collaborator, Author) commented Oct 31, 2024

@ivanloz, does 04ea1ab help at all? If in most cases the frame size stays exactly the same, this should help a lot, I think. I'm still trying to reproduce your benchmarking; I just wasn't sure which files you were using.
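A hedged sketch of that reuse idea (names are hypothetical; this is not the commit's actual code): keep the previous buffer whenever the required length is unchanged, so repeated same-size frames touch pages that are already resident instead of faulting in fresh zero pages from a new allocation:

```rust
/// Reuse the existing buffer when the required length is unchanged.
/// A fresh `vec![0; len]` maps new zero pages that fault in lazily on
/// first touch; re-zeroing a buffer we already own touches pages that
/// are already resident, so the faults are paid only once per size.
fn ensure_zeroed(buf: &mut Vec<u64>, len: usize) {
    if buf.len() == len {
        buf.fill(0); // pages already mapped: a plain memset, no faults
    } else {
        *buf = vec![0; len]; // size changed: fresh zeroed allocation
    }
}
```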
