
Try using new_zeroed_slice to optimize f.mvs with zeroed allocations #1360

Open · wants to merge 2 commits into base: main
Conversation

@kkysen (Collaborator) commented Sep 23, 2024

@ivanloz, could you see if this fixes/changes the performance issue you found in #1358?

This uses `#![feature(new_zeroed_alloc)]` (rust-lang/rust#129396), by the way, which will hopefully be stabilized soon, now that `#![feature(new_uninit)]` was just stabilized in 1.82.
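For reference, a minimal sketch of the idea (not rav1d's actual code): the nightly `Box::<[T]>::new_zeroed_slice(len)` requests zeroed memory straight from the allocator, so the kernel can back it with copy-on-write zero pages instead of the program writing the zeros itself. On stable Rust the same `alloc_zeroed` fast path is reachable through `vec![0; len]`, which is specialized for all-zero bit patterns:

```rust
/// Stable stand-in for the nightly `Box::<[u64]>::new_zeroed_slice(len)`:
/// `vec![0; len]` is specialized to call `alloc_zeroed` for types whose
/// zero value is the all-zero bit pattern, so no explicit memset runs and
/// untouched pages can stay as shared zero pages until first write.
fn zeroed_slice(len: usize) -> Box<[u64]> {
    vec![0u64; len].into_boxed_slice()
}
```

The trade-off being probed in this PR is that the zero pages are still faulted in lazily on first touch, which is where the page-fault counts discussed below come in.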

@kkysen force-pushed the kkysen/f-mvs-new_zeroed_slice branch 2 times, most recently from d6a6749 to 513891b on September 23, 2024 09:39
Review thread on src/disjoint_mut.rs (outdated, resolved)
@ivanloz commented Sep 24, 2024

Thanks! I'll try this out and report back.

… zeroed allocations to optimize out initialization, and use this on `f.mvs`.
@kkysen force-pushed the kkysen/f-mvs-new_zeroed_slice branch from 513891b to 7f8c74a on September 24, 2024 07:39
@kkysen changed the base branch from main to kkysen/Arc-from_raw-into_raw-instead-of-transmute on September 24, 2024 07:39
kkysen added a commit that referenced this pull request Sep 25, 2024
……) as *const _)` (#1361)

As @Darksonn pointed out in a comment on #1360, using `mem::transmute` on an `Arc` is unsound since `Arc` is not `#[repr(transparent)]`; we need to go through `Arc::into_raw` and `Arc::from_raw` with a pointer cast instead.
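As a hedged illustration of the sound pattern (the wrapper type here is hypothetical, not rav1d's actual type): round-trip through the raw pointer, where casting the pointee type is well-defined, rather than transmuting the `Arc` itself:

```rust
use std::sync::Arc;

/// Hypothetical wrapper standing in for the real type; what matters is
/// that the *pointee* cast is sound because the wrapper is
/// `#[repr(transparent)]` over `u32`.
#[repr(transparent)]
struct Wrapped(u32);

/// Sound conversion from `Arc<u32>` to `Arc<Wrapped>`. `Arc`'s own layout
/// is not `#[repr(transparent)]`, so transmuting the `Arc` is UB territory;
/// instead we extract the raw pointer, cast it, and rebuild the `Arc`.
/// The refcount is untouched because `into_raw`/`from_raw` balance out.
fn cast_arc(a: Arc<u32>) -> Arc<Wrapped> {
    let raw = Arc::into_raw(a);
    unsafe { Arc::from_raw(raw as *const Wrapped) }
}
```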
Base automatically changed from kkysen/Arc-from_raw-into_raw-instead-of-transmute to main September 25, 2024 16:07
@ivanloz commented Sep 30, 2024

I've been investigating this and unfortunately I'm still seeing roughly the same number of page-faults -- maybe a marginal drop, but hard to tell if it's just noise. It's still an order of magnitude more page faults than seen in dav1d. Similarly, the number of context switches and overall performance seem to be largely unaffected.

I'm trying to isolate where these are now occurring. I haven't had luck narrowing it down yet, but wanted to at least provide an update here.

@kkysen (Collaborator, Author) commented Oct 1, 2024

Thanks for the update! Let me try one more small thing first (when I have some time).

@ivanloz commented Oct 3, 2024

The page-faults are now originating in `<rav1d::src::refmvs::save_tmvs::Fn>::call` -- unfortunately I haven't been able to get more precise than that at the moment. If it's helpful, the call stack is:

    <rav1d::src::refmvs::save_tmvs::Fn>::call
    rav1d::src::decode::rav1d_decode_frame_main
    rav1d::src::decode::rav1d_decode_frame
    rav1d::src::decode::rav1d_submit_frame
Since I was already collecting performance numbers, I confirmed that the number of page-faults versus frames processed in rav1d grows much faster when compared to dav1d:

|           | 100 frames | 500 frames | 1000 frames | 2000 frames |
|-----------|-----------:|-----------:|------------:|------------:|
| rav1d-cli |     15,731 |     37,831 |      60,071 |     105,349 |
| dav1d     |     11,848 |     16,553 |      17,122 |      18,800 |

These numbers are noisy of course, but the trend seems consistent.

@kkysen (Collaborator, Author) commented Oct 21, 2024

Whoops, sorry I didn't see this reply/update for a while.

> The page-faults are now originating in `<rav1d::src::refmvs::save_tmvs::Fn>::call` -- unfortunately I haven't been able to get more precise than that at the moment. If it's helpful, the call stack is:
>
>     <rav1d::src::refmvs::save_tmvs::Fn>::call
>     rav1d::src::decode::rav1d_decode_frame_main
>     rav1d::src::decode::rav1d_decode_frame
>     rav1d::src::decode::rav1d_submit_frame

`<rav1d::src::refmvs::save_tmvs::Fn>::call` is one of the wrappers to an asm call, so that probably means the page faults are occurring in the asm.

> Since I was already collecting performance numbers, I confirmed that the number of page-faults versus frames processed in rav1d grows much faster when compared to dav1d:
>
> |           | 100 frames | 500 frames | 1000 frames | 2000 frames |
> |-----------|-----------:|-----------:|------------:|------------:|
> | rav1d-cli |     15,731 |     37,831 |      60,071 |     105,349 |
> | dav1d     |     11,848 |     16,553 |      17,122 |      18,800 |
>
> These numbers are noisy of course, but the trend seems consistent.

Thanks, this is useful. We should really fix this.

@kkysen (Collaborator, Author) commented Oct 31, 2024

@ivanloz, does 04ea1ab help at all? If in most cases the frame size stays exactly the same, this should help a lot, I think. I'm still trying to reproduce your benchmarking; I just wasn't sure which files you were using.
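A hedged sketch of that reuse idea (names are hypothetical; this is not the commit's actual code): keep the previous buffer whenever the required length is unchanged, so repeated same-size frames touch pages that are already resident instead of faulting in fresh zero pages from a new allocation:

```rust
/// Reuse the existing buffer when the required length is unchanged.
/// A fresh `vec![0; len]` maps new zero pages that fault in lazily on
/// first touch; re-zeroing a buffer we already own touches pages that
/// are already resident, so the faults are paid only once per size.
fn ensure_zeroed(buf: &mut Vec<u64>, len: usize) {
    if buf.len() == len {
        buf.fill(0); // pages already mapped: a plain memset, no faults
    } else {
        *buf = vec![0; len]; // size changed: fresh zeroed allocation
    }
}
```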
