Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Batch skinned meshes on platforms where storage buffers are available. #16599

Merged
merged 15 commits into from
Dec 10, 2024

Conversation

pcwalton
Copy link
Contributor

@pcwalton pcwalton commented Dec 2, 2024

This commit makes skinned meshes batchable on platforms other than WebGL 2. On supported platforms, it replaces the two uniform buffers used for joint matrices with a pair of storage buffers containing all matrices for all skinned meshes packed together. The indices into the buffer are stored in the mesh uniform and mesh input uniform. The GPU mesh preprocessing step copies the indices in if that step is enabled.

On the many_foxes demo, I observed a frame time decrease from 15.470ms to 11.935ms. This is the result of reducing the submit_graph_commands time from an average of 5.45ms to 0.489ms, an 11x speedup in that portion of rendering.

Screenshot 2024-12-01 192838

This is what the profile looks like for many_foxes after these changes.

Screenshot 2024-12-01 193026

This commit makes skinned meshes batchable on platforms other than WebGL
2. On such platforms, it replaces the two uniform buffers used for joint
matrices with a pair of storage buffers containing all matrices for
all skinned meshes concatenated together. The indices into the buffer
are stored in the mesh uniform and mesh input uniform. The GPU mesh
preprocessing step copies the indices in if that step is enabled.

On the `many_foxes` demo, I observed a frame time decrease from 15.470ms
to 11.935ms. This is the result of reducing the `submit_graph_commands`
time from an average of 5.45ms to 0.489ms, an 11x speedup in that
portion of rendering.
@pcwalton pcwalton requested a review from IceSentry December 2, 2024 04:57
@pcwalton pcwalton added C-Performance A change motivated by improving speed, memory usage or compile times A-Animation Make things move and change over time A-Rendering Drawing game state to the screen S-Needs-Review Needs reviewer attention (from anyone!) to move forward labels Dec 2, 2024
@pcwalton pcwalton requested a review from kristoff3r December 4, 2024 17:28
@pcwalton pcwalton requested a review from JMS55 December 5, 2024 17:00
@JMS55
Copy link
Contributor

JMS55 commented Dec 5, 2024

I'm not sure if this is working? I ran many_foxes through renderdoc, and I still see hundreds of vkCmdDraws, each with two vkCmdBindDescriptorSets before it. One for StandardMaterial, one for skinned_mesh_bind_group. I would've expected only one vkCmdBindDescriptorSets for skinned_mesh_bind_group, and then since the meshes are the same we don't have to rebind StandardMaterial either, and then you would be able to collapse all the draws down to one draw with instance_count=N. Am I missing why this doesn't seem to reduce commands?

@pcwalton
Copy link
Contributor Author

pcwalton commented Dec 5, 2024

Hmm, it works for me on my G14 Windows 11, with many_foxes --count=100 (it's a low number because I'm in a debug build and don't want to kill my system). What platform are you on?

Screenshot 2024-12-05 134011

@JMS55
Copy link
Contributor

JMS55 commented Dec 5, 2024

Windows, Intel 155h.

@pcwalton
Copy link
Contributor Author

pcwalton commented Dec 5, 2024

It works on my M2 mac mini too. Hmm.

Copy link
Contributor

@JMS55 JMS55 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Works fine on my desktop.

Copy link
Contributor

@kristoff3r kristoff3r left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not super familiar with the skinning system, but the code looks consistent and I tested many_foxes on both Linux+webgl2 and both works.

@pcwalton pcwalton added S-Ready-For-Final-Review This PR has been approved by the community. It's ready for a maintainer to consider merging it and removed S-Needs-Review Needs reviewer attention (from anyone!) to move forward labels Dec 6, 2024
@Elabajaba
Copy link
Contributor

Running into a panic with this on the animated_fox example.

thread 'Compute Task Pool (9)' panicked at crates/bevy_pbr/src/render/mesh.rs:927:57:
index out of bounds: the len is 20 but the index is 20
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Encountered a panic in system `bevy_pbr::render::mesh::collect_meshes_for_gpu_building`!
backtrace
thread 'Compute Task Pool (9)' panicked at crates/bevy_pbr/src/render/mesh.rs:927:57:
index out of bounds: the len is 16 but the index is 21
stack backtrace:
   0: rust_begin_unwind
             at /rustc/90b35a6239c3d8bdabc530a6a0816f7ff89a0aaf/library/std/src/panicking.rs:665:5
   1: core::panicking::panic_fmt
             at /rustc/90b35a6239c3d8bdabc530a6a0816f7ff89a0aaf/library/core/src/panicking.rs:74:14
   2: core::panicking::panic_bounds_check
             at /rustc/90b35a6239c3d8bdabc530a6a0816f7ff89a0aaf/library/core/src/panicking.rs:276:5
   3: <usize as core::slice::index::SliceIndex<[T]>>::index
             at /home/me/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/slice/index.rs:301:10
   4: core::slice::index::<impl core::ops::index::Index<I> for [T]>::index
             at /home/me/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/slice/index.rs:16:9
   5: <alloc::vec::Vec<T,A> as core::ops::index::Index<I>>::index
             at /home/me/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/vec/mod.rs:3323:9
   6: bevy_pbr::render::mesh::RenderMeshInstanceGpuBuilder::update
             at ./crates/bevy_pbr/src/render/mesh.rs:927:57
   7: bevy_pbr::render::mesh::collect_meshes_for_gpu_building
             at ./crates/bevy_pbr/src/render/mesh.rs:1424:21
   8: core::ops::function::FnMut::call_mut
             at /home/me/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ops/function.rs:166:5
   9: core::ops::function::impls::<impl core::ops::function::FnMut<A> for &mut F>::call_mut
             at /home/me/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ops/function.rs:294:13
  10: <Func as bevy_ecs::system::function_system::SystemParamFunction<fn(F0,F1,F2,F3,F4,F5) .> Out>>::run::call_inner
             at ./crates/bevy_ecs/src/system/function_system.rs:998:21
  11: <Func as bevy_ecs::system::function_system::SystemParamFunction<fn(F0,F1,F2,F3,F4,F5) .> Out>>::run
             at ./crates/bevy_ecs/src/system/function_system.rs:1001:17
  12: <bevy_ecs::system::function_system::FunctionSystem<Marker,F> as bevy_ecs::system::system::System>::run_unsafe
             at ./crates/bevy_ecs/src/system/function_system.rs:793:19
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
Encountered a panic in system `bevy_pbr::render::mesh::collect_meshes_for_gpu_building`!

@bushrat011899
Copy link
Contributor

Following up from Discord, I tested this branch on an Intel i5-1240P with the following Renderdoc settings:
image
Capturing once yielded:
image
With the following statistics:
image

I believe this is sufficient evidence that the feature works on Intel iGPUs, but I'm still learning this tooling so take with a grain of salt.

@Elabajaba
Copy link
Contributor

Looks like motion vectors are also broken with this (tested by adding the MotionVectorPrepass to the camera in many_foxes).

2024-12-10T00:12:26.608557Z ERROR wgpu::backend::wgpu_core: Handling wgpu errors as fatal by default
thread 'Compute Task Pool (8)' panicked at /home/evan/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-23.0.1/src/backend/wgpu_core.rs:2976:18:
wgpu error: Validation Error

Caused by:
  In RenderPass::end
    In a set_bind_group command
      BindGroup with 'skinned_motion_mesh_bind_group' label 1 expects 0 dynamic offset. However 1 dynamic offset were provided.
backtrace
stack backtrace:
   0: rust_begin_unwind
             at /rustc/90b35a6239c3d8bdabc530a6a0816f7ff89a0aaf/library/std/src/panicking.rs:665:5
   1: core::panicking::panic_fmt
             at /rustc/90b35a6239c3d8bdabc530a6a0816f7ff89a0aaf/library/core/src/panicking.rs:74:14
   2: wgpu::backend::wgpu_core::default_error_handler
             at /home/me/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-23.0.1/src/backend/wgpu_core.rs:3050:5
   3: wgpu::backend::wgpu_core::ErrorSinkRaw::handle_error
             at /home/me/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-23.0.1/src/backend/wgpu_core.rs:3034:21
   4: wgpu::backend::wgpu_core::ContextWgpuCore::handle_error_inner
             at /home/me/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-23.0.1/src/backend/wgpu_core.rs:299:9
   5: wgpu::backend::wgpu_core::ContextWgpuCore::handle_error
             at /home/me/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-23.0.1/src/backend/wgpu_core.rs:311:9
   6: <wgpu::backend::wgpu_core::ContextWgpuCore as wgpu::context::Context>::render_pass_end
             at /home/me/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-23.0.1/src/backend/wgpu_core.rs:2976:13
   7: <T as wgpu::context::DynContext>::render_pass_end
             at /home/me/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-23.0.1/src/context.rs:2682:9
   8: <wgpu::api::render_pass::RenderPassInner as core::ops::drop::Drop>::drop
             at /home/me/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-23.0.1/src/api/render_pass.rs:15:13
   9: core::ptr::drop_in_place<wgpu::api::render_pass::RenderPassInner>
             at /home/me/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ptr/mod.rs:574:1
  10: core::ptr::drop_in_place<wgpu::api::render_pass::RenderPass>
             at /home/me/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ptr/mod.rs:574:1
  11: core::ptr::drop_in_place<bevy_render::render_phase::draw_state::TrackedRenderPass>
             at /home/me/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ptr/mod.rs:574:1
  12: core::mem::drop
             at /home/me/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/mem/mod.rs:943:24
  13: <bevy_core_pipeline::prepass::node::PrepassNode as bevy_render::render_graph::node::ViewNode>::run::{{closure}}
             at ./crates/bevy_core_pipeline/src/prepass/node.rs:172:13
  14: core::ops::function::FnOnce::call_once{{vtable.shim}}
             at /home/me/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ops/function.rs:250:5
  15: <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once
             at /home/me/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/boxed.rs:2454:9
  16: bevy_render::renderer::RenderContext::finish::{{closure}}::{{closure}}
             at ./crates/bevy_render/src/renderer/mod.rs:511:41
  17: <core::panic::unwind_safe::AssertUnwindSafe<F> as core::future::future::Future>::poll
             at /home/me/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/panic/unwind_safe.rs:297:9
  18: <futures_lite::future::CatchUnwind<F> as core::future::future::Future>::poll::{{closure}}
             at /home/me/.cargo/registry/src/index.crates.io-6f17d22bba15001f/futures-lite-2.3.0/src/future.rs:588:42
  19: <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once
             at /home/me/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/panic/unwind_safe.rs:272:9
  20: std::panicking::try::do_call
             at /home/me/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:557:40
  21: __rust_try
  22: std::panicking::try
             at /home/me/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:520:19
  23: std::panic::catch_unwind
             at /home/me/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panic.rs:358:14
  24: <futures_lite::future::CatchUnwind<F> as core::future::future::Future>::poll
             at /home/me/.cargo/registry/src/index.crates.io-6f17d22bba15001f/futures-lite-2.3.0/src/future.rs:588:9
  25: async_executor::Executor::spawn_inner::{{closure}}
             at /home/me/.cargo/registry/src/index.crates.io-6f17d22bba15001f/async-executor-1.13.0/src/lib.rs:250:20
  26: async_task::raw::RawTask<F,T,S,M>::run::{{closure}}
             at /home/me/.cargo/registry/src/index.crates.io-6f17d22bba15001f/async-task-4.7.1/src/raw.rs:550:21
  27: core::ops::function::FnOnce::call_once
             at /home/me/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ops/function.rs:250:5
  28: <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once
             at /home/me/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/panic/unwind_safe.rs:272:9
  29: std::panicking::try::do_call
             at /home/me/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:557:40
  30: __rust_try
  31: std::panicking::try
             at /home/me/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:520:19
  32: std::panic::catch_unwind
             at /home/me/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panic.rs:358:14
  33: async_task::raw::RawTask<F,T,S,M>::run
             at /home/me/.cargo/registry/src/index.crates.io-6f17d22bba15001f/async-task-4.7.1/src/raw.rs:549:23
  34: async_task::runnable::Runnable<M>::run
             at /home/me/.cargo/registry/src/index.crates.io-6f17d22bba15001f/async-task-4.7.1/src/runnable.rs:781:18
  35: async_executor::State::run::{{closure}}::{{closure}}
             at /home/me/.cargo/registry/src/index.crates.io-6f17d22bba15001f/async-executor-1.13.0/src/lib.rs:741:21
  36: <futures_lite::future::Or<F1,F2> as core::future::future::Future>::poll
             at /home/me/.cargo/registry/src/index.crates.io-6f17d22bba15001f/futures-lite-2.3.0/src/future.rs:449:33
  37: async_executor::State::run::{{closure}}
             at /home/me/.cargo/registry/src/index.crates.io-6f17d22bba15001f/async-executor-1.13.0/src/lib.rs:748:32
  38: async_executor::Executor::run::{{closure}}
             at /home/me/.cargo/registry/src/index.crates.io-6f17d22bba15001f/async-executor-1.13.0/src/lib.rs:344:34
  39: futures_lite::future::block_on::{{closure}}
             at /home/me/.cargo/registry/src/index.crates.io-6f17d22bba15001f/futures-lite-2.3.0/src/future.rs:99:19
  40: std::thread::local::LocalKey<T>::try_with
             at /home/me/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/thread/local.rs:283:12
  41: std::thread::local::LocalKey<T>::with
             at /home/me/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/thread/local.rs:260:9
  42: futures_lite::future::block_on
             at /home/me/.cargo/registry/src/index.crates.io-6f17d22bba15001f/futures-lite-2.3.0/src/future.rs:78:5
  43: bevy_tasks::task_pool::TaskPool::new_internal::{{closure}}::{{closure}}::{{closure}}::{{closure}}
             at ./crates/bevy_tasks/src/task_pool.rs:171:37
  44: std::panicking::try::do_call
             at /home/me/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:557:40
  45: __rust_try
  46: std::panicking::try
             at /home/me/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:520:19
  47: std::panic::catch_unwind
             at /home/me/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panic.rs:358:14
  48: bevy_tasks::task_pool::TaskPool::new_internal::{{closure}}::{{closure}}::{{closure}}
             at ./crates/bevy_tasks/src/task_pool.rs:165:43
  49: std::thread::local::LocalKey<T>::try_with
             at /home/me/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/thread/local.rs:283:12
  50: std::thread::local::LocalKey<T>::with
             at /home/me/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/thread/local.rs:260:9
  51: bevy_tasks::task_pool::TaskPool::new_internal::{{closure}}::{{closure}}
             at ./crates/bevy_tasks/src/task_pool.rs:158:25
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
Encountered a panic in system `bevy_render::renderer::render_system`!

@pcwalton
Copy link
Contributor Author

I fixed the motion vectors issue, but was unable to reproduce the animated_fox issue on either debug or release on my Windows laptop or Mac mini.

@Elabajaba
Copy link
Contributor

I fixed the motion vectors issue, but was unable to reproduce the animated_fox issue on either debug or release on my Windows laptop or Mac mini.

I can't reproduce the animated_fox crash after the latest main was merged in 🤷

Motion vectors no longer crash but they also don't appear to be working for skinned meshes now.

Motion vector prepass texture once everything is rendered (left main, right this PR):

image

@alice-i-cecile
Copy link
Member

alice-i-cecile commented Dec 10, 2024

This seems fine to merge, but I'm doing a manual example run here to check for regressions. Ping me once that's done / tomorrow if I forget.

@pcwalton
Copy link
Contributor Author

Ok, the skinned mesh motion vector thing should be fixed now.

Copy link
Contributor

@Elabajaba Elabajaba left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can confirm that motion vectors for skinned meshes appear to be fixed!

@pcwalton
Copy link
Contributor Author

I believe the CI failure is #15981.

@pcwalton
Copy link
Contributor Author

Example run is green.

@alice-i-cecile alice-i-cecile added this pull request to the merge queue Dec 10, 2024
Merged via the queue into bevyengine:main with commit 3188e5a Dec 10, 2024
28 checks passed
BD103 pushed a commit to BD103/bevy that referenced this pull request Dec 10, 2024
bevyengine#16599)

This commit makes skinned meshes batchable on platforms other than WebGL
2. On supported platforms, it replaces the two uniform buffers used for
joint matrices with a pair of storage buffers containing all matrices
for all skinned meshes packed together. The indices into the buffer are
stored in the mesh uniform and mesh input uniform. The GPU mesh
preprocessing step copies the indices in if that step is enabled.

On the `many_foxes` demo, I observed a frame time decrease from 15.470ms
to 11.935ms. This is the result of reducing the `submit_graph_commands`
time from an average of 5.45ms to 0.489ms, an 11x speedup in that
portion of rendering.

![Screenshot 2024-12-01
192838](https://github.com/user-attachments/assets/7d2db997-8939-466e-8b9e-050d4a6a78ee)

This is what the profile looks like for `many_foxes` after these
changes.

![Screenshot 2024-12-01
193026](https://github.com/user-attachments/assets/68983fc3-01b8-41fd-835e-3d93cb65d0fa)

---------

Co-authored-by: François Mockers <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-Animation Make things move and change over time A-Rendering Drawing game state to the screen C-Performance A change motivated by improving speed, memory usage or compile times S-Ready-For-Final-Review This PR has been approved by the community. It's ready for a maintainer to consider merging it
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

7 participants