Allow Mesh-related queue phase systems to parallelize #11804
Does this have a significant effect on single-threaded perf?
Merge conflict. Code looks good. Could you test with a heavier single-material-type no-contention load like
LGTM assuming conflicts are fixed
Most modern platforms (with maybe the exception of wasm) treat all load/store operations as atomic, and as long as there is no contention on the same cache line, performance is identical to the single-threaded case. The only thing you might be missing out on is the compiler reordering your operations to be more optimal, which isn't a particular concern for these systems.
# Objective

Partially addresses bevyengine#3548. `queue_shadows` and `queue_material_meshes` cannot parallelize because of the `ResMut<RenderMeshInstances>` parameter for `queue_material_meshes`.

## Solution

Change the `material_bind_group` field to use atomics instead of needing full mutable access. Change the `ResMut` to a `Res`, which should allow both sets of systems to parallelize without issue.

## Performance

Tested against `many_foxes`, this significantly improves the entire render schedule. (Yellow is this PR, red is main.)

![image](https://github.com/bevyengine/bevy/assets/3137680/6cc7f346-4f50-4f12-a383-682a9ce1daf6)

The use of atomics does seem to have a negative effect on `queue_material_meshes` (roughly an 8.29% increase in time spent in the system).

![image](https://github.com/bevyengine/bevy/assets/3137680/7907079a-863d-4760-aa5b-df68c006ea36)

`queue_shadows` seems to be ever so slightly slower (1.6% more time spent in the system).

![image](https://github.com/bevyengine/bevy/assets/3137680/6d90af73-b922-45e4-bae5-df200e8b9784)

`batch_and_prepare_render_phase` seems to be a mix, but overall seems to be slightly *faster*, by about 5%.

![image](https://github.com/bevyengine/bevy/assets/3137680/fac638ff-8c90-436b-9362-c6209b18957c)
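The idea in the Solution section can be illustrated with a minimal sketch. This is not Bevy's actual code: `MeshInstance`, its methods, `assign_in_parallel`, and the convention that `0` means "unset" are all hypothetical names and assumptions chosen for illustration. It only shows why an atomic field lets writers work through a shared reference (`&self`), which is the property that lets a system take `Res` instead of `ResMut`.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// Hypothetical stand-in for a per-mesh record; not Bevy's real type.
pub struct MeshInstance {
    /// Assumption for this sketch: 0 means "no bind group assigned yet".
    material_bind_group_id: AtomicU32,
}

impl MeshInstance {
    pub fn new() -> Self {
        Self {
            material_bind_group_id: AtomicU32::new(0),
        }
    }

    /// Takes `&self`, so callers only need shared access to the instance.
    pub fn set_material_bind_group_id(&self, id: u32) {
        self.material_bind_group_id.store(id, Ordering::Relaxed);
    }

    /// Returns `None` while the id is still unset.
    pub fn material_bind_group_id(&self) -> Option<u32> {
        match self.material_bind_group_id.load(Ordering::Relaxed) {
            0 => None,
            id => Some(id),
        }
    }
}

/// Writes an id into every instance from separate threads,
/// using nothing more than a shared slice reference.
pub fn assign_in_parallel(instances: &[MeshInstance]) {
    std::thread::scope(|scope| {
        for (index, instance) in instances.iter().enumerate() {
            scope.spawn(move || instance.set_material_bind_group_id(index as u32 + 1));
        }
    });
}
```

With a plain `u32` field, `set_material_bind_group_id` would need `&mut self`, and the scoped threads above would not compile. The atomic moves the synchronization into the field itself, at the cost of the small per-access overhead reported under Performance.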
# Objective

- After #11804, the `queue_prepass_material_meshes` function is now executed in parallel with other `queue_*` systems. This optimization introduced a potential issue where `mesh_instance.should_batch()` could return false in `queue_prepass_material_meshes` due to an unset `material_bind_group_id`.
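The ordering hazard described above can be sketched as follows. Everything here is an assumption made for illustration: the `MeshInstance` type, the rule that `should_batch` only checks whether the id is set, and the `0` sentinel are hypothetical and do not reproduce Bevy's implementation. The sketch runs the two steps sequentially to make the outcome deterministic; in the real schedule the order would depend on how the parallel systems happen to be interleaved.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// Hypothetical per-mesh record; 0 is assumed to mean "id not set yet".
pub struct MeshInstance {
    material_bind_group_id: AtomicU32,
}

impl MeshInstance {
    pub fn new() -> Self {
        Self {
            material_bind_group_id: AtomicU32::new(0),
        }
    }

    /// Stand-in for the write performed by the material queue system.
    pub fn set_material_bind_group_id(&self, id: u32) {
        self.material_bind_group_id.store(id, Ordering::Relaxed);
    }

    /// Simplified assumption: batching requires the id to be set.
    pub fn should_batch(&self) -> bool {
        self.material_bind_group_id.load(Ordering::Relaxed) != 0
    }
}

/// Returns what a reader observes before and after the writer has run.
pub fn observe_before_and_after_write() -> (bool, bool) {
    let instance = MeshInstance::new();
    // A reader that runs first sees the unset id and declines to batch.
    let before = instance.should_batch();
    // The writer then assigns the id.
    instance.set_material_bind_group_id(7);
    // A reader that runs afterwards sees the id and batches.
    let after = instance.should_batch();
    (before, after)
}
```

Once the systems may run in either order, nothing guarantees that the reader runs after the writer, so the `before` case becomes reachable.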