Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move some skeleton building to cgame, batch IPC #1386

Draft
wants to merge 1 commit into
base: for-0.56.0/sync
Choose a base branch
from

Conversation

VReaperV
Copy link
Contributor

@VReaperV VReaperV commented Oct 23, 2024

Complementary pr to: Unvanquished/Unvanquished#3146.

I've profiled building-heavy scenes, and the most time was spent waiting for IPC when building the skeleton. The building of the skeletons itself can be moved completely to cgame I believe. I'm not sure if tr.animations can be moved there as well, however, since it's also used for other things (at least for culling). Additionally, I might implement GPU-driven skeleton building, which might be even faster since it would avoid uploading all the bones, doing more shader switches etc.

So for now I've copied the skeleton build functions over to cgame and added an IPC call to get a bunch of animation structs at once. There are definitely still things to clean up with that code, and there's probably a better more general solution, but this works already: this scene on plat23 with 176 eggs + om in view went from 50 to 100 FPS from this change:
unvanquished_2024-10-23_043029_000

Profiling shows that most of the time is now actually being spent building the skeleton, in some SSE function, rather than waiting for IPC.

The layout I used to test this can be found here: https://users.unvanquished.net/~reaper/maps/layouts/plat23/test.dat.

@slipher
Copy link
Member

slipher commented Oct 23, 2024

Which models use the pure-gamelogic implementation? If there are many of those it could be worth breaking out into a separate non-compat-breaking PR.

@slipher slipher changed the base branch from master to for-0.56.0/sync October 23, 2024 02:53
@VReaperV
Copy link
Contributor Author

Which models use the pure-gamelogic implementation? If there are many of those it could be worth breaking out into a separate non-compat-breaking PR.

All buildings. I'll see if I can make it not require IPC calls at all, but it would probably require duplicating tr.animations for the time being.

@VReaperV
Copy link
Contributor Author

It looks like there's also some really stupid back-and-forth with a ton of copying: getting the skeleton from engine, copying it into a refentity_t, then that entity gets sent through IPC to engine and engine copies it, copying the skeleton again... From what I can tell, cgame really only needs to know about the skeleton for adding weapons/upgrades, the rest can just be done on engine side, without sending the skeleton and copying it through IPC. Cgame can then load the relevant parts of the animations to get bone names for adding weapons and upgrades.

@slipher
Copy link
Member

slipher commented Oct 23, 2024

I tried it and got CG_BuildAnimSkeleton: Can't build skeleton (the second one) at various moments. For example each placing a buildable.

@VReaperV
Copy link
Contributor Author

Yeah, I'm assuming it's because some value isn't valid on the first frame the entity appears.

@VReaperV
Copy link
Contributor Author

This seems to be the best course of action to me right now:

  • Temporarily copy skeleton stuff to cgame, like it was done with trap_R_BlendSkeleton() (the actual engine functionality for it was apparently never cleaned up though).
  • In for-0.56, remove most of these IPC calls and refSkeleton_t from entities being sent through IPC and copied. Either create a different IPC call for, or load in cgame, the required parts of the animations.
  • Engine would then build and interpolate the skeletons based on fields in refEntity_t.

I think that should be sufficient, but I might've missed some skeleton usage that precludes this.

@VReaperV
Copy link
Contributor Author

VReaperV commented Oct 23, 2024

Temporarily copy skeleton stuff to cgame, like it was done with trap_R_BlendSkeleton() (the actual engine functionality for it was apparently never cleaned up though).

So I've tried this, but went down the rabbit hole of loading iqm to get the animations, so I'm just gonna skip that and make the proper change to for-0.56 instead (well, some/most of the functionality can probably go into master).

@illwieckz
Copy link
Member

illwieckz commented Nov 4, 2024

I tested this branch (with game counterpart) on a slow device, on plat23 map, viewpos 1920 1920 0 0 0 (renders all alien base), unfortunately that doesn't bring more performance on such scenario. I have not tested with the hundreds-of-models layout.

The hardware is Intel X3100 (GMA 965, Gen 4), on a Core 2 Duo L7500 (dual core). This GPU doesn't implement the features required for model skinning so the driver emulates it. Our own alternate code (enabled with r_vboVertexSkinning 0) is faster than driver emulation. By using this hardware that puts strong pressure on CPU, I was hoping to see a difference, because with such system, every cycles count! Unfortunately I haven't measured any difference:

  cpu cpu+smp gpu gpu+smp
old 21 21 11 11
new 21 21 11 11

So the branch will likely benefit gameplays like PVE games, but not the vanilla game.

@VReaperV
Copy link
Contributor Author

I have started a branch that moves the skeleton stuff to engine instead and skips the expensive copies. This seems to be working faster than this branch, although I still need to make it actually build the skeletons on the engine side with the values it receives.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-Client T-Improvement Improvement for an existing feature T-Performance
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants