Testing branches for Reaper #1414

Open
illwieckz opened this issue Nov 4, 2024 · 7 comments

@illwieckz
Member

Testing: https://github.com/VReaperV/Daemon/tree/material-stages-tex

System:

GPU: AMD Radeon PRO W7600
CPU: AMD Ryzen Threadripper PRO 3955WX
resolution: 3840×2160
preset: ultra

Framerate on default spectator scenes:

| Map | default | material | tex |
|---|---|---|---|
| plat23 | 433 | 354 | 360 |
| metro | 672 | 480 | 483 |
| habitat | 435 | ☠️ | 375 |
| station12 | 108 | 221 | 234 |

@VReaperV
Contributor

VReaperV commented Nov 4, 2024

Hmm, that's interesting as I got slightly lower fps on metro and habitat with https://github.com/VReaperV/Daemon/tree/material-stages-tex. I'm guessing it's down to a difference in how the drivers are handling the respective buffers.

@VReaperV
Contributor

VReaperV commented Nov 4, 2024

It looks like the perceived slowdown I was getting was actually due to bugs on master; now I get the same or higher fps on the branch above.

illwieckz changed the title from “Testing material-stages-tex” to “Testing branches for Reaper” on Nov 8, 2024

@illwieckz
Member Author

illwieckz commented Nov 8, 2024

Testing: https://github.com/VReaperV/Daemon/tree/test-no-multidraw

| | plat23 default | plat23 359 290 42 176 -7 | habitat default |
|---|---|---|---|
| no-multidraw no-material | 367 | 460 | 261 |
| multidraw no-material | 360 | 452 | 270 |
| multidraw material | 302 | 350 | 321 |

The difference between multidraw or not can be noise.

@illwieckz
Member Author

> The difference between multidraw or not can be noise.

Yes, I redid “multidraw no-material” with plat23, and now it is:

| | plat23 default | plat23 359 290 42 176 -7 | habitat default |
|---|---|---|---|
| no multidraw no material | 367 | 460 | 261 |
| multidraw no material | 369 | 461 | 270 |
| no multidraw material | 302 | 350 | 321 |

@VReaperV
Contributor

VReaperV commented Nov 8, 2024

Hmm, interesting, I got slightly better performance with the test-no-multidraw branch, but maybe that was just a fluke.

It's interesting that habitat now shows better performance with the material system than without it, compared to the first test here. Probably due to the fixes I made earlier.

@VReaperV
Contributor

VReaperV commented Nov 8, 2024

Oh, btw @illwieckz, what result do you get on master/test-no-multidraw without r_materialSystem, and on master with r_materialSystem, while using r_profilerRenderSubGroups on? I'd be interested in a screenshot from the plat23 default view.

@VReaperV
Contributor

VReaperV commented Nov 19, 2024

I've been thinking that quantising the stage data and offloading textures to a buffer with a fixed layout might improve this further.

For context, right now each drawSurf gets its own copy of the surface data in the buffer, so a lot of data is duplicated. Additionally, that data currently spans 128 and 192 bytes for the generic and lightMapping shaders respectively, which are 2 of the most abundant ones. This means a typical cache line can only fit 0.5 or 1 of the former, while the latter will overfetch. It also increases bandwidth usage for updating this data, and makes merging surfaces into one draw command impossible (unless switching to Vulkan, or using an Nvidia extension which didn't even work in that regard on my end).
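
To make the cache line arithmetic above concrete, here is a minimal sketch. It reads the 128 and 192 sizes as bytes, assumes cache lines of 64 or 128 bytes (common values, not measured on any particular GPU), and assumes each record starts on a line boundary. The helper names are hypothetical and are not Daemon code.

```cpp
#include <cstddef>

// Hypothetical helper: number of cache lines touched when reading one record
// of recordSize bytes that starts on a cache line boundary (assumption).
constexpr std::size_t CacheLinesPerRecord(std::size_t recordSize, std::size_t lineSize) {
    return (recordSize + lineSize - 1) / lineSize;  // round up
}

// Hypothetical helper: bytes pulled into the cache that do not belong to the
// record, under the same alignment assumption.
constexpr std::size_t OverfetchBytes(std::size_t recordSize, std::size_t lineSize) {
    return CacheLinesPerRecord(recordSize, lineSize) * lineSize - recordSize;
}
```

Under these assumptions a 128-byte record needs two 64-byte lines (the "0.5 per line" case) or exactly one 128-byte line, while a 192-byte record on 128-byte lines touches two lines and drags in 64 bytes it does not use.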

The reason each surface copies its data (I tried just storing data per-stage first instead) is that some of the data is per-surface: lightmap, deluxemap and light factor. The https://github.com/VReaperV/Daemon/tree/material-stages-tex branch offloads some of the data to a different buffer to work around this issue. However, after looking at the shaders and uniforms, I believe I can fit all of the generic and lightMapping shader stage data into 8 and 20 bytes per stage respectively, while storing the textures in a different, fixed-layout buffer. The stage data can then even be put into a uniform buffer, which might work a little faster. 16 bits could be used to index the stage, with the remaining 16 bits used to store light factor and an index to textures and lightmap/deluxemap. Light factor can even be just 1 bit since it's always either 1.0 or the map light factor, which can be set as a global uniform.
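
As a rough illustration of the 32-bit per-surface word described above, here is one possible packing. The exact bit split (16 bits of stage index, 1 bit of light factor flag, 15 bits of texture index) is an assumption made for the example, and `SurfaceRef`, `PackSurfaceRef` and `UnpackSurfaceRef` are hypothetical names, not anything from the branch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-surface reference, before packing.
struct SurfaceRef {
    std::uint16_t stageIndex;    // index into the per-stage buffer (16 bits)
    bool useMapLightFactor;      // false: 1.0, true: global map light factor
    std::uint16_t textureIndex;  // index into the fixed-layout texture buffer (15 bits, assumed)
};

// Assumed layout: bits 0..15 stage index, bit 16 light factor flag,
// bits 17..31 texture index.
inline std::uint32_t PackSurfaceRef(const SurfaceRef& ref) {
    assert(ref.textureIndex < (1u << 15));  // must fit in the 15 bits left over
    return static_cast<std::uint32_t>(ref.stageIndex)
         | (static_cast<std::uint32_t>(ref.useMapLightFactor) << 16)
         | (static_cast<std::uint32_t>(ref.textureIndex) << 17);
}

inline SurfaceRef UnpackSurfaceRef(std::uint32_t packed) {
    return SurfaceRef{
        static_cast<std::uint16_t>(packed & 0xFFFFu),
        ((packed >> 16) & 1u) != 0,
        static_cast<std::uint16_t>(packed >> 17)
    };
}
```

The same shifts and masks would be mirrored in the shader. Storing the light factor as a single flag only works because, as noted above, it is always either 1.0 or the map light factor.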

Only the texture index would then prevent merging different surfaces (other than having a different material, that is), since textures can only be indexed with a dynamically uniform value. From my testing it seems this should allow merging lots of different surfaces.
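
A CPU-side sketch of that merging rule, for illustration only: two neighbouring draws are merged when they share both the material and the texture index. The `DrawRange` type and `MergeDrawRanges` function are hypothetical, and the extra requirement that the index ranges be contiguous is an assumption of this example so that the result is a single plain draw.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical description of one surface's draw.
struct DrawRange {
    std::uint32_t material;      // material ID
    std::uint32_t textureIndex;  // must be dynamically uniform within a draw
    std::uint32_t firstIndex;    // first index in the shared index buffer
    std::uint32_t indexCount;    // number of indices to draw
};

// Merges neighbouring ranges that share material and texture index and whose
// index ranges are contiguous. Input is assumed to be sorted by
// (material, textureIndex, firstIndex).
inline std::vector<DrawRange> MergeDrawRanges(const std::vector<DrawRange>& sorted) {
    std::vector<DrawRange> merged;
    for (const DrawRange& range : sorted) {
        if (!merged.empty()) {
            DrawRange& last = merged.back();
            const bool sameKey = last.material == range.material
                              && last.textureIndex == range.textureIndex;
            const bool contiguous = last.firstIndex + last.indexCount == range.firstIndex;
            if (sameKey && contiguous) {
                last.indexCount += range.indexCount;  // extend the previous draw
                continue;
            }
        }
        merged.push_back(range);
    }
    return merged;
}
```

In the renderer itself this would more likely live in a GPU pass than on the CPU; the sketch only shows which fields act as the merge key.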

The https://github.com/VReaperV/Daemon/tree/material-clusters branch was an attempt at merging surfaces by using texture arrays and binding textures per material (with a texture layer and scale used in the shader for each relevant one), which worked alright on my end (sans some bugs at surface edges), but didn't really seem to give a performance benefit. On Mesa/AMD it was slower than the current material system, as tested by @illwieckz; however, maybe using per-stage material data would help with this. It does, however, seem that I overcomplicated that branch (it even copies vertexes, not just the indexes, and for each view), and the surface merging can probably be better achieved in another pass: the cull and surface processing shaders are already very fast, especially if the subgroup extension is supported.
