In gfx9GraphicsPipeline.cpp, in the CalcMaxLateAllocLimit() function, you have the following:
My understanding is that it should be this instead for gfx9:
const uint32 vsNumSgpr = (numSgprs * 16);
Actually, I'm wondering whether you also have to add 1 before multiplying. The code does check later on whether vsNumSgpr is > 0, but then uses it like so:
const uint32 maxSgprVsWaves = (chipProps.gfx9.numPhysicalSgprs / vsNumSgpr) * simdPerSh;
... which, unless numPhysicalSgprs is 0-based as well, seems to suggest the +1 should be there ...
After some more investigation, I've come to realize that the issue is a bit more subtle than I expected.
The SGPR allocation granularity for my gfx9 card (Vega64) is 16. The register encoding of the SGPR count is in multiples of 8 and is 0-based, so the value needs to be rounded up to the nearest multiple of 16 after multiplying by 8. Furthermore, the +1 still seems to be missing.
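To make the arithmetic concrete, here is a minimal sketch of the calculation as I understand it. It is not taken from PAL: the helper names (RoundUpToMultiple, DecodeAllocatedSgprs, MaxSgprVsWaves) are hypothetical, and it assumes the register field is 0-based in units of 8 and that the allocation granularity is passed in by the caller (16 on my Vega64).

```cpp
#include <cstdint>

// Hypothetical helper (not a PAL function): rounds value up to the next
// multiple of granularity. granularity must be non-zero.
constexpr uint32_t RoundUpToMultiple(uint32_t value, uint32_t granularity)
{
    return ((value + granularity - 1) / granularity) * granularity;
}

// Assumption: encodedSgprs is the 0-based register field in units of 8,
// so a field value of N requests (N + 1) * 8 SGPRs before rounding up
// to the hardware allocation granularity.
constexpr uint32_t DecodeAllocatedSgprs(uint32_t encodedSgprs, uint32_t allocGranularity)
{
    const uint32_t requested = (encodedSgprs + 1) * 8;
    return RoundUpToMultiple(requested, allocGranularity);
}

// Mirrors the shape of the maxSgprVsWaves expression quoted above.
// Returns 0 when vsNumSgpr is 0 to avoid dividing by zero.
constexpr uint32_t MaxSgprVsWaves(uint32_t numPhysicalSgprs, uint32_t vsNumSgpr, uint32_t simdPerSh)
{
    return (vsNumSgpr == 0) ? 0 : (numPhysicalSgprs / vsNumSgpr) * simdPerSh;
}

// With a granularity of 16, an encoded value of 2 means 24 SGPRs requested,
// which rounds up to 32 allocated.
static_assert(DecodeAllocatedSgprs(2, 16) == 32, "24 requested rounds up to 32");
```

With the +1 and the round-up in place, an encoded value of 0 still yields a non-zero SGPR count, so the divisor in the wave calculation can only be zero if the caller passes zero explicitly.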