CalcMaxLateAllocLimit() minor sgpr calculation bug #84

Open
shanminchao opened this issue Feb 1, 2022 · 1 comment

shanminchao commented Feb 1, 2022

In gfx9GraphicsPipeline.cpp, under the CalcMaxLateAllocLimit() function, you have the following:

const uint32 vsNumSgpr = (numSgprs * 8);
const uint32 vsNumVgpr = (numVgprs * 4);

My understanding is that it should be this instead for gfx9:

const uint32 vsNumSgpr = (numSgprs * 16);

Actually, I'm wondering whether a +1 is also needed before multiplying. The code later checks whether vsNumSgpr is > 0, but then uses it like so:

const uint32 maxSgprVsWaves = (chipProps.gfx9.numPhysicalSgprs / vsNumSgpr) * simdPerSh;

... which, unless numPhysicalSgprs is 0-based as well, seems to suggest the +1 should be there ...

@shanminchao
Author

After some more investigation, I've come to realize that the issue is a bit more subtle than I expected.

The SGPR allocation granularity for my gfx9 card (Vega64) is 16. The register encoding of the SGPR count is 0-based and in multiples of 8, so the value needs to be rounded up to the nearest multiple of 16 after multiplying by 8. Furthermore, the +1 still seems to be missing.
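
To make the proposed calculation concrete, here is a minimal sketch of what I have in mind. It is not the actual PAL code: the helper names (RoundUpToMultiple, CalcVsNumSgprs, CalcMaxSgprVsWaves) and the allocGranularity parameter are made up for illustration, and it assumes the encoded field is 0-based in units of 8 and that the granularity is a power of two (16 on my Vega64).

```cpp
#include <cassert>
#include <cstdint>

using uint32 = std::uint32_t;  // mirrors the type name used in the snippets above

// Hypothetical helper: round value up to the next multiple of a
// power-of-two alignment.
constexpr uint32 RoundUpToMultiple(uint32 value, uint32 alignment)
{
    return (value + alignment - 1) & ~(alignment - 1);
}

// encodedSgprs:     register field value, assumed 0-based and in units of 8 SGPRs.
// allocGranularity: assumed hardware allocation granularity (16 on Vega64).
inline uint32 CalcVsNumSgprs(uint32 encodedSgprs, uint32 allocGranularity)
{
    // The rounding trick above only works for power-of-two granularities.
    assert((allocGranularity != 0) &&
           ((allocGranularity & (allocGranularity - 1)) == 0));

    // +1 because the field is 0-based, then scale by the encoding unit of 8.
    const uint32 requested = (encodedSgprs + 1) * 8;

    // Round up to what the hardware would actually allocate.
    return RoundUpToMultiple(requested, allocGranularity);
}

// Same shape as the maxSgprVsWaves expression quoted above.
inline uint32 CalcMaxSgprVsWaves(uint32 numPhysicalSgprs, uint32 vsNumSgpr, uint32 simdPerSh)
{
    assert(vsNumSgpr > 0);
    return (numPhysicalSgprs / vsNumSgpr) * simdPerSh;
}
```

For example, an encoded value of 2 gives (2 + 1) * 8 = 24, which rounds up to 32 with a granularity of 16, whereas the current numSgprs * 8 would give 16.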
