-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use SSA def descriptors in copy propagation #65250
Use SSA def descriptors in copy propagation #65250
Conversation
Tagging subscribers to this area: @JulieLeeMSFT Issue DetailsThe next round of copy propagation improvements: push SSA defs on the stacks instead of nodes and optimize the main loop. The main benefit is performance: my PIN tells me this change reduces the number of instruction retired by Additionally, the way this change is structured will allow me to easily replicate previous behavior once the LHS locals no longer have meaningful VNs. There some small diffs with this change, they're caused by the slightly different rebalancing order of the hash table as we no longer "overwrite" the stack pointer when pushing a def and because we don't push defs for shadowed locals.
|
f098e85
to
31d148f
Compare
Linux musl x64 failure looks like #59542. Windows ARM64 timeout does not seem related (change affects all targets equally). @dotnet/jit-contrib |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes LGTM.
Suggested a few extra asserts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
else | ||
{ | ||
assert((lclNode->gtFlags & GTF_VAR_DEF) != 0); | ||
// This will be "RESERVED_SSA_NUM" for promoted struct fields assigned using the parent struct. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you planning to add an assert for this? Also, what about the TODO-CQ
? Is that something that is planned to address in subsequent PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will assert, seems like that would catch unexpected cases.
I am personally not planning on fixing it (well, #61049 would fix it, but it requires more design work), this is just a general TODO-CQ
to make the reader aware of this inefficiency (the fact that it is not "by design").
d5c0db9
to
a552f20
Compare
I was looking at the changes closely prior to merging and didn't quite follow what part of (which) loop is getting optimized by using defs instead of nodes. Could you please elaborate? |
@kunalspathak We are moving work from the propagation loop to the "push" one: previously, we were fetching the SSA number and the VN from the node in the propagation loop (which is not that cheap for partial definitions especially), and checking it then. Now we effectively check it only once (and push "unavailable" defs in case the checks failed that then short-circuit the propagation loop). This also reduces the number of indirections once we start pushing defs based not the LHS's VN but the SSA def itself (my next change - has some good CQ benefits too) from Notably, this is not all that could have been done: we're still traversing the IR two times (instead of just one like SSA does), and we could push a |
Thanks for the explanation! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
This reverts commit 9012b23.
The next round of copy propagation improvements: push SSA defs on the stacks instead of nodes and optimize the main loop.
The main benefit is performance: my PIN tells me this change reduces the number of instruction retired by
~0.40%
when replayingbenchmarks.run
.Additionally, the way this change is structured will allows easily replicating previous behavior once the LHS locals no longer have meaningful VNs.
There some small diffs with this change, they're caused by the slightly different rebalancing order of the hash table as we no longer "overwrite" the stack pointer when pushing a def and don't push defs for shadowed locals.