Span<T> vs Span<String> difference in Morph: Structs/AddrExp #65815
Comments
The address here escapes to a call, namely the constructor call, that would normally be inlined, so I wonder if this has to do with the "runtime lookup required" inlining limitation (you could check in the dump under
Thanks, @SingleAccretion, you're right.
With Generics:
Without Generics:
Once both fields are stored on the stack after the constructor call completes, can we not enregister them at the head of the loop (to avoid having to keep loading them from the stack)? Or maybe enregister them right after the constructor call.
To do so we must be able to reason that none of the stores in the loop store to the escaped address. We generally do not have alias analysis that is powerful enough to do so in the JIT (and in fact, I'm not sure that we could have it even for your example: consider storing the pointer to the span inside itself. Not easily done today, but may become easier in the future).
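To make the aliasing concern concrete, here is a minimal sketch in safe C#. It is not the JIT's actual analysis and not the code from this issue: `Pair`, `Init`, and `Run` are hypothetical names, and the non-inlined `Init` stands in for the constructor call that was not inlined. The call hands back a reference into the local, so a store inside the loop can change the loop bound, and a compiler that cached `local.Length` in a register without proving otherwise would get the wrong answer.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical stand-in for a struct with a length field (not the real Span<T>).
public struct Pair
{
    public int Length;
    public int Other;
}

public static class AliasSketch
{
    // NoInlining mimics a constructor call the JIT could not inline:
    // the address of the caller's local is passed to an opaque call.
    [MethodImpl(MethodImplOptions.NoInlining)]
    private static ref int Init(ref Pair p, int length)
    {
        p.Length = length;
        return ref p.Length; // hands a reference into the caller's local back out
    }

    public static int Run()
    {
        Pair local = default;
        ref int alias = ref Init(ref local, 4);

        int iterations = 0;
        for (int i = 0; i < local.Length; i++)
        {
            // This store goes through the escaped reference and changes
            // local.Length, so the bound has to be re-read on every iteration.
            alias = 2;
            iterations++;
        }
        return iterations; // 2, not 4
    }
}
```

In the sketch the alias is easy to see by eye, but proving the opposite (that no store in the loop can touch the escaped location) is the hard part in general, which is the limitation described above.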
Perf between the shared and non-shared versions is the same with #99265.
Wondering why the hot loop codegen differs in method Span() between the generic usage of Span<T>(_array) vs Span<string>(_array), where T ends up being string at runtime.
Example taken from: https://github.com/dotnet/performance/blob/main/src/benchmarks/micro/libraries/System.Collections/Indexer/IndexerSet.cs
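For orientation, the two variants being compared have roughly the shape sketched below. This is an illustration under assumptions, not the benchmark source: the class names `GenericIndexerSet<T>` and `StringIndexerSet` are hypothetical, and only the `Span()` method name, the `_array` field, and the `Span<T>(_array)` / `Span<string>(_array)` constructions come from the description above. The relevant difference is that when `T` is a reference type the generic body is compiled once as shared code, while the `string` version is compiled for the exact type.

```csharp
using System;

// Hypothetical generic variant: for T = string this runs as shared generic code.
public class GenericIndexerSet<T>
{
    private readonly T[] _array;

    public GenericIndexerSet(T[] array) => _array = array;

    public T[] Span()
    {
        var span = new Span<T>(_array);
        for (int i = 0; i < span.Length; i++) // hot loop
        {
            span[i] = default;
        }
        return _array;
    }
}

// Hypothetical non-generic variant: T replaced with string by hand.
public class StringIndexerSet
{
    private readonly string[] _array;

    public StringIndexerSet(string[] array) => _array = array;

    public string[] Span()
    {
        var span = new Span<string>(_array);
        for (int i = 0; i < span.Length; i++) // hot loop
        {
            span[i] = default;
        }
        return _array;
    }
}
```

Both methods do the same work at the source level; the question is only about how the span's fields are treated in the generated code for each.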
C#: With Generics
ASM for Span(): With Generics
C#: Without Generics (replace T with string)
ASM for Span(): Without Generics (replace T with string)
The hot loop's asm differs slightly, with the non-generic version skipping two loads.
With Generics:
Without Generics:
Comparing the jit dumps of both versions, the main difference I see comes from Phase Morph - Structs/AddrExp which designates the generic version's span fields (_pointer and _length) as address exposed:
Phase Morph: Structs/AddrExp - With Generics
Phase Morph: Structs/AddrExp - Without Generics
It seems to me like the difference happens when the generic version encounters this runtime lookup and marks the fields as address exposed (AX) and any future usage of these fields also keeps them AX without enregistering:
Why does address exposed kick in only for the generic version? In other words, why can the non-generic version get away with not loading in the span's _pointer and _length in each iteration of the hot loop? (it keeps them enregistered for the duration of the loop and this leads it to being more performant).
Thanks!
category:cq
theme:morph