-
Notifications
You must be signed in to change notification settings - Fork 201
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[NativeAOT-LLVM] Exception handling: dispatch #2145
[NativeAOT-LLVM] Exception handling: dispatch #2145
Conversation
aa181f9
to
29d383f
Compare
@dotnet/nativeaot-llvm Please feel free to ask any and all questions about the implementation / design. This is a fairly large and fundamental change. |
29d383f
to
36a3304
Compare
61008c5
to
36a3304
Compare
There is a silent bad codegen problem here, as evidenced by CI. Not the first one (a lot of the fixes present here that are nominally unrelated to EH also manifested themselves as silent bad codegen), but the hardest one to debug so far because it does not reproduce locally. Probably will take a while to sort it out... |
e4d809c
to
56da5dc
Compare
The tests that are removed, is that because the IL backend is no longer being used for them, and so the cases are not applicable ? |
All of the commits after and including |
Is it always in |
Yes, here's the call stack:
As one of the commits notes, prime suspect is shadow stack corruption. |
Hmm, I wonder if its the same stack corruption with the IL second pass handler. I got that better, but wasn't sure it was totally ok in https://github.com/yowl/runtimelab/tree/uo-loop-test . Nested handlers can corrupt the shadow stack on calls made in the handlers. Once the shadow stack corrupts its really hard, I feel the pain here. If it doesn't repro locally, somethings you could try to make it repro are running without memory growth and playing with the fixed memory size. |
Yes, that is the first-order suspicion... It would be helpful is understanding how does this corruption happen, i. e. where do the shadow stack pointers start to alias.
Thank you for the advice! |
From memory and a glance at the changes in that branch, if you have a stack (top down)
Then (from memory), the call to MethodA passes the wrong shadow stack - it doesn't cater for locals used in Plus the other change in that branch, is the removal a shadow stack block for every reverse pinvoke, although I dont remember if that was just a performance benefit, or there was a logical problem with the way it was. You could take the changes to the IL side and just try it I suppose. If it doesn't help then it can be discounted. Let me know if you want me to do that to this branch. |
That would be greatly appreciated :). |
in #2149. I realised while merging that there is third change here that may be more obvious. In a couple of places I added an RhpCheckedAssignRef, where previously there was just an LLVM store. If this goes anywhere we could see which is helping. Throwing from finally blocks is not that common so that idea is maybe not a strong theory. |
5c511ce
to
a5b7a05
Compare
e9263aa
to
8d9ea37
Compare
Alright, while the GC hole is still to be resolved, I believe this change by itself is (finally) ready for review. Edit: but we'll have to wait until the hole is fixed before it can be merged. |
Another big step forward! Do you know if any of the exception based tests are now compiled via the RyuJit backend, I assume some are? |
A good number of
The |
Looks impressive, do you expect https://stackoverflow.com/questions/57544142/in-c-sharp-try-finally-how-to-catch-the-original-exception/57544409#57544409 to pass with this (assuming there was no other reason why it fell back to the IL backend) ? |
No, not currently. We are still doing per-frame dispatch here because attempting to not do this will run into problem with the IL backend catching in-progress exceptions. But it is certainly the case this design should be capable of doing the whole first pass before the second pass (PR's description has a schematic for how it will work). For full inter-frame dispatch, the currently missing piece is shadow stack from the throw frame, which is expected to be carried by the native exception. The rest of the system should then "just work". |
Some minor comments. I think that the fact the tests are passing, is as good a proof (and clearly better) than me eyeballing the code that this is awesome! |
src/coreclr/nativeaot/Runtime.Base/src/System/Runtime/ExceptionHandling.wasm.cs
Outdated
Show resolved
Hide resolved
src/coreclr/nativeaot/Runtime.Base/src/System/Runtime/ExceptionHandling.wasm.cs
Outdated
Show resolved
Hide resolved
@yowl Is this good to merge? |
Yes absolutely |
@SingleAccretion Thank you! This is great work |
This change implements exception handling dispatch for the RyuJit-based LLVM backend.
All funclets will have two LLVM parameters: the shadow stack and the original shadow stack, which is shared among all of the functions belonging to the same logical parent. The shadow stack is used for calls, while the original shadow stack - to access state from the original shadow frame. Once we move to the correct pass order, the frames while dispatch is active will look something like this (note I am using the traditional "stacks grow down" notation here, while physically the shadow stack grows toward higher addresses in our implementation):
So in both passes we will have funclets located above the dispatcher. While it is conceivable to avoid this for the second pass, in the first pass it is not avoidable. The simple solution using the original shadow frame pointer solves both cases nicely. It also solves the issue of reserving space for the exception object: it is passed on the ordinary shadow stack and thus cannot clobber any state from the original shadow frame.
While the above is the high level design of the overall dispatch, the bulk of implementation is concerned with solving a narrower problem: per-frame dispatch. Here a simple approach of encoding the EH tree directly into code with one native handler (landing pad) per one managed clause was chosen, for three reasons:
Here is the comment from code which describes the specifics of how we lay out the dispatch blocks for one protected region:
Notable difference from the IL backend: since we're generating one dispatcher for one clause (note we count mutually protecting clauses as one clause), we employ dispatchers specialized for the clause type. This is expected to be a somewhat significant saving, both in code size and dispatch time, since the vast majority (95%+) of clauses are either faults/finallys or single catches. Here are the statistics I gathered using SPMI:
The prevalence of faults here, which are not expressible in C#, is due to finally cloning, which duplicates the code inside the finally so we save on the call overhead for the common path and gain additional optimization opportunities. We may choose to disable finally cloning as a size-increasing optimization in the future for WASM, but for now it is, I suppose, a good demonstration of taking advantage of existing RyuJit optimizations.
It was also chosen that we emit a fail fast in case a catch handler returns an unexpected dispatch location. This is technically unreachable code; the thinking was to leave it there while the EH implementation matures and delete once we are confident in its correctness.