-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add JIT support for control-flow guard on x64 and arm64 #63763
Conversation
Tagging subscribers to this area: @JulieLeeMSFT Issue DetailsAdd support for generating control-flow guard checks. The support is enabled by a JIT flag or by setting Fix #59190
|
12b37f5
to
f653dfc
Compare
Still need to fix some register allocation inefficiencies (we seem not to preference GT_PHYSREG) and VSD sequences on ARM64 can be optimized, as we end up materializing the address of the indirection cell twice. I'm also not totally happy with how the expansion works; for the dispatch helper we need to add an arg to a call in lowering, which is messy. |
Looks like there is a problem with delegate invocations. When we inline those during lowering we store lowering call (before):
N003 ( 3, 4) [000001] ------------ t1 = CNS_DBL float 10.000000000000000 $140
N004 ( 1, 1) [000000] ------------ t0 = LCL_VAR ref V00 arg0 u:1 (last use) $80
┌──▌ t1 float arg1 in mm1
├──▌ t0 ref this in rcx
N005 ( 18, 13) [000002] --CXG------- t2 = ▌ CALL float System.Func`2[Single,Single][System.Single,System.Single].Invoke $101
...
lowering call (after):
N003 ( 3, 4) [000001] ------------ t1 = CNS_DBL float 10.000000000000000 $140
┌──▌ t1 float
[000008] ------------ t8 = ▌ PUTARG_REG float REG mm1
N004 ( 1, 1) [000000] ------------ t0 = LCL_VAR ref V00 arg0 u:1 (last use) $80
┌──▌ t0 ref
[000011] DA---------- ▌ STORE_LCL_VAR ref V02 rat0 ; after placing first arg
[000012] ------------ t12 = LCL_VAR ref V02 rat0
┌──▌ t12 ref
[000013] -c---------- t13 = ▌ LEA(b+8) byref
┌──▌ t13 byref
[000014] ------------ t14 = ▌ IND ref
┌──▌ t14 ref
[000009] ------------ t9 = ▌ PUTARG_REG ref REG rcx
N001 ( 3, 2) [000015] ------------ t15 = LCL_VAR ref V02 rat0
┌──▌ t15 ref
N002 ( 4, 3) [000016] -c---------- t16 = ▌ LEA(b+24) ref
┌──▌ t16 ref
N003 ( 7, 5) [000017] -c---------- t17 = ▌ IND long REG NA
┌──▌ t8 float arg1 in mm1
├──▌ t9 ref this in rcx
├──▌ t17 long control expr
N005 ( 18, 13) [000002] --CXG------- t2 = ▌ CALL float System.Func`2[Single,Single][System.Single,System.Single].Invoke $101 This puts us in the pickle because the validation has to be called before we start placing (register) args, yet the address is computed from the stored local. Seems we will need to move this store above the first PUTARG_REG. This might reorder the evaluation of the call target with the late args, but I believe that should be fine because the late args that are put in registers cannot have globally visible side effects. |
As I mentioned in #59190 (comment), can you please make sure there is a good, detailed description of the feature and ABI in clr-abi.md, before you get a PR or merge? |
I'll make sure to do that before marking this as ready. |
LEAF_END __guard_check_icall_fptrHack, _TEXT | ||
|
||
LEAF_ENTRY __guard_dispatch_icall_fptrHack, _TEXT | ||
jmp qword ptr [__guard_dispatch_icall_fptr] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
__guard_dispatch_icall_fptr
is actually a pointer to the address of the helper, not the address of the helper (which kind of makes sense now that I think about it, since the helper is provided by Windows). We probably don't want to go through this jump thunk. This is my fault for not realizing it earlier. I hope this being an address of an address won't complicate things too much.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this should already just work on the JIT side as the JIT-EE interface can return helper addresses either as direct addresses or as one that needs to be accessed through an indirection.
I assume this cell ends up as read-only when mapped? Just verifying that we do not introduce an intermediate writeable layer somewhere and end up defeating the purpose of the feature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this should already just work on the JIT side as the JIT-EE interface can return helper addresses either as direct addresses or as one that needs to be accessed through an indirection.
Ah. Works like charm! Pushed out another commit that does that. Thanks!
I assume this cell ends up as read-only when mapped? Just verifying that we do not introduce an intermediate writeable layer somewhere and end up defeating the purpose of the feature.
Yes, IIRC the loader even validates this.
/azp run runtime jit-cfg |
No pipelines are associated with this pull request. |
/azp run runtime jit-cfg |
No pipelines are associated with this pull request. |
2125b65
to
e07386f
Compare
I've changed the implementation so that the CFG lowering now moves |
The failures are known: GC stress timeouts in the vector jobs, #63860, and an intermittent test orchestrator failure that was also seen in e.g. #61882 (comment). @dotnet/jit-contrib ping, would like to get this in soon. |
case CFGCallKind::ValidateAndCall: | ||
{ | ||
// To safely apply CFG we need to generate a very specific pattern: | ||
// in particular, it is a safety issue to allow the JIT to reload |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be a dumb question, what does it mean when the JIT reloads a call target from memory?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case I mean that if we place the call target in a local, the register allocator might temporarily save it to stack and then reload it from stack before the call. Instead we want to keep the call target in a register since storing the call target on stack (more generally in memory) is less safe as it could be modified due to stack overflows or from other threads.
// To sum up, we end up transforming | ||
// | ||
// ta... = <early args> | ||
// tb... = <late args> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another dumb question, what is the difference between early and late args?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitely not a dumb question, and I'm not even sure I understand it myself. :-)
Here's what I understand: GenTreeCall
has three different fields that store nodes related to args: gtCallThisArg
, gtCallArgs
and gtCallLateArgs
. The two former fields are what I call "early args" here while the latter is the "late args".
Before morph, the latter is not used, the early args just store the operand nodes for the call. Life is good.
fgMorphArgs
, as part of morph, is responsible for setting calls up for the optimization/backend to handle, and among other things handles ABI details of how to pass the argument in either registers or on the stack.
This comes with a constraint: we cannot start placing values in registers (and for non-x86, on the stack) before we have evaluated the arguments, if the arguments contain calls. To handle these cases morph will replace the early arguments with evaluation nodes that evaluate the argument into a temp, and then the temp will be put in the late args. In some cases no evaluation is necessary (e.g. for a simple constant) and the operand will end up either in the early or late args, depending on whether it is going on the stack or in registers.
So for a call like GetThis().M(Arg1(), 10, Arg2())
we end up transforming something like
GT_CALL
gtCallThisArg: GT_CALL GetThis
gtCallArgs[0]: GT_CALL Arg1
gtCallArgs[1]: GT_CNS_INT 10
gtCallArgs[2]: GT_CALL Arg2
into
GT_CALL
gtCallThisArg: GT_ASG tmp0 (GT_CALL GetThis)
gtCallArgs[0]: GT_ASG tmp1 (GT_CALL Arg1)
gtCallArgs[1]: GT_ARGPLACE
gtCallArgs[2]: GT_ASG tmp2 (GT_CALL Arg2)
gtCallLateArgs[0]: GT_LCL_VAR tmp0
gtCallLateArgs[1]: GT_LCL_VAR tmp1
gtCallLateArgs[2]: GT_CNS_INT 10
gtCallLateArgs[3]: GT_LCL_VAR tmp2
For stack arguments we might not need any entry in gtCallLateArgs
, so there isn't a 1-1 correspondence between args and late args.
The code in lower is then responsible for creating the nodes to actually place these args into the stack/registers. Morph has pretty much done all the work at this point so this is just creating GT_PUTARG_REG
and GT_PUTARG_STK
nodes and inserting them after the corresponding args in the early/late args lists. These particular nodes is what this comment is talking about: it moves them further down than they would normally be, since it needs to insert a call very late (after evaluation of the call target that comes after the late args nodes).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally looks great. A couple minor notes. Thanks for adding the testing!
@@ -0,0 +1,52 @@ | |||
trigger: none |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding this pipeline is great! Especially with GCStress. Thanks for doing this.
@BruceForstall I believe I addressed your feedback, PTAL again. The failures are the same as last time except the NativeAOT one, which seems like it was also hit in #64965 (comment). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
cc @dotnet/jit-contrib, this changes JIT GUID so I will start a SPMI collection job. |
@jakobbotsch I'm seeing build failures on some outerloop pipelines due to this PR: Can you take a look? |
@jkoritzinsky Sure, though I'm confused why this would fail to build in outerloop and not in innerloop. Do you know what could be different? |
The runtime-libraries enterprise-linux pipeline runs on a very small subset of changed paths and uses a different docker container, so it almost assuredly didn't run on this PR but could have conceivably broken due to it. |
Fix of a typo in dotnet#63763
Add support for generating control-flow guard checks. The support is enabled by a JIT flag or by setting
COMPlus_JitForceControlFlowGuard=1
.TODO:
Fix #59190