Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add JIT support for control-flow guard on x64 and arm64 #63763

Merged
merged 31 commits into from
Feb 8, 2022

Conversation

jakobbotsch
Copy link
Member

@jakobbotsch jakobbotsch commented Jan 13, 2022

Add support for generating control-flow guard checks. The support is enabled by a JIT flag or by setting COMPlus_JitForceControlFlowGuard=1.

TODO:

  • Optimize ind cell sequences that materialize constants twice
  • Optimize LSRA to take PHYSREG into account and preference the register (especially on arm64)
  • Verify that x15 is guaranteed to be preserved by the validator on ARM64
  • Check that we do not call into the validation helper in preemptive GC mode for pinvokes

Fix #59190

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jan 13, 2022
@ghost ghost assigned jakobbotsch Jan 13, 2022
@ghost
Copy link

ghost commented Jan 13, 2022

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.

Issue Details

Add support for generating control-flow guard checks. The support is enabled by a JIT flag or by setting COMPlus_JitForceControlFlowGuard=1.

Fix #59190

Author: jakobbotsch
Assignees: -
Labels:

area-CodeGen-coreclr

Milestone: -

@jakobbotsch
Copy link
Member Author

Still need to fix some register allocation inefficiencies (we seem not to preference GT_PHYSREG) and VSD sequences on ARM64 can be optimized, as we end up materializing the address of the indirection cell twice.

I'm also not totally happy with how the expansion works; for the dispatch helper we need to add an arg to a call in lowering, which is messy.

@jakobbotsch
Copy link
Member Author

jakobbotsch commented Jan 14, 2022

Looks like there is a problem with delegate invocations. When we inline those during lowering we store this in a local as we will access it twice, and we create this store very late in the call sequence, after we have started placing args:

lowering call (before):
N003 (  3,  4) [000001] ------------         t1 =    CNS_DBL   float  10.000000000000000 $140
N004 (  1,  1) [000000] ------------         t0 =    LCL_VAR   ref    V00 arg0         u:1 (last use) $80
                                                  ┌──▌  t1     float  arg1 in mm1
                                                  ├──▌  t0     ref    this in rcx
N005 ( 18, 13) [000002] --CXG-------         t2 =   CALL      float  System.Func`2[Single,Single][System.Single,System.Single].Invoke $101

...

lowering call (after):
N003 (  3,  4) [000001] ------------         t1 =    CNS_DBL   float  10.000000000000000 $140
                                                  ┌──▌  t1     float  
               [000008] ------------         t8 =   PUTARG_REG float  REG mm1
N004 (  1,  1) [000000] ------------         t0 =    LCL_VAR   ref    V00 arg0         u:1 (last use) $80
                                                  ┌──▌  t0     ref    
               [000011] DA----------                STORE_LCL_VAR ref    V02 rat0    ; after placing first arg
               [000012] ------------        t12 =    LCL_VAR   ref    V02 rat0         
                                                  ┌──▌  t12    ref    
               [000013] -c----------        t13 =   LEA(b+8)  byref 
                                                  ┌──▌  t13    byref  
               [000014] ------------        t14 =   IND       ref   
                                                  ┌──▌  t14    ref    
               [000009] ------------         t9 =   PUTARG_REG ref    REG rcx
N001 (  3,  2) [000015] ------------        t15 =    LCL_VAR   ref    V02 rat0         
                                                  ┌──▌  t15    ref    
N002 (  4,  3) [000016] -c----------        t16 =   LEA(b+24) ref   
                                                  ┌──▌  t16    ref    
N003 (  7,  5) [000017] -c----------        t17 =   IND       long   REG NA
                                                  ┌──▌  t8     float  arg1 in mm1
                                                  ├──▌  t9     ref    this in rcx
                                                  ├──▌  t17    long   control expr
N005 ( 18, 13) [000002] --CXG-------         t2 =   CALL      float  System.Func`2[Single,Single][System.Single,System.Single].Invoke $101

This puts us in the pickle because the validation has to be called before we start placing (register) args, yet the address is computed from the stored local. Seems we will need to move this store above the first PUTARG_REG. This might reorder the evaluation of the call target with the late args, but I believe that should be fine because the late args that are put in registers cannot have globally visible side effects.

@BruceForstall
Copy link
Member

As I mentioned in #59190 (comment), can you please make sure there is a good, detailed description of the feature and ABI in clr-abi.md, before you get a PR or merge?

@jakobbotsch
Copy link
Member Author

As I mentioned in #59190 (comment), can you please make sure there is a good, detailed description of the feature and ABI in clr-abi.md, before you get a PR or merge?

I'll make sure to do that before marking this as ready.

LEAF_END __guard_check_icall_fptrHack, _TEXT

LEAF_ENTRY __guard_dispatch_icall_fptrHack, _TEXT
jmp qword ptr [__guard_dispatch_icall_fptr]
Copy link
Member

@MichalStrehovsky MichalStrehovsky Jan 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

__guard_dispatch_icall_fptr is actually a pointer to the address of the helper, not the address of the helper (which kind of makes sense now that I think about it, since the helper is provided by Windows). We probably don't want to go through this jump thunk. This is my fault for not realizing it earlier. I hope this being an address of an address won't complicate things too much.

Copy link
Member Author

@jakobbotsch jakobbotsch Jan 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this should already just work on the JIT side as the JIT-EE interface can return helper addresses either as direct addresses or as one that needs to be accessed through an indirection.

I assume this cell ends up as read-only when mapped? Just verifying that we do not introduce an intermediate writeable layer somewhere and end up defeating the purpose of the feature.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this should already just work on the JIT side as the JIT-EE interface can return helper addresses either as direct addresses or as one that needs to be accessed through an indirection.

Ah. Works like charm! Pushed out another commit that does that. Thanks!

I assume this cell ends up as read-only when mapped? Just verifying that we do not introduce an intermediate writeable layer somewhere and end up defeating the purpose of the feature.

Yes, IIRC the loader even validates this.

@jakobbotsch
Copy link
Member Author

/azp run runtime jit-cfg

@azure-pipelines
Copy link

No pipelines are associated with this pull request.

@jakobbotsch
Copy link
Member Author

/azp run runtime jit-cfg

@azure-pipelines
Copy link

No pipelines are associated with this pull request.

@jakobbotsch
Copy link
Member Author

I've changed the implementation so that the CFG lowering now moves GT_PUTARG_REG nodes ahead instead of trying to move the validate call before late args. It should be much more obviously correct and avoids introducing new invariants on the IR, although the CQ is worse. For constants (invariant nodes in general) we can move them together with the GT_PUTARG_REG, and it might be worth to employ a simple interference check to move GT_LCL_VAR and GT_LCL_FLD nodes as well.

@jakobbotsch
Copy link
Member Author

The failures are known: GC stress timeouts in the vector jobs, #63860, and an intermittent test orchestrator failure that was also seen in e.g. #61882 (comment).

@dotnet/jit-contrib ping, would like to get this in soon.

case CFGCallKind::ValidateAndCall:
{
// To safely apply CFG we need to generate a very specific pattern:
// in particular, it is a safety issue to allow the JIT to reload
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be a dumb question, what does it mean when the JIT reloads a call target from memory?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case I mean that if we place the call target in a local, the register allocator might temporarily save it to stack and then reload it from stack before the call. Instead we want to keep the call target in a register since storing the call target on stack (more generally in memory) is less safe as it could be modified due to stack overflows or from other threads.

// To sum up, we end up transforming
//
// ta... = <early args>
// tb... = <late args>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another dumb question, what is the difference between early and late args?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely not a dumb question, and I'm not even sure I understand it myself. :-)

Here's what I understand: GenTreeCall has three different fields that store nodes related to args: gtCallThisArg, gtCallArgs and gtCallLateArgs. The two former fields are what I call "early args" here while the latter is the "late args".
Before morph, the latter is not used, the early args just store the operand nodes for the call. Life is good.
fgMorphArgs, as part of morph, is responsible for setting calls up for the optimization/backend to handle, and among other things handles ABI details of how to pass the argument in either registers or on the stack.

This comes with a constraint: we cannot start placing values in registers (and for non-x86, on the stack) before we have evaluated the arguments, if the arguments contain calls. To handle these cases morph will replace the early arguments with evaluation nodes that evaluate the argument into a temp, and then the temp will be put in the late args. In some cases no evaluation is necessary (e.g. for a simple constant) and the operand will end up either in the early or late args, depending on whether it is going on the stack or in registers.

So for a call like GetThis().M(Arg1(), 10, Arg2()) we end up transforming something like

GT_CALL
  gtCallThisArg: GT_CALL GetThis
  gtCallArgs[0]: GT_CALL Arg1
  gtCallArgs[1]: GT_CNS_INT 10
  gtCallArgs[2]: GT_CALL Arg2

into

GT_CALL
  gtCallThisArg: GT_ASG tmp0 (GT_CALL GetThis)
  gtCallArgs[0]: GT_ASG tmp1 (GT_CALL Arg1)
  gtCallArgs[1]: GT_ARGPLACE
  gtCallArgs[2]: GT_ASG tmp2 (GT_CALL Arg2)
  gtCallLateArgs[0]: GT_LCL_VAR tmp0
  gtCallLateArgs[1]: GT_LCL_VAR tmp1
  gtCallLateArgs[2]: GT_CNS_INT 10
  gtCallLateArgs[3]: GT_LCL_VAR tmp2

For stack arguments we might not need any entry in gtCallLateArgs, so there isn't a 1-1 correspondence between args and late args.

The code in lower is then responsible for creating the nodes to actually place these args into the stack/registers. Morph has pretty much done all the work at this point so this is just creating GT_PUTARG_REG and GT_PUTARG_STK nodes and inserting them after the corresponding args in the early/late args lists. These particular nodes is what this comment is talking about: it moves them further down than they would normally be, since it needs to insert a call very late (after evaluation of the call target that comes after the late args nodes).

@JulieLeeMSFT JulieLeeMSFT added this to the 7.0.0 milestone Feb 8, 2022
@BruceForstall BruceForstall self-requested a review February 8, 2022 01:58
Copy link
Member

@BruceForstall BruceForstall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally looks great. A couple minor notes. Thanks for adding the testing!

@@ -0,0 +1,52 @@
trigger: none
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding this pipeline is great! Especially with GCStress. Thanks for doing this.

docs/design/coreclr/botr/clr-abi.md Outdated Show resolved Hide resolved
src/coreclr/inc/corinfo.h Show resolved Hide resolved
src/coreclr/jit/morph.cpp Show resolved Hide resolved
src/coreclr/jit/lower.cpp Outdated Show resolved Hide resolved
@ghost ghost added needs-author-action An issue or pull request that requires more info or actions from the author. and removed needs-author-action An issue or pull request that requires more info or actions from the author. labels Feb 8, 2022
@jakobbotsch
Copy link
Member Author

@BruceForstall I believe I addressed your feedback, PTAL again.

The failures are the same as last time except the NativeAOT one, which seems like it was also hit in #64965 (comment).

Copy link
Member

@BruceForstall BruceForstall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jakobbotsch jakobbotsch merged commit 3fc8b09 into dotnet:main Feb 8, 2022
@jakobbotsch jakobbotsch deleted the support-cfg branch February 8, 2022 17:20
@jakobbotsch
Copy link
Member Author

cc @dotnet/jit-contrib, this changes JIT GUID so I will start a SPMI collection job.

@jkoritzinsky
Copy link
Member

jkoritzinsky commented Feb 8, 2022

@jakobbotsch
Copy link
Member Author

@jkoritzinsky Sure, though I'm confused why this would fail to build in outerloop and not in innerloop. Do you know what could be different?

@jkoritzinsky
Copy link
Member

The runtime-libraries enterprise-linux pipeline runs on a very small subset of changed paths and uses a different docker container, so it almost assuredly didn't run on this PR but could have conceivably broken due to it.

AustinWise added a commit to AustinWise/runtime that referenced this pull request Feb 9, 2022
jakobbotsch pushed a commit that referenced this pull request Feb 9, 2022
@ghost ghost locked as resolved and limited conversation to collaborators Mar 11, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Control flow guard support in the JIT
6 participants