Generalize loop pre-header creation and loop hoisting (#62560)
* Generalize loop pre-header creation and loop hoisting

A loop pre-header is a block that flows directly (and only) to the loop entry block. The loop pre-header is the only non-loop predecessor of the entry block. Loop-invariant code can be hoisted to the loop pre-header, where it is guaranteed to be executed just once (per loop entry).

Currently, a loop pre-header has a number of restrictions:
- it is only created for a "do-while" (top entry) loop, not for a mid-loop entry.
- it isn't created if the current loop head and the loop entry block are in different EH try regions.

Additionally, partially due to those restrictions, loop hoisting has restrictions:
- it requires a "do-while" (top entry) loop
- it requires the existing `head` block to dominate the loop entry block
- it requires the existing `head` block to be in the same EH region as the entry block
- it won't hoist if the `entry` block is the first block of a handler

This change removes all these restrictions.

Previously, even if we did create a pre-header, the definition of a pre-header was a little weaker: an entry predecessor could be a non-loop block and also not the pre-header, if the predecessor was dominated by the entry block. This is more complicated to reason about, so I changed the pre-header creation to force non-loop predecessors of the entry block to branch to the pre-header instead. This case occurs only rarely: when we have what looks like an outer loop back edge but the natural loop recognition package doesn't recognize it as an outer loop.

I added a "stress mode" to always create a loop pre-header immediately after loop recognition. This is currently disabled because loop cloning doesn't respect the special status and invariants of a pre-header: it inserts all the cloning conditions and copied blocks after the pre-header, triggering new loop structure asserts. This should be improved in the future.

Much more validation of the loop table and of the loop annotations on blocks has been added.
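The pre-header definition above can be illustrated on a toy flow graph. This C++ sketch is not the JIT's implementation: `FlowGraph`, `createPreHeader`, and `isPreHeader` are hypothetical names, blocks are plain integer indices, and EH regions, block layout, and edge weights are ignored. It always creates a new block, redirects every non-loop predecessor of the entry to it (leaving back edges from loop blocks untouched), and then checks the invariant that the pre-header flows only to the entry and is the entry's only non-loop predecessor.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical toy flow graph: blocks are integer indices and succs[b] lists
// the successors of block b. This is not the JIT's BasicBlock model.
struct FlowGraph {
    std::vector<std::vector<int>> succs;

    int addBlock() {
        succs.emplace_back();
        return static_cast<int>(succs.size()) - 1;
    }

    // Blocks with at least one edge to `target`, each listed once.
    std::vector<int> preds(int target) const {
        std::vector<int> result;
        for (int b = 0; b < static_cast<int>(succs.size()); ++b) {
            for (int s : succs[b]) {
                if (s == target) {
                    result.push_back(b);
                    break;
                }
            }
        }
        return result;
    }
};

// Always creates a new pre-header for the loop `loopBlocks` entered at `entry`.
// Every non-loop predecessor of `entry` is redirected to the new block, which
// then flows only to `entry`. Back edges from loop blocks still target `entry`.
int createPreHeader(FlowGraph& g, const std::set<int>& loopBlocks, int entry) {
    assert(loopBlocks.count(entry) == 1);
    int preHeader = g.addBlock();  // no successors yet, so never in preds(entry)
    for (int pred : g.preds(entry)) {
        if (loopBlocks.count(pred) != 0) {
            continue;  // back edge: leave it alone
        }
        for (int& s : g.succs[pred]) {
            if (s == entry) {
                s = preHeader;  // non-loop edge now enters via the pre-header
            }
        }
    }
    g.succs[preHeader].push_back(entry);
    return preHeader;
}

// The invariant: `preHeader` flows directly and only to `entry`, and it is the
// only non-loop predecessor of `entry`.
bool isPreHeader(const FlowGraph& g, const std::set<int>& loopBlocks, int entry, int preHeader) {
    if (g.succs[preHeader] != std::vector<int>{entry}) {
        return false;
    }
    for (int pred : g.preds(entry)) {
        if (pred != preHeader && loopBlocks.count(pred) == 0) {
            return false;
        }
    }
    return true;
}
```

For a loop `{2, 3}` entered at block `2` with outside predecessors `0` and `1`, `createPreHeader` appends block `5`, retargets the edges `0 -> 2` and `1 -> 2` to `5`, and keeps the back edge `3 -> 2`. The commit can also reuse an existing `head` block as the pre-header when it already qualifies; this sketch skips that case.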
This revealed a number of problems with loop unrolling leaving things in a bad state for downstream phases. Loop unrolling has been updated to fix this; in particular, the loop table is rebuilt if we detect that we unrolled a loop containing nested loops, since there will now be multiple copies of those nested loops. This is the first case where we might rebuild the loop table, so it lays the groundwork for potentially rebuilding the loop table in other cases, such as after loop cloning, where we don't add the "slow path" loops to the table.

There is some code refactoring to simplify the "find loops" code as well.

Some change details:
- `optSetBlockWeights` is elevated to a "phase" that runs prior to loop recognition.
- LoopFlags is simplified:
  - LPFLG_DO_WHILE is removed; call `lpIsTopEntry` instead
  - LPFLG_ONE_EXIT is removed; check `lpExitCnt == 1` instead
  - LPFLG_HOISTABLE is removed (there are no restrictions anymore)
  - LPFLG_CONST is removed; check `(lpFlags & (LPFLG_CONST_INIT | LPFLG_CONST_LIMIT)) == (LPFLG_CONST_INIT | LPFLG_CONST_LIMIT)` instead (only used in one place)
  - bool lpContainsCall is removed and replaced by LPFLG_CONTAINS_CALL
- Added a `lpInitBlock` field to the loop table. For constant and variable initialization loops, code assumed that these expressions existed in the `head` block. This isn't true anymore if we insert a pre-header block. So, capture the block where these actually exist when we determine that they do exist, and explicitly use this block pointer where needed.
- Added `fgComputeReturnBlocks()` to extract this code out of `fgComputeReachability` into a function
- Added `optFindAndScaleGeneralLoopBlocks()` to extract this code out of loop recognition into its own function.
- Added `optResetLoopInfo()` to reset the loop table and block annotations related to loops
- Added `fgDebugCheckBBNumIncreasing()` to allow asserting that the bbNum order of blocks is increasing. This should be used in phases that depend on this order to do bbNum comparisons.
- Added a lot more loop table validation in `fgDebugCheckLoopTable()`

* Inline fgBuildBlockNumMap to allow using _alloca

* Fix BBJ_SWITCH output

1. Change `dspSuccs()` to not call code that will call `GetDescriptorForSwitch()`
2. Change `GetDescriptorForSwitch()` to use the correct max block number while inlining.

We probably don't or shouldn't call GetDescriptorForSwitch while inlining, especially after (1), but this change doesn't hurt.

* Remove incorrect assertion

There was an assertion, when considering an existing `head` block as a potential pre-header, that the `entry` block has the `head` block as a predecessor. However, we exit early if we find a non-head, non-loop edge. This could happen before we encounter the `head` block, making the assert incorrect. We don't want to run the entire loop just for the purpose of the assert (at least not here), so just remove the assert.

* Formatting

* Use `_alloca` instead of `alloca` name

* Convert fgBBNumMax usage

Change:
```
compIsForInlining() ? impInlineInfo->InlinerCompiler->fgBBNumMax : fgBBNumMax
```
to:
```
impInlineRoot()->fgBBNumMax
```

* Code review feedback

1. Added loop epoch concept. Currently set but never checked.
2. Added disabled assert about return blocks always being moved out of line of loop body.
3. Fixed bug checking `totalIter` after it was decremented to zero as a loop variable.
4. Added more comments on incremental loop block `bbNatLoopNum` setting.

* Add EH condition when converting head to pre-header

When considering converting an existing loop `head` block to a pre-header, verify that the block has the same EH try region that we would create for a new pre-header. If not, we go ahead and create the new pre-header.
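The head-reuse decision described in the "Remove incorrect assertion" and "Add EH condition when converting head to pre-header" items can be sketched in C++ under simplified assumptions. This is not the JIT's code: `Block`, `tryIndex`, `inLoop`, and `canUseHeadAsPreHeader` are hypothetical, EH membership is reduced to a single try-region index per block, and `newPreHeaderTryIndex` stands in for whatever try region a newly created pre-header would be placed in. The head qualifies only if it flows solely to the entry, already sits in that try region, and no other non-loop block branches to the entry; otherwise a new pre-header is created.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified block model: successor indices plus the index of the
// innermost EH try region containing the block (0 here means "not in a try").
struct Block {
    std::vector<int> succs;
    unsigned tryIndex;
};

// Returns true if `head` can be converted in place into the loop pre-header.
bool canUseHeadAsPreHeader(const std::vector<Block>& blocks,
                           const std::vector<bool>& inLoop,
                           int head, int entry,
                           unsigned newPreHeaderTryIndex) {
    assert(blocks.size() == inLoop.size());
    assert(head >= 0 && head < static_cast<int>(blocks.size()));
    const Block& h = blocks[head];

    // The head must flow directly and only to the entry block.
    if (h.succs.size() != 1 || h.succs[0] != entry) {
        return false;
    }
    // It must already be in the try region a new pre-header would be given.
    if (h.tryIndex != newPreHeaderTryIndex) {
        return false;
    }
    // Any other non-loop edge into the entry disqualifies the head (early
    // exit); a new pre-header is created and such edges are redirected to it.
    for (int b = 0; b < static_cast<int>(blocks.size()); ++b) {
        if (b == head || inLoop[b]) {
            continue;
        }
        for (int s : blocks[b].succs) {
            if (s == entry) {
                return false;
            }
        }
    }
    return true;
}
```

Because the scan exits as soon as it sees a disqualifying edge, it never needs to confirm that `head` is actually a predecessor of `entry`, which mirrors why an assertion to that effect could fire before `head` was reached.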
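The `LPFLG_CONST` replacement listed under the change details can be written as a small helper. In this C++ sketch the bit values and the helper name `lpIsConstInitAndLimit` are hypothetical (the JIT's `LoopFlags` enum defines its own values). Because `==` binds more tightly than `&` in C++, the masked value must be parenthesized; without the parentheses the comparison is evaluated first and the whole expression reduces to `lpFlags & 1`.

```cpp
#include <cstdint>

// Hypothetical bit values for illustration only; not the JIT's LoopFlags values.
constexpr std::uint32_t LPFLG_CONST_INIT    = 0x1;
constexpr std::uint32_t LPFLG_CONST_LIMIT   = 0x2;
constexpr std::uint32_t LPFLG_CONTAINS_CALL = 0x4;

// True only when both the loop initializer and the loop limit are constants,
// which is what the removed LPFLG_CONST flag used to encode.
constexpr bool lpIsConstInitAndLimit(std::uint32_t lpFlags) {
    constexpr std::uint32_t mask = LPFLG_CONST_INIT | LPFLG_CONST_LIMIT;
    // Parenthesize the `&`: `lpFlags & mask == mask` would parse as
    // `lpFlags & (mask == mask)`.
    return (lpFlags & mask) == mask;
}

// Compile-time usage examples.
static_assert(lpIsConstInitAndLimit(LPFLG_CONST_INIT | LPFLG_CONST_LIMIT), "both bits set");
static_assert(!lpIsConstInitAndLimit(LPFLG_CONST_INIT), "limit bit missing");
```

Unrelated bits such as `LPFLG_CONTAINS_CALL` do not affect the result, since they are masked off before the comparison.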