Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

control flow graph to structured control flow #75

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

ggaowp-zz
Copy link

@ggaowp-zz ggaowp-zz commented Feb 22, 2022

Motivation

See title, before this PR, the YUL codes follow a state-machine style, which can run on EVM while is not efficient, this PR tries to implement algorithm in this article
https://medium.com/leaningtech/solving-the-structured-control-flow-problem-once-and-for-all-5123117b1ee2
to convert CFG to structured control flow.

Have you read the Contributing Guidelines on pull requests?

Yes

Test Plan

UPBL=1 cargo test
make sure that the generated YUL codes compile

!! Succeeded compiling Yul

@ggaowp-zz ggaowp-zz force-pushed the cfg_to_scf branch 2 times, most recently from 5cc0f6f to 3365a13 Compare February 24, 2022 17:04
@ggaowp-zz ggaowp-zz force-pushed the cfg_to_scf branch 10 times, most recently from 7b2d564 to 1060513 Compare March 2, 2022 19:41
language/evm/move-to-yul/src/experiments.rs Show resolved Hide resolved
language/evm/move-to-yul/src/functions.rs Show resolved Hide resolved
{
ctx.emit_block(|| {
for offs in *lower..*upper + 1 {
self.bytecode(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should make a helper for generating code from lower...upper.

use crate::stackless_control_flow_graph::StacklessControlFlowGraph;

pub struct StacklessStructuredControlFlow {
pub top_sort: StackifierTopologicalSort,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this representation of the CFG is not sufficient. You somehow need to encode conditionals. Because you aren't doing this, also the generated output doesn't deal correctly with if-then-else.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wrwg Thanks for the review, after the last commit, as shown in "ControlStructures.exp.apply-cfg-to-scf", the YUL codes are still using the switch, and the YUL errors are resolved now.

!! Succeeded compiling Yul

I might need some help to emit do-while and if-else instead, can we have a quick chat when you are free, maybe tomorrow?

@ggaowp-zz ggaowp-zz force-pushed the cfg_to_scf branch 4 times, most recently from a80f9d5 to 2438b56 Compare March 3, 2022 02:08
// if ($t5) goto L4 else goto L3
switch $t5
case 0 { $block4 := 4 }
default { $block4 := 5 }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This block does not exist in this switch.

$t3 := $Gt(x, $t2)
// if ($t3) goto L0 else goto L2
switch $t3
case 0 { $block := 3 }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the general pattern to reduce this to an if-then-else is as follows. Given (after topsort):

L1:
    B1
    if c goto L2 else L3
L2:
    B2
    goto L4
L3:
    B3        
L4:

... becomes:

B1
switch c
case 0 { B3 }
default { B2 }

So one would need to identify these patterns n the code.

IMO one would best do this by introducing a temporary SCG data structures which represents if-then-else and possibly also loops.

Copy link
Author

@ggaowp-zz ggaowp-zz Mar 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wrwg Thanks for the comment! I was a bit stuck and thus checked out how Cheer implemented it, I found that it had more granular "tokens", or "bytecodes" in our repo iiuc,
https://github.com/leaningtech/cheerp-compiler/blob/30a05221c34f418374dd8b8149d63f0917477d34/llvm/lib/CheerpWriter/CheerpWriter.cpp#L5009-L5028

so I am wondering do we need a separate set of bytecodes other than those in "stackless_bytecodes.rs", so that we can generate if-else and while based on the more granular ones instead?

a temporary SCG data structures which represents if-then-else and possibly also loops.

can you elaborate a bit how SCG here would help detect the pattern?

Thanks a lot!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wrwg Thanks for the comment! I was a bit stuck and thus checked out how Cheer implemented it, I found that it had more granular "tokens", or "bytecodes" in our repo iiuc, https://github.com/leaningtech/cheerp-compiler/blob/30a05221c34f418374dd8b8149d63f0917477d34/llvm/lib/CheerpWriter/CheerpWriter.cpp#L5009-L5028

so I am wondering do we need a separate set of bytecodes other than those in "stackless_bytecodes.rs", so that we can generate if-else and while based on the more granular ones instead?

I'm not sure what you mean with finer granularity. We have a rather regular representation of bytecode and we certainly won't change any of this. The code you are pointing to seems unrelated to the problem at hand, it is just a compiler from some IR to a text form.

a temporary SCG data structures which represents if-then-else and possibly also loops.

can you elaborate a bit how SCG here would help detect the pattern?

I'm not talking about how to detect the pattern, but how to represent the result. However, thinking about the result may guide you also how to do this:

enum SCG {
   Block{ start_offs: usize, end_offs: usize } // slice of code without any branches
   If{ cond: TempIndex, if_true: Box<SCG>, if_false: Box<SCG> }
   Loop{ body: SCG } // perhaps need a new opcode for continue and break

Thanks a lot!

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wrwg thanks for the explanation!

The code you are pointing to seems unrelated to the problem at hand, it is just a compiler from some IR to a text form.

I was thinking of generating text forms with blocks, now the text forms are generated based on the related bytecode, more specifically "branch" for control flows. Contrary to this, in the code pointer, it has a notion of "token", which has different tokens for "if", "while" etc. With the "branch" we can directly generate the switch style of text codes while to generate if-else, I guessed we would need some similar notion of "tokens" there?

how to represent the result

I see, this makes sense!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wrwg thanks for the explanation!

The code you are pointing to seems unrelated to the problem at hand, it is just a compiler from some IR to a text form.

I was thinking of generating text forms with blocks, now the text forms are generated based on the related bytecode, more specifically "branch" for control flows.

That is what I mean with this SCG data structure. You generate this one, then change the code in generator.rs to generate Yul from the SCG.

Contrary to this, in the code pointer, it has a notion of "token", which has different tokens for "if", "while" etc. With the "branch" we can directly generate the switch style of text codes while to generate if-else, I guessed we would need some similar notion of "tokens" there?

how to represent the result

I see, this makes sense!

@ggaowp-zz ggaowp-zz force-pushed the cfg_to_scf branch 3 times, most recently from 8ab0700 to fd1b741 Compare March 9, 2022 17:26
@ggaowp-zz
Copy link
Author

the if-else logic is now emitted, per Wolfgang's suggestion, will leave the handling of loops with new bytecodes for now.

if let StacklessSCG::BasicBlock {
start_offset,
end_offset,
} = **if_false
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if this is a nested IfBlock?

} = **if_false
{
ctx.emit_block(|| {
for offs in start_offset..end_offset + 1 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something not right here. You are already emitting code for a BasicBlock on line 242. Why do you need to do it again here? And as said, what happens if if_true and if_false are not basic blocks? This should be a recursive algorithm here which handles the variants of StacklessSCG exactly once.

}
}

fn emit_scf_from_cfg(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we discussed offline, the construction of the SCG really doesn't belong in this module. It should fully go into the module which defines this data type.

let mut start = *lower;
for offs in *lower..*upper + 1 {
match &code[offs as usize] {
Bytecode::Branch(_, if_t, if_f, cond) => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The branch is always at the end of the block -- this is the standard definition of basic blocks.

That's why we also do not actually need break and continue bytecodes. The SCG data structure simply could have a marker at the end of each block whether it ends with a continue or break.

Copy link
Author

@ggaowp-zz ggaowp-zz Mar 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sry that I am not following here, do you mean that here I should try to match other bytecodes like Assign rather than Branch?

a marker at the end of each block whether it ends with a continue or break

is this the same idea as we chatted last time, to computing a map from label to "block position" and determine if we put a continue or break there?

Thanks for the patience!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sry that I am not following here, do you mean that here I should try to match other bytecodes like Assign rather than Branch?

I mean that you don't need to loop from lower to upper, you just look at the last instruction in the block.

a marker at the end of each block whether it ends with a continue or break

is this the same idea as we chatted last time, to computing a map from label to "block position" and determine if we put a continue or break there?

I mean we do not need new instructions, We can rather represent break and continue like in SCG::Block(start, end, block_kind) where eum BlockKind { Fallthrough, Break, Continue} or something like this.

Thanks for the patience!

get_block(if_f),
get_block(if_t),
)
if !skip_block {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since a branch is always at the end of a basic block, your algorithm can simply construct SCG::BasicBlock(start..end) which does not contain the branch anymore (let it 'end" just before the branch).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to make sure that I followed, "blocks" were assigned on "Branch" and "Jump", so to do it right, I want to only skip it on "Jump"? Thanks!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Each block either ends with a branch, jump, or return. Those instructions cannot appear anywhere else than at the end of a block.

if_true: Box::new(StacklessSCG::new(*if_block_id, &cfg)),
if_false: Box::new(StacklessSCG::new(*else_block_id, &cfg)),
};
visited_blocks.insert(*if_block_id);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure whether this is correct. You have not fully processed if/else blocks, which can contain themselves ifs and loops. Marking them as done seems to ignore them for further processing, AFAICT. This algorithm, I guess, needs to be somehow recursive.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC, the insertion is to avoid pushing duplicates to scg_vec.
Re nested if-else, I followed your comment and counted on the processing of the scg_vec to recursively emit the inner if-else.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once you have pushed this, will it be not ignored for processing on line 198?

@ggaowp-zz
Copy link
Author

Hi @wrwg I followed the idea and was able to generate "block_kind" for each block depending on its trailing goto label, but looking at the emitted codes, it's still not about right.

One issue I saw was, now the loop_header is still a basic, which is not able to handle either ifBlock or forBlock, the same thing for the if_t and if_f of IfBlock, I will need to think of ways to recursively generate the inner Block there.

Publishing the latest version now and will try to fix it, any wisdom is more than welcome.

villesundell pushed a commit to villesundell/move-old that referenced this pull request Apr 26, 2022
sahithiacn pushed a commit to sahithiacn/move that referenced this pull request Jan 2, 2023
bors-diem pushed a commit that referenced this pull request Jan 6, 2023
bors-diem pushed a commit that referenced this pull request Jan 6, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants