-
Notifications
You must be signed in to change notification settings - Fork 134
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
control flow graph to structured control flow #75
base: main
Are you sure you want to change the base?
Conversation
5cc0f6f
to
3365a13
Compare
7b2d564
to
1060513
Compare
language/evm/move-to-yul/tests/ControlStructures.exp.apply-cfg-to-scf
Outdated
Show resolved
Hide resolved
language/evm/move-to-yul/tests/ControlStructures.exp.apply-cfg-to-scf
Outdated
Show resolved
Hide resolved
{ | ||
ctx.emit_block(|| { | ||
for offs in *lower..*upper + 1 { | ||
self.bytecode( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should make a helper for generating code from lower...upper.
language/move-prover/bytecode/src/stackifier_topological_sort.rs
Outdated
Show resolved
Hide resolved
language/move-prover/bytecode/src/stackifier_topological_sort.rs
Outdated
Show resolved
Hide resolved
language/move-prover/bytecode/src/stackless_control_flow_graph.rs
Outdated
Show resolved
Hide resolved
use crate::stackless_control_flow_graph::StacklessControlFlowGraph; | ||
|
||
pub struct StacklessStructuredControlFlow { | ||
pub top_sort: StackifierTopologicalSort, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this representation of the CFG is not sufficient. You somehow need to encode conditionals. Because you aren't doing this, also the generated output doesn't deal correctly with if-then-else.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wrwg Thanks for the review, after the last commit, as shown in "ControlStructures.exp.apply-cfg-to-scf", the YUL codes are still using the switch
, and the YUL errors are resolved now.
!! Succeeded compiling Yul
I might need some help to emit do-while
and if-else
instead, can we have a quick chat when you are free, maybe tomorrow?
a80f9d5
to
2438b56
Compare
// if ($t5) goto L4 else goto L3 | ||
switch $t5 | ||
case 0 { $block4 := 4 } | ||
default { $block4 := 5 } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This block does not exist in this switch.
$t3 := $Gt(x, $t2) | ||
// if ($t3) goto L0 else goto L2 | ||
switch $t3 | ||
case 0 { $block := 3 } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the general pattern to reduce this to an if-then-else is as follows. Given (after topsort):
L1:
B1
if c goto L2 else L3
L2:
B2
goto L4
L3:
B3
L4:
... becomes:
B1
switch c
case 0 { B3 }
default { B2 }
So one would need to identify these patterns n the code.
IMO one would best do this by introducing a temporary SCG data structures which represents if-then-else and possibly also loops.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wrwg Thanks for the comment! I was a bit stuck and thus checked out how Cheer implemented it, I found that it had more granular "tokens", or "bytecodes" in our repo iiuc,
https://github.com/leaningtech/cheerp-compiler/blob/30a05221c34f418374dd8b8149d63f0917477d34/llvm/lib/CheerpWriter/CheerpWriter.cpp#L5009-L5028
so I am wondering do we need a separate set of bytecodes other than those in "stackless_bytecodes.rs", so that we can generate if-else and while based on the more granular ones instead?
a temporary SCG data structures which represents if-then-else and possibly also loops.
can you elaborate a bit how SCG here would help detect the pattern?
Thanks a lot!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wrwg Thanks for the comment! I was a bit stuck and thus checked out how Cheer implemented it, I found that it had more granular "tokens", or "bytecodes" in our repo iiuc, https://github.com/leaningtech/cheerp-compiler/blob/30a05221c34f418374dd8b8149d63f0917477d34/llvm/lib/CheerpWriter/CheerpWriter.cpp#L5009-L5028
so I am wondering do we need a separate set of bytecodes other than those in "stackless_bytecodes.rs", so that we can generate if-else and while based on the more granular ones instead?
I'm not sure what you mean with finer granularity. We have a rather regular representation of bytecode and we certainly won't change any of this. The code you are pointing to seems unrelated to the problem at hand, it is just a compiler from some IR to a text form.
a temporary SCG data structures which represents if-then-else and possibly also loops.
can you elaborate a bit how SCG here would help detect the pattern?
I'm not talking about how to detect the pattern, but how to represent the result. However, thinking about the result may guide you also how to do this:
enum SCG {
Block{ start_offs: usize, end_offs: usize } // slice of code without any branches
If{ cond: TempIndex, if_true: Box<SCG>, if_false: Box<SCG> }
Loop{ body: SCG } // perhaps need a new opcode for continue and break
Thanks a lot!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wrwg thanks for the explanation!
The code you are pointing to seems unrelated to the problem at hand, it is just a compiler from some IR to a text form.
I was thinking of generating text forms with blocks, now the text forms are generated based on the related bytecode, more specifically "branch" for control flows. Contrary to this, in the code pointer, it has a notion of "token", which has different tokens for "if", "while" etc. With the "branch" we can directly generate the switch style of text codes while to generate if-else, I guessed we would need some similar notion of "tokens" there?
how to represent the result
I see, this makes sense!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wrwg thanks for the explanation!
The code you are pointing to seems unrelated to the problem at hand, it is just a compiler from some IR to a text form.
I was thinking of generating text forms with blocks, now the text forms are generated based on the related bytecode, more specifically "branch" for control flows.
That is what I mean with this SCG data structure. You generate this one, then change the code in generator.rs to generate Yul from the SCG.
Contrary to this, in the code pointer, it has a notion of "token", which has different tokens for "if", "while" etc. With the "branch" we can directly generate the switch style of text codes while to generate if-else, I guessed we would need some similar notion of "tokens" there?
how to represent the result
I see, this makes sense!
8ab0700
to
fd1b741
Compare
the if-else logic is now emitted, per Wolfgang's suggestion, will leave the handling of loops with new bytecodes for now. |
if let StacklessSCG::BasicBlock { | ||
start_offset, | ||
end_offset, | ||
} = **if_false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if this is a nested IfBlock
?
} = **if_false | ||
{ | ||
ctx.emit_block(|| { | ||
for offs in start_offset..end_offset + 1 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something not right here. You are already emitting code for a BasicBlock on line 242. Why do you need to do it again here? And as said, what happens if if_true
and if_false
are not basic blocks? This should be a recursive algorithm here which handles the variants of StacklessSCG exactly once.
} | ||
} | ||
|
||
fn emit_scf_from_cfg( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As we discussed offline, the construction of the SCG really doesn't belong in this module. It should fully go into the module which defines this data type.
let mut start = *lower; | ||
for offs in *lower..*upper + 1 { | ||
match &code[offs as usize] { | ||
Bytecode::Branch(_, if_t, if_f, cond) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The branch is always at the end of the block -- this is the standard definition of basic blocks.
That's why we also do not actually need break and continue bytecodes. The SCG data structure simply could have a marker at the end of each block whether it ends with a continue or break.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sry that I am not following here, do you mean that here I should try to match other bytecodes like Assign rather than Branch?
a marker at the end of each block whether it ends with a continue or break
is this the same idea as we chatted last time, to computing a map from label to "block position" and determine if we put a continue or break there?
Thanks for the patience!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sry that I am not following here, do you mean that here I should try to match other bytecodes like Assign rather than Branch?
I mean that you don't need to loop from lower to upper, you just look at the last instruction in the block.
a marker at the end of each block whether it ends with a continue or break
is this the same idea as we chatted last time, to computing a map from label to "block position" and determine if we put a continue or break there?
I mean we do not need new instructions, We can rather represent break and continue like in SCG::Block(start, end, block_kind)
where eum BlockKind { Fallthrough, Break, Continue}
or something like this.
Thanks for the patience!
get_block(if_f), | ||
get_block(if_t), | ||
) | ||
if !skip_block { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since a branch is always at the end of a basic block, your algorithm can simply construct SCG::BasicBlock(start..end) which does not contain the branch anymore (let it 'end" just before the branch).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to make sure that I followed, "blocks" were assigned on "Branch" and "Jump", so to do it right, I want to only skip it on "Jump"? Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Each block either ends with a branch, jump, or return. Those instructions cannot appear anywhere else than at the end of a block.
if_true: Box::new(StacklessSCG::new(*if_block_id, &cfg)), | ||
if_false: Box::new(StacklessSCG::new(*else_block_id, &cfg)), | ||
}; | ||
visited_blocks.insert(*if_block_id); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure whether this is correct. You have not fully processed if/else blocks, which can contain themselves ifs and loops. Marking them as done seems to ignore them for further processing, AFAICT. This algorithm, I guess, needs to be somehow recursive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC, the insertion is to avoid pushing duplicates to scg_vec.
Re nested if-else, I followed your comment and counted on the processing of the scg_vec to recursively emit the inner if-else.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once you have pushed this, will it be not ignored for processing on line 198?
Hi @wrwg I followed the idea and was able to generate "block_kind" for each block depending on its trailing goto label, but looking at the emitted codes, it's still not about right. One issue I saw was, now the loop_header is still a basic, which is not able to handle either ifBlock or forBlock, the same thing for the if_t and if_f of IfBlock, I will need to think of ways to recursively generate the inner Block there. Publishing the latest version now and will try to fix it, any wisdom is more than welcome. |
Signed-off-by: sahithiacn <[email protected]>
Signed-off-by: sahithiacn <[email protected]> Closes: #314
Signed-off-by: sahithiacn <[email protected]> Closes: #314
Motivation
See title, before this PR, the YUL codes follow a state-machine style, which can run on EVM while is not efficient, this PR tries to implement algorithm in this article
https://medium.com/leaningtech/solving-the-structured-control-flow-problem-once-and-for-all-5123117b1ee2
to convert CFG to structured control flow.
Have you read the Contributing Guidelines on pull requests?
Yes
Test Plan
UPBL=1 cargo test
make sure that the generated YUL codes compile