
binja: compute_static_layout causes the IL function to be regenerated because get_basic_blocks requests the MLIL #2516

Closed
xusheng6 opened this issue Dec 3, 2024 · 6 comments · Fixed by #2523


@xusheng6
Contributor

xusheng6 commented Dec 3, 2024

See

def get_basic_blocks(self, fh: FunctionHandle) -> Iterator[BBHandle]:

I just noticed that compute_static_layout runs slower than expected. On further investigation, I found that it collects the basic blocks of every function, and in the binja extractor, get_basic_blocks requests the function's MLIL so that it can pair each disassembly basic block with its MLIL basic block.

I think this is actually worse than the previous implementation, i.e., before e9d4a23, even though that implementation did an O(n^2) lookup on the MLIL basic blocks. Thanks to the recent performance optimization work, we found that this MLIL lookup is not causing many issues. Also, since n is the number of basic blocks in a function, it should usually be fine.
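To make the O(n^2) point concrete, here is a minimal sketch of pairing disassembly basic blocks with MLIL basic blocks, first by a nested scan and then through an address index. `DisasmBlock`, `IlBlock`, `pair_quadratic`, and `pair_linear` are hypothetical stand-ins, not Binary Ninja or capa APIs, and the sketch assumes each MLIL block can be matched to the disassembly block that starts at the same address.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


# Hypothetical stand-ins: real Binary Ninja block objects look different.
# Each IL block records the start address of the disassembly block it came from.
@dataclass(frozen=True)
class DisasmBlock:
    start: int


@dataclass(frozen=True)
class IlBlock:
    source_start: int


Pair = Tuple[DisasmBlock, Optional[IlBlock]]


def pair_quadratic(disasm: List[DisasmBlock], il: List[IlBlock]) -> List[Pair]:
    """Scan every IL block for every disassembly block: O(n * m) comparisons."""
    pairs: List[Pair] = []
    for bb in disasm:
        match: Optional[IlBlock] = None
        for il_bb in il:
            if il_bb.source_start == bb.start:
                match = il_bb
                break
        pairs.append((bb, match))
    return pairs


def pair_linear(disasm: List[DisasmBlock], il: List[IlBlock]) -> List[Pair]:
    """Index the IL blocks once, then look each disassembly block up: O(n + m)."""
    # Assumes at most one IL block per source address; with duplicates the
    # dict keeps the last one, whereas pair_quadratic keeps the first.
    by_start: Dict[int, IlBlock] = {il_bb.source_start: il_bb for il_bb in il}
    return [(bb, by_start.get(bb.start)) for bb in disasm]
```

Either version still needs the function's MLIL in hand before it can run, so neither avoids the cost this issue is about: generating the IL, not the lookup, is what makes get_basic_blocks expensive.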

Better still, I think there is a different way to do it. We can just do the stack string detection on the function level, rather than at the basic block level. In this way, we would only need to enumerate the MLIL basic blocks once. Do you think this would work? @williballenthin
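One way to read the function-level proposal is a single pass over a function's MLIL that still applies a per-block heuristic but reports the result once for the whole function. The sketch below assumes a simplified instruction record (`IlInsn`), a printable-ASCII test, and a threshold of 8 characters (`MIN_STACKSTRING_LEN`); all of these are illustrative assumptions, not the binja extractor's actual logic.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Optional

# Assumed threshold for illustration; not necessarily capa's value.
MIN_STACKSTRING_LEN = 8


# Hypothetical, simplified view of one MLIL instruction: the basic block it
# belongs to, whether it stores to a stack variable, and the constant stored.
@dataclass(frozen=True)
class IlInsn:
    block_start: int
    stores_to_stack: bool
    constant: Optional[int]


def is_printable_char(value: int) -> bool:
    return 0x20 <= value < 0x7F


def function_has_stackstring(function_il: Iterable[IlInsn]) -> bool:
    """Walk the function's IL once, counting printable constant stack stores per block."""
    counts: Dict[int, int] = defaultdict(int)
    for insn in function_il:
        if insn.stores_to_stack and insn.constant is not None and is_printable_char(insn.constant):
            counts[insn.block_start] += 1
    # Report the feature once for the function if any block reaches the threshold.
    return any(n >= MIN_STACKSTRING_LEN for n in counts.values())
```

Counting per block keeps the match criteria close to the existing basic block feature; pooling the counts across the whole function would be simpler, but it would also match stores scattered across unrelated blocks.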

@xusheng6
Contributor Author

xusheng6 commented Dec 3, 2024

Unfortunately, this would not further improve the feature extraction time reported in #2511 (comment), since I only measured the time spent on feature extraction itself, not on the static layout computation. We should still definitely fix this, because users certainly feel that time.

@xusheng6
Contributor Author

xusheng6 commented Dec 3, 2024

This also seems to be related to the crash in #2406 (comment), though I have yet to understand it.

@xusheng6
Contributor Author

xusheng6 commented Dec 3, 2024

Also @mr-tz, what do you think about my proposed fix? All the other backends seem to detect stack strings at the basic block level, but I do not see any reason it must be done there.

@williballenthin
Collaborator

> We can just do the stack string detection on the function level, rather than at the basic block level.

I think this works just fine!

@williballenthin
Collaborator

capa assumes that enumerating functions and basic blocks is cheap, which isn't quite true when the Binary Ninja MLIL is requested. So, I hope we can move any heavy operations out of get_functions/get_basic_blocks.

I think moving stack string detection to the function level should hopefully solve this.
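The principle that enumeration should stay cheap can be sketched as an extractor whose `get_basic_blocks` never touches the IL, while the expensive per-function analysis runs lazily and is cached. `LazyExtractor` and the `analyze` callback (standing in for requesting the MLIL and detecting stack strings) are hypothetical; the method names only loosely echo capa's extractor interface and do not match its signatures, and the yielded feature string is a placeholder.

```python
from typing import Callable, Dict, Iterator, List


class LazyExtractor:
    """Hypothetical extractor: cheap enumeration, expensive analysis deferred and cached."""

    def __init__(self, blocks_by_function: Dict[int, List[int]], analyze: Callable[[int], bool]):
        self._blocks = blocks_by_function  # function address -> basic block start addresses
        self._analyze = analyze  # stands in for requesting the MLIL + stack string detection
        self._cache: Dict[int, bool] = {}

    def get_functions(self) -> Iterator[int]:
        yield from self._blocks

    def get_basic_blocks(self, fva: int) -> Iterator[int]:
        # Cheap: only returns precomputed block addresses; no IL is requested here.
        yield from self._blocks[fva]

    def extract_function_features(self, fva: int) -> Iterator[str]:
        # The expensive analysis runs at most once per function, and only
        # when features for that function are actually requested.
        if fva not in self._cache:
            self._cache[fva] = self._analyze(fva)
        if self._cache[fva]:
            yield "characteristic: stack string"  # placeholder feature label
```

With this shape, callers that only enumerate functions and blocks (as compute_static_layout does) never pay for the IL; the cost is incurred once per function, and only when features are actually extracted.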

@xusheng6
Contributor Author

xusheng6 commented Dec 3, 2024

> capa assumes that enumerating functions and basic blocks is cheap, which isn't quite true when the Binary Ninja MLIL is requested. So, I hope we can move any heavy operations out of get_functions/get_basic_blocks.

> I think moving stack string detection to the function level should hopefully solve this.

Yup, that is what I am planning to do.


2 participants