binja: compute_static_layout causes the regeneration of the IL function because get_basic_blocks requests the MLIL #2516
Comments
Unfortunately, this would not further improve the feature extraction time reported in #2511 (comment), since I only counted the time spent on the feature extraction itself, not the static layout computation. We should still definitely fix this, because the user certainly feels the delay.
This also seems to be related to the crash in #2406 (comment), though I have yet to understand it.
Also @mr-tz, what do you think about my proposed fix? All other backends seem to detect stack strings at the basic block level, but I do not see any reason that it must be done there.
I think this works just fine!
capa assumes that enumerating functions and basic blocks is cheap, which isn't quite true when Binary Ninja MLIL is requested. So, I hope we can move any heavy operations out of get_functions/get_basic_blocks. I think moving stack string detection to the function-level should hopefully solve this.
Yup, that is what I am planning to do.
See capa/capa/features/extractors/binja/extractor.py, line 55 in 688841f.
I just noticed that compute_static_layout runs slower than expected. On further investigation, I found that it collects the basic blocks of each function. In the binja extractor, when we return the list of basic blocks, we request the MLIL of the function in order to pair each disassembly basic block with its MLIL basic blocks.
I think this is actually worse than the previous implementation, i.e., before e9d4a23, even though before that commit there was an O(n^2) lookup on the MLIL basic blocks. However, thanks to the recent effort to optimize performance, it was discovered that the MLIL is not causing a lot of issues. Also, since n is the number of basic blocks, it should usually be fine.
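To make the two pairing strategies concrete, here is a rough sketch. It does not use the Binary Ninja API or the actual capa code: `MlilBlockStub`, `source_start`, `pair_quadratic`, and `pair_indexed` are all hypothetical names, and I am assuming each MLIL basic block can report the start address of the disassembly block it came from.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class MlilBlockStub:
    """Hypothetical stand-in for an MLIL basic block (not a Binary Ninja type)."""

    index: int
    source_start: int  # assumed: start address of the originating disassembly block


def pair_quadratic(disasm_starts: List[int], mlil_blocks: List[MlilBlockStub]) -> Dict[int, List[MlilBlockStub]]:
    # For every disassembly block, scan all MLIL blocks: O(n^2) comparisons.
    return {
        start: [m for m in mlil_blocks if m.source_start == start]
        for start in disasm_starts
    }


def pair_indexed(disasm_starts: List[int], mlil_blocks: List[MlilBlockStub]) -> Dict[int, List[MlilBlockStub]]:
    # Index the MLIL blocks by source address once, then look each block up: O(n).
    by_start = defaultdict(list)
    for m in mlil_blocks:
        by_start[m.source_start].append(m)
    return {start: by_start.get(start, []) for start in disasm_starts}
```

Both helpers return the same mapping. Either way the MLIL has to be requested before the pairing can happen, so the cost of generating it is paid whenever the basic blocks are enumerated.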
Better still, I think there is a different way to do it. We can just do the stack string detection on the function level, rather than at the basic block level. In this way, we would only need to enumerate the MLIL basic blocks once. Do you think this would work? @williballenthin
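Roughly what I have in mind, as a sketch only. Nothing here is the real capa or Binary Ninja API: `MlilInstrStub`, `MlilBlock`, `request_mlil_blocks`, and `function_has_stack_string` are made-up names, and the threshold of 8 bytes is an assumption for illustration. The check is still made per MLIL block, but it runs inside a single function-level pass, so the MLIL is requested once per function.

```python
from dataclasses import dataclass, field
from typing import Callable, List

MIN_STACKSTRING_SIZE = 8  # assumed threshold, for illustration only


@dataclass
class MlilInstrStub:
    """Hypothetical stand-in for an MLIL instruction."""

    writes_const_to_stack: bool  # assumed: a constant is stored to a stack variable
    size: int  # number of bytes written


@dataclass
class MlilBlock:
    """Hypothetical stand-in for an MLIL basic block."""

    instructions: List[MlilInstrStub] = field(default_factory=list)


def block_stack_bytes(block: MlilBlock) -> int:
    # Total bytes of constants written to the stack within one block.
    return sum(i.size for i in block.instructions if i.writes_const_to_stack)


def function_has_stack_string(request_mlil_blocks: Callable[[], List[MlilBlock]]) -> bool:
    # The expensive step (requesting the MLIL) happens exactly once per function.
    blocks = request_mlil_blocks()
    return any(block_stack_bytes(b) >= MIN_STACKSTRING_SIZE for b in blocks)
```

With this shape, get_basic_blocks would not need to touch the MLIL at all, and the function-level feature extractor would be the only place that requests it.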