binja: retrieve the LLIL instruction itself without requesting the entire IL function #2511
This is part of the effort to optimize the binja extractor performance (#1414). As I mentioned in #2509 (comment), one of the outstanding issues dragging down the binja extractor's performance is the regeneration of the IL functions during feature extraction. Those IL functions are, however, necessary to ensure the most accurate results.
That said, I found that we can actually just retrieve a single LLIL instruction instead of requesting the entire IL of the function. Getting a single LLIL instruction is extremely fast compared to getting the entire IL function and then taking the particular instruction from it.
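For readers who want to see the two access patterns side by side, here is a minimal sketch. It assumes the Binary Ninja Python API names `Function.llil`, `LowLevelILFunction.instructions`, and `Function.get_llil_at`; the helper names `llil_via_full_function` and `llil_direct` are hypothetical and are not taken from this PR's diff, which may look different.

```python
from typing import Any, Optional


def llil_via_full_function(func: Any, addr: int) -> Optional[Any]:
    """Slower pattern: request the whole LLIL function, then pick one instruction.

    `func` is assumed to behave like a binaryninja Function object.
    """
    # Accessing `func.llil` is the step that may force the IL to be (re)generated.
    for insn in func.llil.instructions:
        if insn.address == addr:
            return insn
    return None


def llil_direct(func: Any, addr: int) -> Optional[Any]:
    """Faster pattern: ask only for the LLIL instruction at `addr`.

    Assumes `get_llil_at` returns None when no LLIL instruction maps to `addr`.
    """
    return func.get_llil_at(addr)
```

Both helpers should return the same instruction for a given address; the difference is only in whether the full IL function is requested along the way.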
Here is what I am getting as a difference:
With the change, the extractor accesses the functions strictly in sequential order and never requests the IL of a different function, so there is no regeneration of the IL. I also tested the MLIL basic block part, and there is no noticeable performance gain even if I completely disable the stack string check. In this sense, there is not much motivation to chase after that part.
Checklist