binja: retrieve the LLIL instruction itself without requesting the entire IL function #2511

Merged
merged 1 commit into from
Dec 2, 2024

Conversation

xusheng6
Contributor

@xusheng6 xusheng6 commented Dec 2, 2024

This is part of the effort to optimize the binja extractor performance (#1414). As I mentioned in #2509 (comment), one of the outstanding issues dragging down the binja extractor performance is the regeneration of IL functions during feature extraction. The IL functions are, however, necessary to ensure the most accurate results.

That said, I found that we can retrieve a single LLIL instruction instead of requesting the entire IL of the function. Getting a single LLIL instruction is extremely fast compared to getting the entire IL function and then taking the particular instruction from it.

Here is the difference I measured:

  1. For a small file (321338196a46b600ea330fc5d98d0699.exe_, 486 KB in size), the feature extraction time (excluding the initial analysis time) is down from 45 seconds to 32 seconds.
  2. For a large file (2f7f5fb5de175e770d7eae87666f9831.elf_, 4.1 MB in size), the feature extraction time is down from 15 minutes to 5 minutes, which is a 3x speedup! Apparently the IL regeneration issue becomes more severe as the file grows bigger.
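For reference, the timings above can be turned into speedup factors with a few lines of arithmetic. This is only a convenience sketch; the helper names `speedup` and `time_reduction_pct` are made up for illustration and are not part of capa.

```python
def speedup(before_s: float, after_s: float) -> float:
    """How many times faster the new run is (old time / new time)."""
    return before_s / after_s


def time_reduction_pct(before_s: float, after_s: float) -> float:
    """Percentage of the original wall-clock time that was saved."""
    return 100.0 * (before_s - after_s) / before_s


# Timings quoted in the list above, converted to seconds.
small_before, small_after = 45.0, 32.0
large_before, large_after = 15 * 60.0, 5 * 60.0

small_speedup = speedup(small_before, small_after)   # about 1.4x
large_speedup = speedup(large_before, large_after)   # 3.0x
large_saved = time_reduction_pct(large_before, large_after)  # about 66.7%
```

In other words, going from 15 minutes to 5 minutes is a 3x speedup, or equivalently about a two-thirds reduction in extraction time.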

With this change, the extractor accesses the functions strictly in sequential order and never requests the IL of a different function, so there is no regeneration of the IL. I also tested the MLIL basic block code path, and there is no noticeable performance gain even if I completely disable the stack string check. Given that, there is not much motivation to chase after that part.
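To illustrate why looking into another function's IL causes regeneration, here is a toy model in plain Python. It does not use the Binary Ninja API: `ToyFunction`, `SingleSlotILCache`, `extract_whole`, and `extract_single` are all hypothetical names, and the cache that keeps only the most recently requested function's IL is an assumed simplification of the real behavior.

```python
from typing import Dict, List, Optional, Tuple


class ToyFunction:
    """Hypothetical stand-in for a function; not a Binary Ninja type."""

    def __init__(self, start: int, raw: Dict[int, str]) -> None:
        self.start = start
        self.raw = raw          # address -> raw instruction text
        self.full_lifts = 0     # times the whole function was lifted
        self.single_lifts = 0   # times one instruction was lifted

    def lift_all(self) -> Dict[int, str]:
        """Expensive: lift every instruction in the function."""
        self.full_lifts += 1
        return {addr: f"IL({text})" for addr, text in self.raw.items()}

    def lift_one(self, addr: int) -> str:
        """Cheap: lift only the instruction at addr."""
        self.single_lifts += 1
        return f"IL({self.raw[addr]})"


class SingleSlotILCache:
    """Assumed policy: only the last requested function's IL is kept."""

    def __init__(self) -> None:
        self.owner: Optional[ToyFunction] = None
        self.il: Dict[int, str] = {}

    def get(self, func: ToyFunction) -> Dict[int, str]:
        if self.owner is not func:
            # Switching functions discards the old IL and regenerates.
            self.il = func.lift_all()
            self.owner = func
        return self.il


def extract_whole(f: ToyFunction, g: ToyFunction,
                  cache: SingleSlotILCache) -> List[Tuple[str, str]]:
    """Old pattern: fetch g's whole IL to inspect a single instruction."""
    out = []
    for addr in f.raw:
        insn = cache.get(f)[addr]
        target = cache.get(g)[g.start]   # evicts f's IL every iteration
        out.append((insn, target))
    return out


def extract_single(f: ToyFunction, g: ToyFunction,
                   cache: SingleSlotILCache) -> List[Tuple[str, str]]:
    """New pattern: lift only the one instruction needed from g."""
    out = []
    for addr in f.raw:
        insn = cache.get(f)[addr]
        target = g.lift_one(g.start)     # f's IL stays cached
        out.append((insn, target))
    return out
```

With a three-instruction `f` and a one-instruction `g`, `extract_whole` lifts each function in full three times, while `extract_single` lifts `f` in full once and never lifts `g` in full, yet both return the same features.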

Checklist

  • No CHANGELOG update needed
  • No new tests needed
  • No documentation update needed

@github-actions github-actions bot left a comment

Please add bug fixes, new features, breaking changes and anything else you think is worthwhile mentioning to the master (unreleased) section of CHANGELOG.md. If no CHANGELOG update is needed add the following to the PR description: [x] No CHANGELOG update needed

@github-actions github-actions bot dismissed their stale review December 2, 2024 09:07

CHANGELOG updated or no update needed, thanks! 😄

Collaborator

@mr-tz mr-tz left a comment

Nice, the improvements look good to me.


3 participants