binja: retrieve the LLIL instruction itself without requesting the entire IL function #2511

xusheng6 · 2024-12-02T09:01:37Z

This is part of the effort to optimize the binja extractor performance (#1414). As I mentioned in #2509 (comment), one of the outstanding issues that drags down the binja extractor performance is the regeneration of the IL functions during feature extraction. The IL functions are, however, necessary to ensure the most accurate results.

That said, I found that we can retrieve a single LLIL instruction instead of requesting the entire IL of the function. Getting a single LLIL instruction is extremely fast compared to getting the entire IL function and then taking the particular instruction from it.
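The diff itself is not reproduced in this conversation, so the following is only a sketch of how a single instruction could be lifted on its own. It assumes the Binary Ninja Python API's `Architecture.get_instruction_low_level_il`, `LowLevelILFunction(arch=...)`, `BinaryView.read` and `Architecture.max_instr_length`; the helper name `get_llil_instr_at_addr` is illustrative and may not match the merged code.

```python
from typing import Any, Optional


def get_llil_instr_at_addr(bv: Any, addr: int) -> Optional[Any]:
    """Lift only the instruction at `addr` to LLIL (sketch, see assumptions above)."""
    # Imported lazily so this module can be loaded without Binary Ninja installed.
    from binaryninja import LowLevelILFunction

    arch = bv.arch
    # Read enough bytes to cover the longest possible instruction.
    data = bv.read(addr, arch.max_instr_length)

    # A scratch IL function that is not attached to any analyzed function,
    # so nothing is requested from (or regenerated by) the analysis.
    llil = LowLevelILFunction(arch=arch)
    llil.current_address = addr

    # Ask the architecture to lift this one instruction into the scratch IL.
    length = arch.get_instruction_low_level_il(data, addr, llil)
    if not length:
        return None  # lifting failed or no bytes were consumed
    return llil[0]
```

The point of the scratch `LowLevelILFunction` is that the lifted instruction never goes through the per-function IL, which is the part that is expensive to regenerate.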

Here is what I am getting as a difference:

For a small file (321338196a46b600ea330fc5d98d0699.exe_, 486 KB in size), the feature extraction time (excluding the initial analysis time) is down from 45 seconds to 32 seconds.
For a large file (2f7f5fb5de175e770d7eae87666f9831.elf_, 4.1 MB in size), the feature extraction time is down from 15 minutes to 5 minutes, which is a 3x speedup! Apparently the IL regeneration issue becomes more severe as the file grows bigger.
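To put the two measurements on a common scale, the timings can be converted into a speedup factor and a percentage reduction in extraction time. This is a minimal sketch that uses only the numbers reported above; the helper names are illustrative.

```python
def speedup(before_s: float, after_s: float) -> float:
    """How many times faster the new run is."""
    return before_s / after_s


def time_reduction_pct(before_s: float, after_s: float) -> float:
    """Percentage of the original extraction time that was saved."""
    return (before_s - after_s) / before_s * 100.0


# (before, after) feature extraction times in seconds, as reported above.
small = (45.0, 32.0)
large = (15 * 60.0, 5 * 60.0)

for name, (before, after) in (("small", small), ("large", large)):
    print(
        f"{name}: {speedup(before, after):.2f}x faster, "
        f"{time_reduction_pct(before, after):.1f}% less time"
    )
```

The large file comes out at a 3x speedup, which corresponds to roughly two thirds less extraction time.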

With the change, the extractor accesses the functions strictly in sequential order and never requests the IL of a different function, so there is no regeneration of the IL. I also tested the MLIL basic block handling, and there is no noticeable performance gain even if I completely disable the stack string check. So there is not much motivation to pursue that part.
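Why access order matters can be illustrated with a toy model. It assumes, purely for illustration, that only the IL of the most recently requested function is kept, so asking for a different function forces a regeneration. This is not a description of Binary Ninja's actual caching, and all names below are hypothetical.

```python
from typing import List, Optional


class SingleSlotILCache:
    """Toy cache that holds the IL of one function at a time (assumption)."""

    def __init__(self) -> None:
        self.current: Optional[str] = None
        self.generations = 0

    def request_il(self, func: str) -> None:
        # Requesting a function other than the cached one regenerates its IL.
        if func != self.current:
            self.current = func
            self.generations += 1


def count_generations(accesses: List[str]) -> int:
    """Count how many times IL is (re)generated for a sequence of requests."""
    cache = SingleSlotILCache()
    for func in accesses:
        cache.request_il(func)
    return cache.generations


# Same six requests, in two different orders.
sequential = ["f1", "f1", "f1", "f2", "f2", "f2"]
interleaved = ["f1", "f2", "f1", "f2", "f1", "f2"]

print(count_generations(sequential))   # 2
print(count_generations(interleaved))  # 6
```

Under this assumption, sequential access generates each function's IL once, while interleaved access regenerates it on every switch.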

Checklist

No CHANGELOG update needed

No new tests needed

No documentation update needed

github-actions

Please add bug fixes, new features, breaking changes and anything else you think is worthwhile mentioning to the master (unreleased) section of CHANGELOG.md. If no CHANGELOG update is needed add the following to the PR description: [x] No CHANGELOG update needed

CHANGELOG updated or no update needed, thanks! 😄

mr-tz

Nice, the improvements look good to me.

github-actions bot previously requested changes Dec 2, 2024

xusheng6 force-pushed the fix_llil_access branch from f2e11ed to 6ff70da Compare December 2, 2024 09:07

binja: retrieve the LLIL instruction itself without requesting the en…

b6763ac

…tire IL function

xusheng6 force-pushed the fix_llil_access branch from 6ff70da to b6763ac Compare December 2, 2024 09:11

mr-tz approved these changes Dec 2, 2024

williballenthin approved these changes Dec 2, 2024

mr-tz merged commit abe8084 into mandiant:master Dec 2, 2024
28 checks passed

xusheng6 deleted the fix_llil_access branch December 2, 2024 15:33

This was referenced Dec 3, 2024

binja: compute_static_layout causes the regeneration of the IL function due to get_basic_blocks requests the MLIL #2516

Closed

binary ninja: optimize the order of computing LLIL to partially address #2402 #2509

Open
