binary ninja: optimize the order of computing LLIL to partially address #2402 (#2509)

Open · wants to merge 6 commits into master

Conversation

@williballenthin (Collaborator) commented Nov 27, 2024

ref #2402

This PR does two things for the Binary Ninja backend:

  1. build the call graph up front (sketched below), which is more cache friendly, and
  2. fetch instruction LLIL once and provide it via the instruction handler context.

With these two changes, against 321338196a46b600ea330fc5d98d0699.exe_ capa analysis time drops from 232s to 191s, a savings of around 18%.
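
Roughly, change (1) amounts to a single up-front pass over all functions, recording who references whom. The following is a minimal sketch of that idea against the Binary Ninja Python API; the map name and return shape are illustrative, not the PR's actual code.

```python
from collections import defaultdict

from binaryninja import BinaryView, Function


def build_call_graph(bv: BinaryView) -> dict[int, set[int]]:
    """Map each function's start address to the start addresses of functions that reference it."""
    callers_by_function: dict[int, set[int]] = defaultdict(set)

    f: Function
    for f in bv.functions:
        for caller in f.callers:
            # note: Function.callers includes any code reference to the function,
            # not only call sites (see the review discussion below).
            callers_by_function[f.start].add(caller.start)

    return callers_by_function
```

Doing this in one pass touches each function's analysis data once, which is the cache-friendliness the description refers to.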

@xusheng6 please review.

Checklist

  • No CHANGELOG update needed
  • No new tests needed
  • No documentation update needed

@williballenthin added the enhancement, binary-ninja, and performance labels on Nov 27, 2024
@github-actions bot left a comment

Please add bug fixes, new features, breaking changes and anything else you think is worthwhile mentioning to the master (unreleased) section of CHANGELOG.md. If no CHANGELOG update is needed add the following to the PR description: [x] No CHANGELOG update needed

@mr-tz (Collaborator) left a comment

LGTM, interested to see @xusheng6's thoughts and feedback


results: list[tuple[Any[Number, OperandNumber], Address]] = []

# TODO: try to move this out of line
@williballenthin (Collaborator, Author) commented:

these function objects get created for each instruction in the program, which may happen millions of times, so this may be expensive. we should profile and see if it makes any difference to pull the inner routine into a function that's defined a single time.
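
For illustration only, a minimal sketch of the suggestion, with hypothetical names (the real handler and helper in the extractor may look different): hoist the nested helper to module scope so it is defined once at import time rather than re-created on every handler call.

```python
# before (hypothetical): a new function object is allocated each time the
# handler runs, i.e. once per instruction in the program.
def extract_insn_number_features(fh, bbh, ih):
    def is_interesting(value):
        return value not in (0, 1)
    ...


# after (hypothetical): the helper is defined a single time at import.
def _is_interesting(value):
    return value not in (0, 1)


def extract_insn_number_features(fh, bbh, ih):
    ...
```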


f: Function
for f in self.bv.functions:
    for caller in f.callers:
@williballenthin (Collaborator, Author) commented:

function.callers doesn't filter the references down to calls; it includes any code reference to this function. maybe this is ok. we should investigate whether this loss in precision is meaningful or not.
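
One way to regain that precision, sketched against the Binary Ninja Python API (the helper name is illustrative, and whether the extra LLIL lookups are worth the cost is exactly the open question above): check the LLIL at each referencing address and keep only actual calls.

```python
from binaryninja import BinaryView, Function, LowLevelILOperation

CALL_OPS = (LowLevelILOperation.LLIL_CALL, LowLevelILOperation.LLIL_TAILCALL)


def calling_functions(bv: BinaryView, f: Function) -> set[int]:
    """Start addresses of functions whose reference to `f` is an actual call."""
    result: set[int] = set()
    for ref in bv.get_code_refs(f.start):
        if ref.function is None:
            continue
        llil = ref.function.get_low_level_il_at(ref.address)
        # skip non-call references, e.g. a `push func1` that only takes the address
        if llil is not None and llil.operation in CALL_OPS:
            result.add(ref.function.start)
    return result
```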

@williballenthin (Collaborator, Author) commented Nov 27, 2024

there's also a pending opportunity to reduce the number of mlil objects fetched, which may save another 15s (8%) or so. we read mlil to compute basic blocks during feature extraction, and then again to compute the file layout.

[profiling screenshot]
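
A minimal sketch of one way to avoid the double fetch, assuming we cache the MLIL on the capa side keyed by function start (the cache and helper names are illustrative, not existing extractor code):

```python
from binaryninja import Function, MediumLevelILFunction

# fetch MLIL once per function and reuse it for both basic-block extraction
# and layout computation, instead of requesting it twice.
_mlil_cache: dict[int, MediumLevelILFunction] = {}


def get_mlil(f: Function) -> MediumLevelILFunction:
    if f.start not in _mlil_cache:
        _mlil_cache[f.start] = f.mlil
    return _mlil_cache[f.start]
```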

@williballenthin (Collaborator, Author) commented:

there might be something else we can do around enumerating instructions, which takes around 14s (7%), but otherwise there aren't many obvious hotspots:

[profiling screenshot]

@xusheng6 (Contributor) commented:

Thx for the PR! I will have a look at it ASAP

@xusheng6 (Contributor) commented Nov 29, 2024

@williballenthin Below are some of my recent thoughts. Basically, I checked the places where I need to access the IL of the functions:

  1. In is_stub_function, the code checks whether a function is a stub for another function or an imported API by examining the LLIL. The case can be seen in al-khaser_x64.exe_ at 0x14004b532.

[disassembly screenshot: the stub function at 0x14004b532 in al-khaser_x64.exe_]

This should really just be implemented in binja's core analysis. In other words, binja should detect that this is a stub function and treat it as such, so that the feature extractor does not need to bother with it. (Update: I just found we already have Vector35/binaryninja-api#111.)

Also, from your perspective, how often do you see a debug build like this when you analyze malware? If this happens often enough, maybe we should actually prioritize this internally.

Even before we get to a proper fix, we could first do a pass over all of the functions and calculate and cache the result of is_stub_function (sketched at the end of this comment).

  2. In extract_function_calls_to, I had to check the LLIL instruction that actually makes the xref to exclude some false positives. In fact, the caller_sites property of a Function object almost returns the things the extractor wants, but I noticed a false positive during development. For example, if another function has some code like push func1, then the caller_sites of func1 will also include the push func1 instruction, although it is not technically a call to func1.

This is actually a known issue, tracked in Vector35/binaryninja-api#3722. However, adding the extra check in the implementation of the property does not really help much, since it just moves the calculation from capa into binja's API.

A possible workaround is to relax the restriction here and just allow things like non-calls to be included. The reason is, even if another function does not call this function explicitly, having an xref to the start of the function still means they have a very close relationship, and it is potentially using it as a callback or something similar. If we want this, we will need to change the unit test for binja a bit to make it pass again.

  3. In get_basic_blocks, I am matching the disassembly basic block with the MLIL basic block. The MLIL basic block is later used for stack string detection. I know this is probably the worst part of the binja extractor xD. In fact, I can directly know about the existence of the stack string at the function level by checking things like builtin_strcpy, etc. It is really a detour to match the MLIL basic block with the disassembly basic block, only because capa wants that info at the basic block level.

I am curious if we can change that a bit to be more flexible. I know other backends probably cannot tell this at the function level, but for binja, it is very straightforward.
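
A minimal sketch of the pre-pass idea from point 1 above: compute is_stub_function once per function and cache the result. The placeholder and its signature below are assumptions for illustration, not the extractor's actual code.

```python
from typing import Optional

from binaryninja import BinaryView, Function


def is_stub_function(f: Function) -> Optional[int]:
    """Placeholder for the existing LLIL-based stub check discussed in point 1."""
    ...


def precompute_stub_targets(bv: BinaryView) -> dict[int, Optional[int]]:
    # one pass over all functions up front, so later per-function and
    # per-instruction code can consult the cache instead of re-examining LLIL.
    return {f.start: is_stub_function(f) for f in bv.functions}
```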

@mr-tz (Collaborator) commented Nov 29, 2024

> Also, from your perspective, how often do you see a debug build like this when you analyze malware?

It's not uncommon that we see this for samples we analyze manually - I'd guess 10-20%. However, looking at larger scale analysis (e.g. from last week) it's more around 1% (using the capa debug build match results).

@xusheng6 (Contributor) commented:

> Also, from your perspective, how often do you see a debug build like this when you analyze malware?
>
> It's not uncommon that we see this for samples we analyze manually - I'd guess 10-20%. However, looking at larger scale analysis (e.g. from last week) it's more around 1% (using the capa debug build match results).

Thx for the valuable info! Also, are there any cases where the stub/thunk function takes another form besides the normal one where it is just a jmp to the real function?

Also, I just realized that I can actually directly get the LLIL of a particular instruction at a given address without first retrieving the IL function it belongs to. This means I no longer need to retrieve the IL of any other functions during the analysis of one function. This could bring a drastic performance improvement!
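
One plausible reading of this, assuming Function.get_low_level_il_at is the mechanism meant here (an assumption, not stated above):

```python
from typing import Optional

from binaryninja import BinaryView, LowLevelILInstruction


def llil_at(bv: BinaryView, addr: int) -> Optional[LowLevelILInstruction]:
    # look up the LLIL for the single instruction at `addr`, rather than
    # walking an entire IL function to find it.
    for f in bv.get_functions_containing(addr):
        insn = f.get_low_level_il_at(addr)
        if insn is not None:
            return insn
    return None
```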

Also, the stack string detection does not have to be done using MLIL. Previously, I already had an implementation of it without using the IL (just like what the other backends do). This could even make the binja extractor work without accessing the IL of any function, which could make things even faster. For this, I think we can gate it behind a setting.

@mr-tz (Collaborator) commented Dec 3, 2024

did the other changes supersede this PR?

@williballenthin (Collaborator, Author) commented:

I will rebase and see if there's any reason to continue these optimizations. Without data, I'm not yet sure.

@xusheng6 (Contributor) commented Dec 3, 2024

> did the other changes supersede this PR?

I do not think #2511 alone can do it, but the combination of these three probably can:

  1. binja: retrieve the LLIL instruction itself without requesting the entire IL function #2511
  2. binja: get_instruction should attach the list of associated LLIL instructions to the instruction object #2520
  3. binja: compute_static_layout causes the regeneration of the IL function because get_basic_blocks requests the MLIL #2516

Even with all of these done, it is still worthwhile to see whether this PR can push the performance further. However, I recommend that @williballenthin only start working on it after I have finished all the changes on my side.

@xusheng6 (Contributor) commented Dec 4, 2024

Hi @williballenthin, finally got some time to look at your changes! It seems to me that the commit "binja: provide llil to instruction handlers via ctx" implements the idea described in #2520 and should also hopefully fix #2517. Could you please rebase that commit on top of the latest master? You can keep 999f91b and a909d02 as well if you would like, since I feel they will be quite useful. Besides, you may wish to create a new branch: although I think the other 3 commits are no longer useful on top of the latest master, we had better still keep them in case we want them later.
