binary ninja: optimize the order of computing LLIL to partially address #2402 (#2509)

Open · wants to merge 6 commits into master

Conversation

@williballenthin (Collaborator) commented Nov 27, 2024

ref #2402

This PR does two things for the Binary Ninja backend:

  1. build the call graph up front (sketched below), which is more cache friendly, and
  2. fetch instruction LLIL once and provide it via the instruction handler context.

With these two changes, against 321338196a46b600ea330fc5d98d0699.exe_ capa analysis time drops from 232s to 191s, a savings of around 18%.
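
Roughly, change (1) amounts to a single up-front pass over all functions, recording who references whom. The following is a minimal sketch of that idea against the Binary Ninja Python API; the map name and return shape are illustrative, not the PR's actual code.

```python
from collections import defaultdict

from binaryninja import BinaryView, Function


def build_call_graph(bv: BinaryView) -> dict[int, set[int]]:
    """Map each function's start address to the start addresses of functions that reference it."""
    callers_by_function: dict[int, set[int]] = defaultdict(set)

    f: Function
    for f in bv.functions:
        for caller in f.callers:
            # note: Function.callers includes any code reference to the function,
            # not only call sites (see the review discussion below).
            callers_by_function[f.start].add(caller.start)

    return callers_by_function
```

Doing this in one pass touches each function's analysis data once, which is the cache-friendliness the description refers to.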

@xusheng6 please review.

Checklist

  • No CHANGELOG update needed
  • No new tests needed
  • No documentation update needed

@williballenthin added the enhancement, binary-ninja, and performance labels on Nov 27, 2024
@github-actions bot left a comment

Please add bug fixes, new features, breaking changes and anything else you think is worthwhile mentioning to the master (unreleased) section of CHANGELOG.md. If no CHANGELOG update is needed add the following to the PR description: [x] No CHANGELOG update needed

@mr-tz (Collaborator) left a comment

LGTM, interested to see @xusheng6's thoughts and feedback


results: list[tuple[Any[Number, OperandNumber], Address]] = []

# TODO: try to move this out of line
@williballenthin (Collaborator, Author) commented:

these function objects get created for each instruction in the program, which may happen millions of times, so this may be expensive. we should profile and see if it makes any difference to pull the inner routine into a function that's defined a single time.
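
For illustration only, a minimal sketch of the suggestion, with hypothetical names (the real handler and helper in the extractor may look different): hoist the nested helper to module scope so it is defined once at import time rather than re-created on every handler call.

```python
# before (hypothetical): a new function object is allocated each time the
# handler runs, i.e. once per instruction in the program.
def extract_insn_number_features(fh, bbh, ih):
    def is_interesting(value):
        return value not in (0, 1)
    ...


# after (hypothetical): the helper is defined a single time at import.
def _is_interesting(value):
    return value not in (0, 1)


def extract_insn_number_features(fh, bbh, ih):
    ...
```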


f: Function
for f in self.bv.functions:
    for caller in f.callers:
@williballenthin (Collaborator, Author) commented:

function.callers doesn't filter the references down to calls; it includes any code reference to this function. maybe this is ok. we should investigate whether this loss in precision is meaningful or not.
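
One way to regain that precision, sketched against the Binary Ninja Python API (the helper name is illustrative, and whether the extra LLIL lookups are worth the cost is exactly the open question above): check the LLIL at each referencing address and keep only actual calls.

```python
from binaryninja import BinaryView, Function, LowLevelILOperation

CALL_OPS = (LowLevelILOperation.LLIL_CALL, LowLevelILOperation.LLIL_TAILCALL)


def calling_functions(bv: BinaryView, f: Function) -> set[int]:
    """Start addresses of functions whose reference to `f` is an actual call."""
    result: set[int] = set()
    for ref in bv.get_code_refs(f.start):
        if ref.function is None:
            continue
        llil = ref.function.get_low_level_il_at(ref.address)
        # skip non-call references, e.g. a `push func1` that only takes the address
        if llil is not None and llil.operation in CALL_OPS:
            result.add(ref.function.start)
    return result
```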

@williballenthin (Collaborator, Author) commented Nov 27, 2024

there's also a pending opportunity to reduce the number of mlil objects fetched, which may save another 15s (8%) or so. we read mlil to compute basic blocks during feature extraction, and then again to compute the file layout.

[profiling screenshot]
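
A minimal sketch of one way to avoid the double fetch, assuming we cache the MLIL on the capa side keyed by function start (the cache and helper names are illustrative, not existing extractor code):

```python
from binaryninja import Function, MediumLevelILFunction

# fetch MLIL once per function and reuse it for both basic-block extraction
# and layout computation, instead of requesting it twice.
_mlil_cache: dict[int, MediumLevelILFunction] = {}


def get_mlil(f: Function) -> MediumLevelILFunction:
    if f.start not in _mlil_cache:
        _mlil_cache[f.start] = f.mlil
    return _mlil_cache[f.start]
```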

@williballenthin (Collaborator, Author) commented:

there might be something else we can do around enumerating instructions, which takes around 14s (7%), but otherwise there aren't many obvious hotspots:

[profiling screenshot]

@xusheng6 (Contributor) commented:

Thx for the PR! I will have a look at it ASAP

@xusheng6 (Contributor) commented Nov 29, 2024

@williballenthin Below are some of my recent thoughts. Basically, I checked the places where I need to access the IL of the functions:

  1. In is_stub_function, the code checks whether a function is a stub for another function or an imported API by examining the LLIL. The case can be seen in al-khaser_x64.exe_ at 0x14004b532.

[disassembly screenshot: the stub function at 0x14004b532 in al-khaser_x64.exe_]

This should really just be implemented in binja's core analysis. In other words, binja should detect that this is a stub function and treat it as such, so that the feature extractor does not need to bother with it. (Update: I just found we already have Vector35/binaryninja-api#111.)

Also, from your perspective, how often do you see a debug build like this when you analyze malware? If this happens often enough, maybe we should actually prioritize this internally.

Even before we get to a proper fix, we could first do a pass over all of the functions and calculate and cache the result of is_stub_function (sketched at the end of this comment).

  2. In extract_function_calls_to, I had to check the LLIL instruction that actually makes the xref to exclude some false positives. In fact, the caller_sites property of a Function object almost returns the things the extractor wants, but I noticed a false positive during development. For example, if another function has some code like push func1, then the caller_sites of func1 will also include the push func1 instruction, although it is not technically a call to func1.

This is actually a known issue, tracked in Vector35/binaryninja-api#3722. However, adding the extra check in the implementation of the property does not really help much, since it just moves the calculation from capa into binja's API.

A possible workaround is to relax the restriction here and just allow things like non-calls to be included. The reason is, even if another function does not call this function explicitly, having an xref to the start of the function still means they have a very close relationship, and it is potentially using it as a callback or something similar. If we want this, we will need to change the unit test for binja a bit to make it pass again.

  3. In get_basic_blocks, I am matching the disassembly basic block with the MLIL basic block. The MLIL basic block is later used for stack string detection. I know this is probably the worst part of the binja extractor xD. In fact, I can directly know about the existence of the stack string at the function level by checking things like builtin_strcpy, etc. It is really a detour to match the MLIL basic block with the disassembly basic block, only because capa wants that info at the basic block level.

I am curious if we can change that a bit to be more flexible. I know other backends probably cannot tell this at the function level, but for binja, it is very straightforward.
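
A minimal sketch of the pre-pass idea from point 1 above: compute is_stub_function once per function and cache the result. The placeholder and its signature below are assumptions for illustration, not the extractor's actual code.

```python
from typing import Optional

from binaryninja import BinaryView, Function


def is_stub_function(f: Function) -> Optional[int]:
    """Placeholder for the existing LLIL-based stub check discussed in point 1."""
    ...


def precompute_stub_targets(bv: BinaryView) -> dict[int, Optional[int]]:
    # one pass over all functions up front, so later per-function and
    # per-instruction code can consult the cache instead of re-examining LLIL.
    return {f.start: is_stub_function(f) for f in bv.functions}
```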

@mr-tz (Collaborator) commented Nov 29, 2024

> Also, from your perspective, how often do you see a debug build like this when you analyze malware?

It's not uncommon that we see this for samples we analyze manually - I'd guess 10-20%. However, looking at larger scale analysis (e.g. from last week) it's more around 1% (using the capa debug build match results).

@xusheng6 (Contributor) commented:

> Also, from your perspective, how often do you see a debug build like this when you analyze malware?
>
> It's not uncommon that we see this for samples we analyze manually - I'd guess 10-20%. However, looking at larger scale analysis (e.g. from last week) it's more around 1% (using the capa debug build match results).

Thx for the valuable info! Also, are there any cases where the stub/thunk function takes another form besides the normal one where it is just a jmp to the real function?

Also, I just realized that I can actually directly get the LLIL of a particular instruction at a given address without first retrieving the IL function it belongs to. This means I no longer need to retrieve the IL of any other functions during the analysis of one function. This could bring a drastic performance improvement!
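
One plausible reading of this, assuming Function.get_low_level_il_at is the mechanism meant here (an assumption, not stated above):

```python
from typing import Optional

from binaryninja import BinaryView, LowLevelILInstruction


def llil_at(bv: BinaryView, addr: int) -> Optional[LowLevelILInstruction]:
    # look up the LLIL for the single instruction at `addr`, rather than
    # walking an entire IL function to find it.
    for f in bv.get_functions_containing(addr):
        insn = f.get_low_level_il_at(addr)
        if insn is not None:
            return insn
    return None
```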

Also, the stack string detection does not have to be done using MLIL. Previously, I already had an implementation of it without using the IL (just like what the other backends do). This could even make the binja extractor work without accessing the IL of any function, which could make things even faster. For this, I think we can gate it behind a setting.

@mr-tz (Collaborator) commented Dec 3, 2024

did the other changes supersede this PR?

@williballenthin (Collaborator, Author) commented:

I will rebase and see if there's any reason to continue these optimizations. Without data, I'm not yet sure.

@xusheng6 (Contributor) commented Dec 3, 2024

> did the other changes supersede this PR?

I do not think #2511 alone can do it, but the combination of these three probably can:

  1. binja: retrieve the LLIL instruction itself without requesting the entire IL function #2511
  2. binja: get_instruction should attach the list of associated LLIL instructions to the instruction object #2520
  3. binja: compute_static_layout causes the regeneration of the IL function because get_basic_blocks requests the MLIL #2516

Even with all of these done, it is still worthwhile to see whether this PR can push the performance further. However, I recommend that @williballenthin only start working on it after I have finished all the changes on my side.

@xusheng6 (Contributor) commented Dec 4, 2024

Hi @williballenthin, finally got some time to look at your changes! It seems to me that the commit "binja: provide llil to instruction handlers via ctx" implements the idea described in #2520 and should also hopefully fix #2517. Could you please rebase that commit on top of the latest master? You can keep 999f91b and a909d02 as well if you would like, since I feel they will be quite useful. Besides, you may wish to create a new branch: although I think the other 3 commits are no longer useful on top of the latest master, we had better still keep them in case we want them later.
