On-Demand Function Analysis is Triggering Time and Update Count Limits #6171

xusheng6 · 2024-11-21T07:33:23Z

While working on mandiant/capa#2406 (comment), I noticed that the MLIL of function 0x467464 can normally be retrieved after a few seconds. However, when binja is used through capa, the IL cannot be retrieved, which leads to a crash.

Normal run:

>>> from binaryninja import *
>>> bv = binaryninja.load('/Users/xusheng/Downloads/112f9f0e8d349858a80dd8c14190e620.exe_.')
>>> func = bv.get_function_at(0x467464)
>>> func
<func: x86@0x467464>
>>> func.mlil
<MediumLevelILFunction: x86@0x467464>

func.mlil does take a few seconds to run, but everything works as expected. We need to find out why it is not working in the case of capa.

Related to #6170

xusheng6 · 2024-11-21T07:33:44Z

112f9f0e8d349858a80dd8c14190e620.exe_.zip

xusheng6 · 2024-11-21T08:22:32Z

Also see 0x8091b80 in
b5f0524e69b3a3cf636c7ac366ca57bf5e3a8fdc8a9f01caf196c611a7918a87.elf_.zip

xusheng6 · 2024-11-21T09:11:33Z

capa stack trace:

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/Users/xusheng/capa/capa/main.py", line 1103, in <module>
    sys.exit(main())
             ^^^^^^
  File "/Users/xusheng/capa/capa/main.py", line 994, in main
    capabilities, counts = find_capabilities(rules, extractor, disable_progress=args.quiet)
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/xusheng/capa/capa/capabilities/common.py", line 75, in find_capabilities
    return find_static_capabilities(ruleset, extractor, disable_progress=disable_progress, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/xusheng/capa/capa/capabilities/static.py", line 168, in find_static_capabilities
    function_matches, bb_matches, insn_matches, feature_count = find_code_capabilities(ruleset, extractor, f)
                                                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/xusheng/capa/capa/capabilities/static.py", line 114, in find_code_capabilities
    for bb in extractor.get_basic_blocks(fh):
  File "/Users/xusheng/capa/capa/features/extractors/binja/extractor.py", line 58, in get_basic_blocks
    for mlil_bb in f.mlil.basic_blocks:
                   ^^^^^^
  File "/Applications/Binary Ninja.app/Contents/Resources/python/binaryninja/function.py", line 1039, in mlil
    raise ILException(f"Medium level IL was not loaded for {self!r}")
binaryninja.exceptions.ILException: Medium level IL was not loaded for <func: x86@0x8082d40>

And the Python script that works:

from binaryninja import *
bv = binaryninja.load('/Users/xusheng/Downloads/b5f0524e69b3a3cf636c7ac366ca57bf5e3a8fdc8a9f01caf196c611a7918a87.elf_.bndb')
func = bv.get_function_at(0x8082d40)
print(func.mlil)

xusheng6 · 2024-11-21T10:50:59Z

I checked the analysis skip reason for the function, and it is ExceedFunctionAnalysisTimeSkipReason.

But that still does NOT look right: the analysis of that function takes about 2 seconds, so there is no way the default 20-second limit could be exceeded.

Update: I printed the analysis time and it is indeed questionable:

analysis skipped: 3, analysis info: {'Total': 0.002902333}
analysis skipped: 3, analysis info: {'Total': 0.0081085}

The analysis times of the two skipped functions are both very short.

For comparison, I wrote a small script that iterates over all functions and retrieves the MLIL of each one. It took roughly 0.7 seconds to generate the MLIL for function 0x8082d40.
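
A comparison script of that kind might look roughly like the sketch below. It is not the script from this comment. The helper names `time_mlil` and `main` are hypothetical, and the only Binary Ninja calls assumed are `binaryninja.load`, `bv.functions`, `func.start`, and `func.mlil`, all of which appear in or follow directly from the snippets in this thread. The import is deferred into `main` so the timing helper can be exercised without Binary Ninja installed.

```python
import time


def time_mlil(functions):
    """Request the MLIL of each function and record how long each request took.

    `functions` is any iterable of objects exposing `start` and `mlil`,
    for example `bv.functions`. Returns a (timings, failures) pair of dicts
    keyed by function start address.
    """
    timings = {}   # start address -> seconds spent on the request
    failures = {}  # start address -> exception raised while reading `.mlil`
    for func in functions:
        begin = time.perf_counter()
        try:
            _ = func.mlil  # triggers on-demand analysis if the IL is not cached
        except Exception as exc:  # e.g. the ILException seen in the traceback
            failures[func.start] = exc
        timings[func.start] = time.perf_counter() - begin
    return timings, failures


def main(path):
    """Hypothetical driver: load a binary and print the ten slowest functions."""
    import binaryninja  # deferred so `time_mlil` works without Binary Ninja

    bv = binaryninja.load(path)
    timings, failures = time_mlil(bv.functions)
    slowest = sorted(timings.items(), key=lambda item: item[1], reverse=True)
    for addr, seconds in slowest[:10]:
        print(f"{addr:#x}: {seconds:.3f}s")
    for addr, exc in failures.items():
        print(f"{addr:#x}: failed with {exc}")
```

Calling `main` with the path of the sample would list the slowest functions. A single pass like this requests each function only once, which may be why it does not reproduce the failure seen under capa.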

xusheng6 · 2024-11-21T13:33:51Z

I made an interesting discovery in mandiant/capa#2402 (comment). However, that still does not explain why the analysis could time out on a not-so-challenging function at all.

xusheng6 · 2024-11-22T05:53:39Z

OK, so I finally figured out what is happening here. We only reset the function analysis time if the analysis is a user update. If every update is an auto-update, the timer is never reset and the time just keeps adding up, eventually exceeding the 20-second threshold. Here is a brief explanation of what happens:

1. The script retrieves the IL of function 0x8082d40.
2. It takes 0.7 seconds to update the analysis and generate the IL.
3. The IL of function 0x8082d40 is cached.
4. The script requests the IL of some 60 other functions, which exceeds analysis.limits.cacheSize.
5. The cached IL of function 0x8082d40 is discarded.
6. The script retrieves the IL of function 0x8082d40 again.
7. It takes another 0.7 seconds to update the analysis and generate the IL. The two times are added together.
8. The process repeats many times until the total analysis time of 0x8082d40 is larger than 20 seconds.
9. The analysis time of 0x8082d40 exceeds analysis.limits.maxFunctionAnalysisTime.
10. The analysis bails out with the reason ExceedFunctionAnalysisTimeSkipReason.

I understand we are probably only resetting the total analysis time on user updates for a good reason, and this cache-related repeated generation of the IL has probably never been encountered before.
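
The steps above can be mimicked with a toy model in plain Python. This is only a sketch of the described behavior, not Binary Ninja's implementation: `ToyAnalyzer`, `requests_until_skip`, the cache size of 60, and the `reset_timer_on_update` flag are all hypothetical. The 0.7-second cost and the 20-second limit come from the numbers in this comment, expressed in milliseconds to avoid floating-point rounding.

```python
from collections import OrderedDict

MAX_ANALYSIS_MS = 20_000  # the 20-second limit mentioned above
COST_MS = 700             # the 0.7 seconds measured per IL generation
CACHE_SIZE = 60           # illustrative only; the real value is a setting


class ToyAnalyzer:
    """Toy model: an LRU cache of IL plus a per-function accumulated timer."""

    def __init__(self, reset_timer_on_update=False):
        self.cache = OrderedDict()  # function address -> cached "IL"
        self.total_ms = {}          # function address -> accumulated time
        # Hypothetical switch, used only to contrast with the buggy behavior.
        self.reset_timer_on_update = reset_timer_on_update

    def get_il(self, addr):
        if addr in self.cache:
            self.cache.move_to_end(addr)  # mark as most recently used
            return self.cache[addr]
        if self.reset_timer_on_update:
            self.total_ms[addr] = 0
        # Cache miss: the function is analyzed again and the time is added up.
        self.total_ms[addr] = self.total_ms.get(addr, 0) + COST_MS
        if self.total_ms[addr] > MAX_ANALYSIS_MS:
            raise RuntimeError(f"analysis of {addr:#x} skipped: time limit")
        self.cache[addr] = f"il@{addr:#x}"
        if len(self.cache) > CACHE_SIZE:
            self.cache.popitem(last=False)  # evict the least recently used
        return self.cache[addr]


def requests_until_skip(analyzer, target=0x8082D40, limit=100):
    """Return the request number at which `target` is skipped, or None."""
    for attempt in range(1, limit + 1):
        try:
            analyzer.get_il(target)
        except RuntimeError:
            return attempt
        # Touch enough other functions to push `target` out of the cache.
        for offset in range(CACHE_SIZE):
            analyzer.get_il(0x1000 + offset)
    return None
```

In this model, `requests_until_skip(ToyAnalyzer())` returns 29, because 28 regenerations add up to 19.6 seconds and the 29th pushes the total to 20.3 seconds. With `reset_timer_on_update=True` the function is never skipped. The fix that actually shipped may differ from this flag.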

bpotchik · 2024-11-24T17:22:30Z

@xusheng6 thanks for discovering this and providing the details. The issue required a slightly different fix, and I added a unit test for this scenario as well.

Fixed in 4.3.6482.

xusheng6 · 2024-11-25T03:21:25Z

I verified that the fix works for capa. Thanks for fixing it so fast!

xusheng6 changed the title ~~IL function cannot be retrived when it normally can~~ IL function cannot be retrived when it normally can be Nov 21, 2024

xusheng6 mentioned this issue Nov 22, 2024

binary ninja: optimize feature extraction mandiant/capa#2402

Open

xusheng6 self-assigned this Nov 22, 2024

xusheng6 changed the title ~~IL function cannot be retrived when it normally can be~~ Analysis time of a function is only added up and never reset if the analysis update is not user update Nov 22, 2024

xusheng6 added this to the Gallifrey milestone Nov 22, 2024

bpotchik changed the title ~~Analysis time of a function is only added up and never reset if the analysis update is not user update~~ On-Demand Function Analysis is Triggering Time and Update Count Limits Nov 24, 2024

bpotchik self-assigned this Nov 24, 2024

bpotchik added the Component: Core label Nov 24, 2024

bpotchik closed this as completed Nov 24, 2024

This was referenced Nov 25, 2024

Crash when analyzing large file with binary ninja backend becuase the IL function is not available mandiant/capa#2249

Open

triage binary ninja backend failures mandiant/capa#2406

Open
