Skip to content

Commit

Permalink
merge master
Browse files Browse the repository at this point in the history
  • Loading branch information
williballenthin committed Oct 17, 2023
2 parents 9a66c26 + 40d9587 commit 182a986
Show file tree
Hide file tree
Showing 36 changed files with 2,329 additions and 70 deletions.
3 changes: 3 additions & 0 deletions .github/mypy/mypy.ini
Original file line number Diff line number Diff line change
Expand Up @@ -93,3 +93,6 @@ ignore_missing_imports = True

[mypy-netnode.*]
ignore_missing_imports = True

[mypy-ghidra.*]
ignore_missing_imports = True
2 changes: 1 addition & 1 deletion .github/pyinstaller/pyinstaller.spec
Original file line number Diff line number Diff line change
Expand Up @@ -79,7 +79,7 @@ exe = EXE(
name="capa",
icon="logo.ico",
debug=False,
strip=None,
strip=False,
upx=True,
console=True,
)
Expand Down
59 changes: 59 additions & 0 deletions .github/workflows/tests.yml
Original file line number Diff line number Diff line change
Expand Up @@ -139,3 +139,62 @@ jobs:
env:
BN_LICENSE: ${{ secrets.BN_LICENSE }}
run: pytest -v tests/test_binja_features.py # explicitly refer to the binja tests for performance. other tests run above.

ghidra-tests:
name: Ghidra tests for ${{ matrix.python-version }}
runs-on: ubuntu-20.04
needs: [code_style, rule_linter]
strategy:
fail-fast: false
matrix:
python-version: ["3.8", "3.11"]
java-version: ["17"]
gradle-version: ["7.3"]
ghidra-version: ["10.3"]
public-version: ["PUBLIC_20230510"] # for ghidra releases
jep-version: ["4.1.1"]
ghidrathon-version: ["3.0.0"]
steps:
- name: Checkout capa with submodules
uses: actions/checkout@ac593985615ec2ede58e132d2e21d2b1cbd6127c # v3.3.0
with:
submodules: true
- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@d27e3f3d7c64b4bbf8e4abfb9b63b83e846e0435 # v4.5.0
with:
python-version: ${{ matrix.python-version }}
- name: Set up Java ${{ matrix.java-version }}
uses: actions/setup-java@5ffc13f4174014e2d4d4572b3d74c3fa61aeb2c2 # v3
with:
distribution: 'temurin'
java-version: ${{ matrix.java-version }}
- name: Set up Gradle ${{ matrix.gradle-version }}
uses: gradle/gradle-build-action@40b6781dcdec2762ad36556682ac74e31030cfe2 # v2.5.1
with:
gradle-version: ${{ matrix.gradle-version }}
- name: Install Jep ${{ matrix.jep-version }}
run : pip install jep==${{ matrix.jep-version }}
- name: Install Ghidra ${{ matrix.ghidra-version }}
run: |
mkdir ./.github/ghidra
wget "https://github.com/NationalSecurityAgency/ghidra/releases/download/Ghidra_${{ matrix.ghidra-version }}_build/ghidra_${{ matrix.ghidra-version }}_${{ matrix.public-version }}.zip" -O ./.github/ghidra/ghidra_${{ matrix.ghidra-version }}_PUBLIC.zip
unzip .github/ghidra/ghidra_${{ matrix.ghidra-version }}_PUBLIC.zip -d .github/ghidra/
- name: Install Ghidrathon
run : |
mkdir ./.github/ghidrathon
curl -o ./.github/ghidrathon/ghidrathon-${{ matrix.ghidrathon-version }}.zip "https://codeload.github.com/mandiant/Ghidrathon/zip/refs/tags/v${{ matrix.ghidrathon-version }}"
unzip .github/ghidrathon/ghidrathon-${{ matrix.ghidrathon-version }}.zip -d .github/ghidrathon/
gradle -p ./.github/ghidrathon/Ghidrathon-${{ matrix.ghidrathon-version }}/ -PGHIDRA_INSTALL_DIR=$(pwd)/.github/ghidra/ghidra_${{ matrix.ghidra-version }}_PUBLIC
unzip .github/ghidrathon/Ghidrathon-${{ matrix.ghidrathon-version }}/dist/*.zip -d .github/ghidra/ghidra_${{ matrix.ghidra-version }}_PUBLIC/Ghidra/Extensions
- name: Install pyyaml
run: sudo apt-get install -y libyaml-dev
- name: Install capa
run: pip install -e .[dev]
- name: Run tests
run: |
mkdir ./.github/ghidra/project
.github/ghidra/ghidra_${{ matrix.ghidra-version }}_PUBLIC/support/analyzeHeadless .github/ghidra/project ghidra_test -Import ./tests/data/mimikatz.exe_ -ScriptPath ./tests/ -PostScript test_ghidra_features.py > ../output.log
cat ../output.log
exit_code=$(cat ../output.log | grep exit | awk '{print $NF}')
exit $exit_code
31 changes: 29 additions & 2 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -7,16 +7,43 @@
- add call scope #771 @yelhamer
- add process scope for the dynamic analysis flavor #1517 @yelhamer
- Add thread scope for the dynamic analysis flavor #1517 @yelhamer
- ghidra: add Ghidra feature extractor and supporting code #1770 @colton-gabertan
- ghidra: add entry script helping users run capa against a loaded Ghidra database #1767 @mike-hunhoff
- binja: add support for forwarded exports #1646 @xusheng6
- binja: add support for symtab names #1504 @xusheng6

### Breaking Changes

- remove the `SCOPE_*` constants in favor of the `Scope` enum #1764 @williballenthin

### New Rules (0)

### New Rules (19)

- nursery/get-ntoskrnl-base-address @mr-tz
- host-interaction/network/connectivity/set-tcp-connection-state @johnk3r
- nursery/capture-process-snapshot-data @mr-tz
- collection/network/capture-packets-using-sharppcap [email protected]
- nursery/communicate-with-kernel-module-via-netlink-socket-on-linux [email protected]
- nursery/get-current-pid-on-linux [email protected]
- nursery/get-file-system-information-on-linux [email protected]
- nursery/get-password-database-entry-on-linux [email protected]
- nursery/mark-thread-detached-on-linux [email protected]
- nursery/persist-via-gnome-autostart-on-linux [email protected]
- nursery/set-thread-name-on-linux [email protected]
- load-code/dotnet/load-windows-common-language-runtime [email protected] [email protected] [email protected]
- nursery/log-keystrokes-via-input-method-manager @mr-tz
- nursery/encrypt-data-using-rc4-via-systemfunction032 [email protected]
- nursery/add-value-to-global-atom-table @mr-tz
- nursery/enumerate-processes-that-use-resource @Ana06
- host-interaction/process/inject/allocate-or-change-rwx-memory @mr-tz
- lib/allocate-or-change-rw-memory [email protected] @mr-tz
- lib/change-memory-protection @mr-tz
-

### Bug Fixes
- ghidra: fix ints_to_bytes performance #1761 @mike-hunhoff
- binja: improve function call site detection @xusheng6
- binja: use binaryninja.load to open files @xusheng6
- binja: bump binja version to 3.5 #1789 @xusheng6

### capa explorer IDA Pro plugin

Expand Down
4 changes: 3 additions & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,7 @@

[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/flare-capa)](https://pypi.org/project/flare-capa)
[![Last release](https://img.shields.io/github/v/release/mandiant/capa)](https://github.com/mandiant/capa/releases)
[![Number of rules](https://img.shields.io/badge/rules-831-blue.svg)](https://github.com/mandiant/capa-rules)
[![Number of rules](https://img.shields.io/badge/rules-847-blue.svg)](https://github.com/mandiant/capa-rules)
[![CI status](https://github.com/mandiant/capa/workflows/CI/badge.svg)](https://github.com/mandiant/capa/actions?query=workflow%3ACI+event%3Apush+branch%3Amaster)
[![Downloads](https://img.shields.io/github/downloads/mandiant/capa/total)](https://github.com/mandiant/capa/releases)
[![License](https://img.shields.io/badge/license-Apache--2.0-green.svg)](LICENSE.txt)
Expand Down Expand Up @@ -170,6 +170,8 @@ capa explorer helps you identify interesting areas of a program and build new ca
![capa + IDA Pro integration](https://github.com/mandiant/capa/blob/master/doc/img/explorer_expanded.png)
If you use Ghidra, you can use the Python 3 [Ghidra feature extractor](/capa/ghidra/). This integration enables capa to extract features directly from your Ghidra database, which can help you identify capabilities in programs that you analyze using Ghidra.
# further information
## capa
- [Installation](https://github.com/mandiant/capa/blob/master/doc/installation.md)
Expand Down
40 changes: 30 additions & 10 deletions capa/features/extractors/binja/file.py
Original file line number Diff line number Diff line change
Expand Up @@ -17,7 +17,7 @@
from capa.features.file import Export, Import, Section, FunctionName
from capa.features.common import FORMAT_PE, FORMAT_ELF, Format, String, Feature, Characteristic
from capa.features.address import NO_ADDRESS, Address, FileOffsetAddress, AbsoluteVirtualAddress
from capa.features.extractors.binja.helpers import unmangle_c_name
from capa.features.extractors.binja.helpers import read_c_string, unmangle_c_name


def check_segment_for_pe(bv: BinaryView, seg: Segment) -> Iterator[Tuple[int, int]]:
Expand Down Expand Up @@ -82,6 +82,24 @@ def extract_file_export_names(bv: BinaryView) -> Iterator[Tuple[Feature, Address
if name != unmangled_name:
yield Export(unmangled_name), AbsoluteVirtualAddress(sym.address)

for sym in bv.get_symbols_of_type(SymbolType.DataSymbol):
if sym.binding not in [SymbolBinding.GlobalBinding]:
continue

name = sym.short_name
if not name.startswith("__forwarder_name"):
continue

# Due to https://github.com/Vector35/binaryninja-api/issues/4641, in binja version 3.5, the symbol's name
# does not contain the DLL name. As a workaround, we read the C string at the symbol's address, which contains
# both the DLL name and the function name.
# Once the above issue is closed in the next binjs stable release, we can update the code here to use the
# symbol name directly.
name = read_c_string(bv, sym.address, 1024)
forwarded_name = capa.features.extractors.helpers.reformat_forwarded_export_name(name)
yield Export(forwarded_name), AbsoluteVirtualAddress(sym.address)
yield Characteristic("forwarded export"), AbsoluteVirtualAddress(sym.address)


def extract_file_import_names(bv: BinaryView) -> Iterator[Tuple[Feature, Address]]:
"""extract function imports
Expand Down Expand Up @@ -125,15 +143,17 @@ def extract_file_function_names(bv: BinaryView) -> Iterator[Tuple[Feature, Addre
"""
for sym_name in bv.symbols:
for sym in bv.symbols[sym_name]:
if sym.type == SymbolType.LibraryFunctionSymbol:
name = sym.short_name
yield FunctionName(name), sym.address
if name.startswith("_"):
# some linkers may prefix linked routines with a `_` to avoid name collisions.
# extract features for both the mangled and un-mangled representations.
# e.g. `_fwrite` -> `fwrite`
# see: https://stackoverflow.com/a/2628384/87207
yield FunctionName(name[1:]), sym.address
if sym.type not in [SymbolType.LibraryFunctionSymbol, SymbolType.FunctionSymbol]:
continue

name = sym.short_name
yield FunctionName(name), sym.address
if name.startswith("_"):
# some linkers may prefix linked routines with a `_` to avoid name collisions.
# extract features for both the mangled and un-mangled representations.
# e.g. `_fwrite` -> `fwrite`
# see: https://stackoverflow.com/a/2628384/87207
yield FunctionName(name[1:]), sym.address


def extract_file_format(bv: BinaryView) -> Iterator[Tuple[Feature, Address]]:
Expand Down
44 changes: 40 additions & 4 deletions capa/features/extractors/binja/function.py
Original file line number Diff line number Diff line change
Expand Up @@ -7,8 +7,9 @@
# See the License for the specific language governing permissions and limitations under the License.
from typing import Tuple, Iterator

from binaryninja import Function, BinaryView, LowLevelILOperation
from binaryninja import Function, BinaryView, SymbolType, RegisterValueType, LowLevelILOperation

from capa.features.file import FunctionName
from capa.features.common import Feature, Characteristic
from capa.features.address import Address, AbsoluteVirtualAddress
from capa.features.extractors import loops
Expand All @@ -23,13 +24,27 @@ def extract_function_calls_to(fh: FunctionHandle):
# Everything that is a code reference to the current function is considered a caller, which actually includes
# many other references that are NOT a caller. For example, an instruction `push function_start` will also be
# considered a caller to the function
if caller.llil is not None and caller.llil.operation in [
llil = caller.llil
if (llil is None) or llil.operation not in [
LowLevelILOperation.LLIL_CALL,
LowLevelILOperation.LLIL_CALL_STACK_ADJUST,
LowLevelILOperation.LLIL_JUMP,
LowLevelILOperation.LLIL_TAILCALL,
]:
yield Characteristic("calls to"), AbsoluteVirtualAddress(caller.address)
continue

if llil.dest.value.type not in [
RegisterValueType.ImportedAddressValue,
RegisterValueType.ConstantValue,
RegisterValueType.ConstantPointerValue,
]:
continue

address = llil.dest.value.value
if address != func.start:
continue

yield Characteristic("calls to"), AbsoluteVirtualAddress(caller.address)


def extract_function_loop(fh: FunctionHandle):
Expand Down Expand Up @@ -59,10 +74,31 @@ def extract_recursive_call(fh: FunctionHandle):
yield Characteristic("recursive call"), fh.address


def extract_function_name(fh: FunctionHandle):
"""extract function names (e.g., symtab names)"""
func: Function = fh.inner
bv: BinaryView = func.view
if bv is None:
return

for sym in bv.get_symbols(func.start):
if sym.type not in [SymbolType.LibraryFunctionSymbol, SymbolType.FunctionSymbol]:
continue

name = sym.short_name
yield FunctionName(name), sym.address
if name.startswith("_"):
# some linkers may prefix linked routines with a `_` to avoid name collisions.
# extract features for both the mangled and un-mangled representations.
# e.g. `_fwrite` -> `fwrite`
# see: https://stackoverflow.com/a/2628384/87207
yield FunctionName(name[1:]), sym.address


def extract_features(fh: FunctionHandle) -> Iterator[Tuple[Feature, Address]]:
for func_handler in FUNCTION_HANDLERS:
for feature, addr in func_handler(fh):
yield feature, addr


FUNCTION_HANDLERS = (extract_function_calls_to, extract_function_loop, extract_recursive_call)
FUNCTION_HANDLERS = (extract_function_calls_to, extract_function_loop, extract_recursive_call, extract_function_name)
18 changes: 17 additions & 1 deletion capa/features/extractors/binja/helpers.py
Original file line number Diff line number Diff line change
Expand Up @@ -9,7 +9,7 @@
from typing import List, Callable
from dataclasses import dataclass

from binaryninja import LowLevelILInstruction
from binaryninja import BinaryView, LowLevelILInstruction
from binaryninja.architecture import InstructionTextToken


Expand Down Expand Up @@ -51,3 +51,19 @@ def unmangle_c_name(name: str) -> str:
return match.group(1)

return name


def read_c_string(bv: BinaryView, offset: int, max_len: int) -> str:
s: List[str] = []
while len(s) < max_len:
try:
c = bv.read(offset + len(s), 1)[0]
except Exception:
break

if c == 0:
break

s.append(chr(c))

return "".join(s)
44 changes: 24 additions & 20 deletions capa/features/extractors/binja/insn.py
Original file line number Diff line number Diff line change
Expand Up @@ -94,28 +94,32 @@ def extract_insn_api_features(fh: FunctionHandle, bbh: BBHandle, ih: InsnHandle)
candidate_addrs.append(stub_addr)

for address in candidate_addrs:
sym = func.view.get_symbol_at(address)
if sym is None or sym.type not in [SymbolType.ImportAddressSymbol, SymbolType.ImportedFunctionSymbol]:
continue

sym_name = sym.short_name

lib_name = ""
import_lib = bv.lookup_imported_object_library(sym.address)
if import_lib is not None:
lib_name = import_lib[0].name
if lib_name.endswith(".dll"):
lib_name = lib_name[:-4]
elif lib_name.endswith(".so"):
lib_name = lib_name[:-3]

for name in capa.features.extractors.helpers.generate_symbols(lib_name, sym_name):
yield API(name), ih.address

if sym_name.startswith("_"):
for name in capa.features.extractors.helpers.generate_symbols(lib_name, sym_name[1:]):
for sym in func.view.get_symbols(address):
if sym is None or sym.type not in [
SymbolType.ImportAddressSymbol,
SymbolType.ImportedFunctionSymbol,
SymbolType.FunctionSymbol,
]:
continue

sym_name = sym.short_name

lib_name = ""
import_lib = bv.lookup_imported_object_library(sym.address)
if import_lib is not None:
lib_name = import_lib[0].name
if lib_name.endswith(".dll"):
lib_name = lib_name[:-4]
elif lib_name.endswith(".so"):
lib_name = lib_name[:-3]

for name in capa.features.extractors.helpers.generate_symbols(lib_name, sym_name):
yield API(name), ih.address

if sym_name.startswith("_"):
for name in capa.features.extractors.helpers.generate_symbols(lib_name, sym_name[1:]):
yield API(name), ih.address


def extract_insn_number_features(
fh: FunctionHandle, bbh: BBHandle, ih: InsnHandle
Expand Down
2 changes: 1 addition & 1 deletion capa/features/extractors/elf.py
Original file line number Diff line number Diff line change
Expand Up @@ -898,7 +898,7 @@ def guess_os_from_symtab(elf: ELF) -> Optional[OS]:

def detect_elf_os(f) -> str:
"""
f: type Union[BinaryIO, IDAIO]
f: type Union[BinaryIO, IDAIO, GHIDRAIO]
"""
try:
elf = ELF(f)
Expand Down
Empty file.
Loading

0 comments on commit 182a986

Please sign in to comment.