Replies: 4 comments 2 replies
-
First, I think this is a well-written proposal. I appreciate the various considerations that you've included here, and it definitely will spark a good discussion. Second, some related discussions:
Of note, our existing stance has been:
However, just because we've been doing it one way for a while doesn't mean that we should never change our minds. As noted, I think we could experiment with an implementation fairly easily. These days, the following concerns are at the front of my mind. They're not necessarily relevant here, but might be:
I wonder if we could emulate some of yara's functionality but continue to use our own syntax. For example, could we support another
-
It would really be awesome to express features such as:
Using yara is an interesting idea for this, although I'd prefer to extend capa accordingly. I think something like this would be cool:

features:
  - and:
    - sequence: |
        push 0xDEADBEEF
        call X

I know we'd have to change the engine quite a bit for this, so we need to take a closer look at all the benefits.
-
@mr-tz interesting idea. There are several benefits to your approach.

@williballenthin thank you for the comments. I had not seen the Capa issues.

One disadvantage to byte matching is that rule authors may try to account for instruction encoding and register swapping. This can lead to regular expressions that are either overly permissive or overly restrictive (unless done right). A common example is having a Yara rule look for

@mr-tz's idea would eliminate the need to account for instruction encoding, which could help lower the barrier to entry and prevent unintended byte matches. An extension to this idea could be:

features:
  - and:
    - sequence:
      - instruction: push 0xdeadbeef
      - mnemonic: call
      - api: foobar

The
and
Since the characteristics only need to match in order, this could eliminate the need to worry about interlaced junk instructions. With Yara rules, by contrast, byte regexes are forced to become more permissive and to carry optional wildcarded ranges.

Byte pattern matching would still be useful in the (edge) cases where control flow is obfuscated. Jumping into the middle of the next instruction (for example) would not be easy to describe when the disassembly may become untrustworthy.

Yara is the tool that comes to mind when referencing byte pattern matching. With that said, Yara itself is not as important as the idea. Capa already supports boolean logic; if adding support for byte patterns (or regular expressions) is a trivial addition, then it would make sense not to include an external dependency. @mr-tz's approach may supersede the original idea altogether.
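To make the "match in order, ignore junk in between" idea concrete, here is a minimal sketch in Python. Everything in it is hypothetical: `Insn` is a simplified stand-in for whatever a backend would hand over, and `instruction`, `mnemonic`, `api`, and `matches_sequence` are illustrative names, not existing capa features. It also assumes each step consumes one instruction, so two steps cannot match the same instruction.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass(frozen=True)
class Insn:
    """Hypothetical, simplified view of one disassembled instruction."""
    mnemonic: str
    operands: str = ""
    api: Optional[str] = None  # resolved call target name, if any


Predicate = Callable[[Insn], bool]


def _norm(text: str) -> str:
    # Compare case-insensitively and ignore spacing differences.
    return text.replace(" ", "").lower()


def instruction(text: str) -> Predicate:
    """Match mnemonic and operands together, e.g. 'push 0xdeadbeef'."""
    mnem, _, ops = text.partition(" ")
    return lambda i: _norm(i.mnemonic) == _norm(mnem) and _norm(i.operands) == _norm(ops)


def mnemonic(name: str) -> Predicate:
    return lambda i: _norm(i.mnemonic) == _norm(name)


def api(name: str) -> Predicate:
    return lambda i: i.api == name


def matches_sequence(insns: Iterable[Insn], steps: List[Predicate]) -> bool:
    """True if every step matches some instruction, in order.

    Instructions between two matching steps are skipped, so interlaced
    junk instructions do not break the match.
    """
    remaining = iter(insns)
    for step in steps:
        # any() consumes the iterator up to and including the first match.
        if not any(step(insn) for insn in remaining):
            return False
    return True


block = [
    Insn("push", "0xDEADBEEF"),
    Insn("nop"),                      # junk
    Insn("xor", "eax, eax"),          # junk
    Insn("call", "foobar", api="foobar"),
]
print(matches_sequence(block, [instruction("push 0xdeadbeef"), mnemonic("call")]))
```

The rule author never has to think about instruction encoding or wildcard ranges here; the cost is that the match depends entirely on the disassembly being trustworthy.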
-
I like this reasoning a lot. Furthermore, raw hex for instructions is difficult for a human to inspect: they need to trust any associated comments, and are unlikely to disassemble a hex string to figure out which instructions match. Therefore, I'd agree that it would be preferable to use assembly language strings to describe what we're looking for (@mr-tz's idea).

I think we can implement verbatim assembly matching fairly easily, using something like keystone. Assemble the string to bytes, then match on byte sequences, and good to go. However, I think we'd want at least some wildcarding support. That is, being able to say something like
I think this is probably a feasible project, though non-trivial. (Sidebar: this would be a neat intern project - practical, challenging, many opportunities to learn.) I can also see how this can be integrated into capa while maintaining fair performance (single up-front scan for pattern matches, index results by VA, emit features from this index into the appropriate scopes. The only scary part is the up-front scans, but that can be done in native code-land, so it's probably fast enough). @re-fox goes further:
This would provide ultimate flexibility, though as a counterpoint, would it be too much flexibility? In a scenario of
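For illustration, the "scan once up front, index by VA, emit into scopes" flow mentioned above might look roughly like the sketch below. It is pure Python using the standard `re` and `bisect` modules rather than native code, and the names `compile_hex_pattern`, `scan`, and `features_in_scope`, as well as the `??` wildcard syntax, are assumptions made for the example, not capa APIs.

```python
import bisect
import re
from typing import Dict, List, Tuple


def compile_hex_pattern(pattern: str) -> "re.Pattern[bytes]":
    """Turn a hex string such as '68 ?? ?? ?? ?? e8' into a bytes regex.

    '??' is a one-byte wildcard; every other token is a literal hex byte.
    """
    parts = []
    for token in pattern.split():
        if token == "??":
            parts.append(b".")
        else:
            parts.append(re.escape(bytes([int(token, 16)])))
    # DOTALL so that '.' also matches the byte 0x0a.
    return re.compile(b"".join(parts), re.DOTALL)


def scan(buf: bytes, base_va: int, patterns: Dict[str, str]) -> List[Tuple[int, str]]:
    """Scan the buffer once per pattern; return (va, name) hits sorted by VA.

    finditer() reports non-overlapping matches only, which is enough
    for a sketch.
    """
    hits = []
    for name, pattern in patterns.items():
        for match in compile_hex_pattern(pattern).finditer(buf):
            hits.append((base_va + match.start(), name))
    hits.sort()
    return hits


def features_in_scope(hits: List[Tuple[int, str]], start_va: int, end_va: int) -> List[str]:
    """Names of patterns whose match starts in [start_va, end_va)."""
    lo = bisect.bisect_left(hits, (start_va,))
    hi = bisect.bisect_left(hits, (end_va,))
    return [name for _, name in hits[lo:hi]]


# push 0xdeadbeef; call ...; ret; nop; push 0x44332211; ret
buf = bytes.fromhex("90 68 ef be ad de e8 00 00 00 00 c3 90 68 11 22 33 44 c3")
patterns = {
    "push imm32": "68 ?? ?? ?? ??",
    "push imm32; call": "68 ?? ?? ?? ?? e8",
}
hits = scan(buf, 0x401000, patterns)
print(features_in_scope(hits, 0x401000, 0x40100C))
print(features_in_scope(hits, 0x40100C, 0x401013))
```

Because the hits are sorted by VA, looking up the features for a function or basic block is two binary searches, so the per-scope cost stays small even when the up-front scan produces many matches.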
-
Description
There exist edge cases where it can be awkward to write a Capa rule to trigger on some binary functionality. Yara helps address these.
Currently Capa does not support:
push 0xf000)

Rather than generically scanning the whole file with Yara, I propose using Capa's knowledge of functions and basic blocks to limit scope and write rules that otherwise would not be possible.
Generic Example
Using the example from "Idea - Location information" (Discussion #393, fireeye/capa):
The data passed to Yara would be the raw bytes that comprise the basic block. Yara makes this easy via the option in yara-python to pass a buffer rather than a file handle.

Taking this one step further and pairing with an API would result in the following (naive) rule
The above would be looking for
Which would solve the issue of pairing the mnemonic with value(s). The example is not perfect, but when the constant may be more generic or the order of args is important, Yara would present a solution.
Implementation Considerations
Capa's concept of scope fits well.
file - scan the whole file with Yara
basic block and function - when the search should be restricted.

Strings could also be passed to Yara, but Capa already provides reasonable (full-regex) support for these. Most use cases I've thought of have been scanning for byte sequences where instruction order is important, and for operator+operand pairing.
The logic to extract bytes for each BB/Function would have to exist down in each engine (more code), but it should be possible with most backend frameworks.
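As a rough sketch of what that per-scope extraction could look like, the Python below slices the raw bytes for each scope out of a loaded image and hands them to a matcher. The `Scope` tuple, `scope_bytes`, and `scan_scopes` are hypothetical names; a real backend would supply its own function and basic block boundaries. To keep the example dependency-free, a plain `re` byte pattern stands in for a compiled Yara rule, and each scope is assumed to be one contiguous byte range (which is not always true for functions).

```python
import re
from typing import Callable, Iterable, List, NamedTuple


class Scope(NamedTuple):
    """Hypothetical description of one scope reported by a backend."""
    kind: str   # "function" or "basic block"
    va: int     # start virtual address
    size: int   # size in bytes (assumed contiguous)


def scope_bytes(image: bytes, base_va: int, scope: Scope) -> bytes:
    """Raw bytes that make up the scope."""
    start = scope.va - base_va
    return image[start:start + scope.size]


def scan_scopes(
    image: bytes,
    base_va: int,
    scopes: Iterable[Scope],
    matcher: Callable[[bytes], bool],
) -> List[Scope]:
    """Return the scopes whose bytes satisfy the matcher."""
    return [s for s in scopes if matcher(scope_bytes(image, base_va, s))]


# Stand-in for a Yara rule: the encoding of `push 0xf000` (68 00 f0 00 00).
PUSH_F000 = re.compile(re.escape(bytes.fromhex("6800f00000")))

# Two basic blocks: push 0xf000; call ...   and   xor eax, eax; ret
image = bytes.fromhex("6800f00000e800000000" "31c0c3")
scopes = [Scope("basic block", 0x1000, 10), Scope("basic block", 0x100A, 3)]

matched = scan_scopes(image, 0x1000, scopes, lambda buf: PUSH_F000.search(buf) is not None)
print([hex(s.va) for s in matched])
```

With yara-python available, the matcher could instead wrap a compiled rule set and call its `match(data=buf)` method on the same buffer, which is the buffer option mentioned earlier in this post.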
Drawbacks to including Yara
pip install anymore.