-
Notifications
You must be signed in to change notification settings - Fork 164
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add encode-data-using-add-xor-sub-operations.yml #800
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
interesting idea. i'm also not sure if this is the right way to express the rule. can you imagine a different syntax or other feature that would let you express this more easily? could we introduce another characteristic that would enable this?
- and: | ||
- characteristic: tight loop | ||
- instruction: | ||
- mnemonic: xor |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this could be replaced with characteristic(nzxor)
(though technically that also matches xor reg,reg
too).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With this rule I wanted to express the following logic: require ADD, SUB and XOR instructions to be present in a single basic block and require immediate value operand for ADD, SUB and XOR instructions to be within (0x1,0xff) range.
characteristic: nzxor
could work here but I chose to be as specific as possible - code I'm trying to match typically looks like this:
add dl, 4Fh
xor dl, 0F1h
sub dl, 4Fh
One thing that I think could make the rule a bit more cleaner would be support for ranges or list of values by the number
feature, so e.g.
- and:
- characteristic: tight loop
- instruction:
- mnemonic: xor
- operand[1].number: (0x1,0xff)
- instruction:
- mnemonic: add
- operand[1].number: (0x1,0xff)
- instruction:
- mnemonic: sub
- operand[1].number: (0x1,0xff)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would this work as an approximation?
- not:
- operand[1].number: 0
this would include larger values though, so may be too broad?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All code samples I've seen use single byte values as ADD/XOR/SUB operands so any non-zero value might be too broad.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hm, ok, I'm not sure how to best proceed here then without changing the rule syntax (not worth it ATM IMHO) or just merging as is
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@williballenthin are we worried about performance here? @jtothej could we add other features to quickly filter out likely FPs (large functions, etc.)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mr-tz i agree that its probably not worth changing the rule syntax for a single rule today. if we can come up with other places the syntax change is useful then lets reconsider.
@mr-tz i am somewhat worried about performance. i think this can be assessed by running the test cases before/after this PR and seeing if there's any meaningful delta. i wouldn't be surprised if my intuition is wrong about performance, so this shouldn't be a blocker without real data. sorry @jtothej for bogging down the PR merge with this worry.
@jtothej i think we should consider removing some common, small values from the ADD/SUB branches to account for iteration over an array, like bytewise processing of a string, wstring, etc. So maybe remove 1/2/4/8/16 from ADD/SUB branches. thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
with a review of the test case matches (assume it works) and other matches in our test corpus (TODO), then a sanity check on performance impact, then im ok with this being merged.
- operand[1].number: 0x01 | ||
- operand[1].number: 0x02 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- operand[1].number: 0x01 | |
- operand[1].number: 0x02 |
common string/wstring index deltas, remove to avoid FPs?
note this is based on intuition, so feel free to push back/discuss.
- operand[1].number: 0x01 | ||
- operand[1].number: 0x02 | ||
- operand[1].number: 0x03 | ||
- operand[1].number: 0x04 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- operand[1].number: 0x04 |
common array index deltas, remove to avoid FPs?
note this is based on intuition, so feel free to push back/discuss.
- operand[1].number: 0x05 | ||
- operand[1].number: 0x06 | ||
- operand[1].number: 0x07 | ||
- operand[1].number: 0x08 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- operand[1].number: 0x08 |
common array index deltas, remove to avoid FPs?
note this is based on intuition, so feel free to push back/discuss.
- operand[1].number: 0x01 | ||
- operand[1].number: 0x02 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- operand[1].number: 0x01 | |
- operand[1].number: 0x02 |
common string/wstring index deltas, remove to avoid FPs?
note this is based on intuition, so feel free to push back/discuss.
according to offline discussions, Willi's suggestions above or this approximation both work also features:
- and:
- count(basic blocks): 6 or fewer
- basic block:
- and:
- characteristic: tight loop
- characteristic: nzxor
- count(mnemonic(add)): 1
- count(mnemonic(sub)): 1 |
data-manipulation/encoding/encode-data-using-add-xor-sub-operations.yml
Outdated
Show resolved
Hide resolved
…tions.yml Co-authored-by: Moritz <[email protected]>
data-manipulation/encoding/encode-data-using-add-xor-sub-operations.yml
Outdated
Show resolved
Hide resolved
Thanks a lot! |
Add encode-data-using-add-xor-sub-operations.yml
The rule feels a bit brute force-y but I'm not sure if there's more elegant way to do it.