Commit
Add code comments to explain AC pattern prefix process
When a pattern is added to the AC trie, the code checks that the bytes placed in the trie contain no `?` wildcards and that their first two bytes are not "\x00\x00". If either check fails, the start of the trie portion is shifted right until a static run of bytes that can go in the AC trie is found. Any bytes to the left of the new start become a "prefix". During matching, once the AC trie reports a match and the bytes to the right of the trie portion have matched, the prefix bytes are matched.

The trie portion should not start with "\x00\x00" because that sequence is so common in files that it would match constantly. The scanner would then spend much of its time walking the list of patterns associated with a "\x00\x00" AC match, which would be extremely slow.

Note, however, that if shifting right does not turn up enough nonzero, non-wildcard bytes to form a suitable trie entry, it is acceptable to bend the rule and let the pattern start with "\x00\x00". This way a short pattern such as "0000ab" remains valid and can still be matched.
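The split-and-verify process described above can be sketched roughly as follows. This is an illustrative model, not the project's code: it assumes hex patterns where `??` is a whole-byte wildcard (nibble wildcards are not modeled), a hypothetical trie depth `AC_DEPTH = 3`, and hypothetical helper names. A naive substring search stands in for the real Aho-Corasick automaton.

```python
from typing import Optional

# Hypothetical: number of static bytes placed in the AC trie per pattern.
AC_DEPTH = 3

Pattern = list[Optional[int]]  # None stands for a "??" wildcard byte


def parse_hex(sig: str) -> Pattern:
    """Parse a hex pattern such as "aa??0000beef"; "??" is a whole-byte wildcard."""
    if len(sig) % 2:
        raise ValueError("hex pattern must have an even number of digits")
    return [None if sig[i:i + 2] == "??" else int(sig[i:i + 2], 16)
            for i in range(0, len(sig), 2)]


def choose_trie_start(pattern: Pattern, depth: int = AC_DEPTH) -> int:
    """Pick where the trie portion starts by shifting right past bad windows."""
    # Every window of `depth` bytes without wildcards could go in the trie.
    static = [i for i in range(len(pattern) - depth + 1)
              if all(b is not None for b in pattern[i:i + depth])]
    if not static:
        raise ValueError("pattern has no wildcard-free window for the AC trie")
    # Prefer the leftmost static window that does not start with \x00\x00.
    for i in static:
        if pattern[i:i + 2] != [0, 0]:
            return i
    # Bend the rule: no better window exists, so allow a leading \x00\x00.
    return static[0]


def split_pattern(pattern: Pattern, depth: int = AC_DEPTH):
    """Split a pattern into (prefix, trie portion, suffix)."""
    start = choose_trie_start(pattern, depth)
    return pattern[:start], pattern[start:start + depth], pattern[start + depth:]


def bytes_match(data: bytes, offset: int, part: Pattern) -> bool:
    """Compare `part` against `data` at `offset`; None matches any byte."""
    if offset < 0 or offset + len(part) > len(data):
        return False
    return all(p is None or data[offset + k] == p for k, p in enumerate(part))


def find_trie_hits(data: bytes, trie_part: Pattern) -> list[int]:
    """Naive stand-in for the AC automaton: every offset of the static trie bytes."""
    needle, hits = bytes(trie_part), []
    i = data.find(needle)
    while i != -1:
        hits.append(i)
        i = data.find(needle, i + 1)
    return hits


def scan(data: bytes, pattern: Pattern, depth: int = AC_DEPTH) -> list[int]:
    """Return start offsets of full pattern matches in `data`."""
    prefix, trie_part, suffix = split_pattern(pattern, depth)
    matches = []
    for hit in find_trie_hits(data, trie_part):
        # Bytes to the right of the trie portion are checked first, then the prefix.
        if (bytes_match(data, hit + depth, suffix)
                and bytes_match(data, hit - len(prefix), prefix)):
            matches.append(hit - len(prefix))
    return matches
```

For example, `split_pattern(parse_hex("aa??0000beef"))` skips the windows containing the wildcard and the window starting with `00 00`, so the trie gets `00 be ef` and `aa ?? 00` becomes the prefix. `parse_hex("0000ab")` has only one static window, which starts with `00 00`, so the rule is bent and the whole pattern goes in the trie.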
1 parent fdf23d5, commit 7488787
Showing 1 changed file with 39 additions and 1 deletion.