-
Notifications
You must be signed in to change notification settings - Fork 712
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update regex_suffix.c #1095
Update regex_suffix.c #1095
Conversation
described in #1085
@@ -246,7 +246,7 @@ static uint8_t *parse_char_class(const uint8_t *pat, size_t patSize, size_t *pos | |||
for (c = range_start + 1; c <= range_end; c++) | |||
bitmap[c >> 3] ^= 1 << (c & 0x7); | |||
hasprev = 0; | |||
} else if (pat[*pos] == '[' && pat[*pos] == ':') { | |||
} else if (pat[*pos] == '[' && pat[*pos + 1] == ':') { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm unsure if this is the correct fix, or if it was supposed to be ||
instead of &&
. Were you able to identify based on your understanding of regex character classes what this is trying to do and which would be the correct fix? I'm hesitant to make either change without knowing what was originally intended. Ideally we'd be able to test the fix also makes something work that previously wouldn't work. E.g. if there is some regex pattern that wouldn't work before and would now work after this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm making suggestions based on static code review so you are right to question them.
Searching for "[:" makes sense because there are predefined character classes in POSIX ( https://en.wikibooks.org/wiki/Regular_Expressions/POSIX_Basic_Regular_Expressions#Character_classes ).
If using a regex with one of those character classes works with the fix but not without, that would show that the fix is correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The caller of this function is here in parse_regex()
:
case '[':
++*last;
right = make_charclass(parse_char_class(p, pSize, last));
if (!right) {
destroy_tree(v);
return NULL;
}
v = make_node(concat, v, right);
if (!v) {
destroy_tree(right);
return NULL;
}
++*last;
break;
So it has already found the leading [
. So it would make sense if this were looking for something like [[:lower:]]
Note this regex parsing code is only used in the phishing signature regex parsing code for PDB and WDB signatures.
To test this, I made this PDB signature:
H:bankofamerica.com
to monitor the display domain name "bankofamerica.com" found in emails for suspicious real-URL's.
Then I made this WDB signature to test the regex features:
X:.+\.eba[[:digit:]]\.com:.+\.bankofamerica\.com
This silly signature will allow the real-URL to be eba5.com
(or any other digit after eba
), but anything other than that (such as ebay.com
, or ebat.com
) will not be ignored and will alert as suspicious.
I tested by scanning this email, with variations of the link:
From: "Somebody -X (username - Contractor at Example)"
<[email protected]>
To: "Your Name (Your Name)" <[email protected]>
Subject: RE: Your Money
Date: Fri, 12 Jun 2013 07:04:52 +0000
Content-Language: en-US
Content-Type: multipart/alternative;
boundary="_blah_"
MIME-Version: 1.0
--_blah_
Content-Type: text/plain; charset="Windows-1252"
Content-Transfer-Encoding: quoted-printable
<html>
<head>
<meta http-equiv=3D"Content-Type" content=3D"text/html; charset=3DWindows-1=252">
</head>
<body>
<br>
Please click this link:
<br>
<a href="http://www.eba5.com">https://www.bankofamerica.com</a>
<br>
</body>
</html>
--_blah_--
I tested ClamAV 1.2.0 to see if using the [[:digit:]]
character class works.
E.g. I ran commands like this:
./install/bin/clamscan -d ../test.pdb -d ../test.wdb ../test.eml
Loading: 0s, ETA: 0s [========================>] 1/1 sigs
Compiling: 0s, ETA: 0s [========================>] 10/10 tasks
LibClamAV info: Suspicious link found!
LibClamAV info: Real URL: http://www.ebay.com
LibClamAV info: Display URL: https://www.bankofamerica.com
/home/micah/workspace/clamav-micah-2/test.eml: Heuristics.Phishing.Email.SSL-Spoof FOUND
It appears to work already, though I didn't debug step through line by line to see how it works. But I find that strange, because I did test in a debugger with this PR and was able to confirm it working, and also running the "new" code:
In short, it's not clear to me if this code is really needed. Though it does seem to work okay with your change. But it also works okay like this:
hasprev = 0;
// } else if (pat[*pos] == '[' && pat[*pos + 1] == ':') {
// /* char class */
// FREE(bitmap);
// while (pat[*pos] != ']') INC_POS(pos, patSize);
// INC_POS(pos, patSize);
// while (pat[*pos] != ']') INC_POS(pos, patSize);
// return dot_bitmap;
} else {
bitmap[pat[*pos] >> 3] ^= 1 << (pat[*pos] & 0x7);
As a side note: testing character classes relating to upper/lower character changes will not work as expected because the text is all converted to lowercase before matching.
described in #1085