Snort/PCRE Regex: odd character class syntax

While parsing the Snort regex set, I found a very odd character class syntax, such as [\x80-t] or [\x01-t\x0B\x0C\x0E-t\x80-t], and I have no idea what -t means. I don't even know whether it is standard PCRE or some kind of Snort extension.

Here are some regular expressions that contain these character classes:

/\x3d\x00\x12\x00..........(.[\x80-t]|...[\x80-t])/smiR
/^To\x3A[^\r\n]+[\x01-t\x0B\x0C\x0E-t\x80-t]/smi

PS: note that \x80-t is not even a valid range in the standard sense, because the character t is \x74, which is lower than \x80.
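To illustrate the point about the reversed range, here is a minimal sketch using Python's `re` module. This is not PCRE or Snort, so it only shows how one common regex engine treats such a range; the helper name `compiles` is made up for this example.

```python
import re

def compiles(pattern: str) -> bool:
    """Return True if Python's re module accepts the pattern (hypothetical helper)."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# 't' is code point 0x74, which is below 0x80, so \x80-t runs backwards.
t_code = ord("t")
reversed_ok = compiles(r"[\x80-t]")   # range with start > end
ascending_ok = compiles(r"[t-\x80]")  # same endpoints, ascending order

print(hex(t_code), reversed_ok, ascending_ok)
```

In Python the reversed range is rejected as a bad character range, while the ascending one compiles. Other engines may report the problem differently.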

There are 2 best solutions below

6
On

This could reference a different character encoding where t is larger than x80 and x80 can't be addressed normally.

Take EBCDIC Scan codes for example (see here for a reference).
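As a quick check of the EBCDIC idea, the sketch below compares the byte value of t in ASCII with its value in `cp037`, one EBCDIC code page that ships with Python. Choosing `cp037` is an assumption made for illustration; nothing in the question says which encoding, if any, Snort would use here.

```python
# Byte value of the letter t in ASCII and in one EBCDIC variant (cp037).
ascii_t = "t".encode("ascii")[0]
ebcdic_t = "t".encode("cp037")[0]

# In ASCII, t sits below 0x80; in cp037 it sits above it,
# so \x80-t would be an ascending range under that encoding.
print(hex(ascii_t), hex(ebcdic_t))
print(ascii_t > 0x80, ebcdic_t > 0x80)
```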

(But I too have no clue why somebody would want to write it that way)

For ASCII I have a wild guess: if -t means "up to the next token minus one" or, when it comes last in the class, "up to the end of the allowed characters", the second regex would read:

To:(not a newline, one or more characters)(not a newline)

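So basically the expression [\x01-t\x0B\x0C\x0E-t\x80-t] would mean roughly [^\r\n]. (Strictly, under this reading the first range \x01-t runs up to \x0A, so \n is included and only \x00 and \r are excluded.)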
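The guess can be worked through mechanically. The sketch below expands the class under that purely hypothetical reading of -t, assuming single-byte characters (0x00 to 0xFF), and lists which bytes end up excluded. The `items` encoding and the `expand` function are invented for this example and are not part of PCRE or Snort.

```python
# Hypothetical reading only: "-t" means "up to just before the next item",
# or "up to the last byte value" when it is the final item in the class.
MAX_BYTE = 0xFF

# (start, is_open_range) for each item in [\x01-t\x0B\x0C\x0E-t\x80-t]
items = [(0x01, True), (0x0B, False), (0x0C, False), (0x0E, True), (0x80, True)]

def expand(items, max_value=MAX_BYTE):
    """Return the set of byte values the class would contain under the guess."""
    members = set()
    for i, (start, is_open) in enumerate(items):
        if not is_open:
            members.add(start)
            continue
        # An open range ends just before the next item, or at max_value if last.
        end = items[i + 1][0] - 1 if i + 1 < len(items) else max_value
        members.update(range(start, end + 1))
    return members

members = expand(items)
excluded = sorted(set(range(MAX_BYTE + 1)) - members)
print([hex(b) for b in excluded])
```

Under these assumptions the class comes out close to [^\r\n] but not identical: \x0A (\n) falls inside the first range, and the excluded bytes are \x00 and \x0D (\r).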

If one applies that to (.[Ç-t]|...[Ç-t]), where Ç stands for \x80, it would match any character beyond 7-bit ASCII, which could also cover all of Unicode (apart from the first 128 characters).

(That being said, I still have no clue why somebody would write it like this, but at least that's a coherent explanation other than "It's a bug".)

Maybe helpful: what do the regexes you posted mean if one writes out the \xYY escapes? In ASCII:

/=\NULL\DEVICE_CONTROL_2\NULL.{10}(.[Ç-t]|...[Ç-t])/smiR
/^To:[^\r\n]+[\START_OF_HEADING-t\VERTICALTAB\FORMFEED\SHIFTOUT-t\Ç-t]/smi

Looking into \x12 (Device Control 2) could help, because it won't show up in ordinary text but may well appear in network traffic.

0
On

The second regex matches lines that begin with To: (case-insensitive), followed by at least one character that isn't a line feed or carriage return. Since this is a greedy match, I'd expect \r or \n to be the only possible terminating matches in the [\x01-t\x0B\x0C\x0E-t\x80-t] character class. Note that \r is equivalent to \x0D and \n is equivalent to \x0A. I'm not sure what -t means, but let's pretend it was just - instead. Then the character class would be [\x01-\x0B\x0C\x0E-\x80-], which is still a bit convoluted but would make a little more sense, i.e. it allows \n as a terminating character but not \r.
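The claim about the pretend class [\x01-\x0B\x0C\x0E-\x80-] is easy to check. The sketch below uses Python's `re` module as a stand-in for PCRE, which is an assumption; for a simple class like this the two engines should agree, but this was not verified against Snort itself.

```python
import re

# The class obtained by replacing every "-t" with "-" (a pretend rewrite,
# not something taken from the Snort rules themselves).
hypothetical = re.compile(r"[\x01-\x0B\x0C\x0E-\x80-]")

matches_lf = bool(hypothetical.fullmatch("\n"))   # \x0A lies inside \x01-\x0B
matches_cr = bool(hypothetical.fullmatch("\r"))   # \x0D falls in the gap after \x0C
matches_nul = bool(hypothetical.fullmatch("\x00"))  # \x00 is below \x01

print(matches_lf, matches_cr, matches_nul)
```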

This is a very long shot, but is there any chance this is some kind of search-and-replace gone wrong? (This can probably be ruled out quickly if other regexes have normal ranges without the t.)