POSIX ERE Regex - Creating Efficient Regex

157 Views Asked by At

I'm working to create some regex entries that are well-formed, and efficient. I'll place an emphasis on efficient, as these regex entries can see thousands of logs per second. Inefficient regex entries can cause severe performance impacts.

Question: Does regex101 (through one flavor) support POSIX ERE Regex? Googling shows that PCRE2 should support BRE+ERE and more.

Regex Type: POSIX ERE

Syslog App: rsyslog (EL7)

Sample Payload (Well formed - Sensitive Information Stripped):

Jul 10 00:00:00 Firewall-Name-Removed CEF:0|Fortinet|FortiGate-removed|1.2.3,build1111 (GA)|0000000013|forward traffic accept|5|start=Jul 10 2022 00:00:00 logver=604091966 deviceExternalId=FG9A9A9A9999999 dvchost=Firewall-Name-Removed ad.vd=root ad.eventtime=1111111111111111111 ad.tz=-9999 ad.logid=0000000013 cat=traffic ad.subtype=forward deviceSeverity=notice src=1.1.1.1 shost=RandomHost1 spt=62119 deviceInboundInterface=DII-Out ad.srcintfrole=lan ad.srcssid=SSID Has Been Removed ad.apsn=ABC123D ad.ap=CHL-07 ad.channel=157 ad.radioband=802.11ac n-only ad.signal=-40 ad.snr=55 dst=2.2.2.2 dpt=53 deviceOutboundInterface=DOI-Out ad.dstintfrole=undefined ad.srccountry=Reserved ad.dstcountry=CountryRemoved externalID=123456789 proto=00 act=accept ad.policyid=000 ad.policytype=policy ad.poluuid=UUID-Removed ad.policyname=policy_name_removed app=DNS ad.trandisp=noop ad.appid=16195 ad.app=DNS ad.appcat=Network.Service ad.apprisk=elevated ad.applist=UTM Name - Removed ad.duration=180 out=0 in=205 ad.sentpkt=0 ad.rcvdpkt=1 ad.utmaction=allow ad.countdns=1 ad.osname=Windows ad.srcswversion=10 ad.mastersrcmac=MAC removed ad.srcmac=MAC removed ad.srcserver=0 tz="-9999"

What I'm attempting to do is remove specific logs that are not required. Normally I'd do this at a SIEM level through something like routing rules (where I can utilize fields), but this isn't possible for the foreseeable future. In this particular case: I'm trying to exclude on the following pieces of information.

Source IP: Is in a specific range

deviceOutboundInterface: is DOI-Out

Current Regex: "\bsrc=1.1.1[4-5]{0,1}.[0-9]{0,3}\b.*?\bdeviceOutboundInterface=DOI-Out\b" (Regex101 link in PCRE2). If that is matched, the log is rejected (through the stop call). Otherwise, it moves onto the other entries to check for unnecessary logs.

Most of my regex entries are in the low double-digits because they're a lot simpler. Is there a better way to make the more complex regex more efficient?

Thank you for any insight you can offer.

2

There are 2 best solutions below

3
Sam On

You might be able to cut some time with:

src=1\.1\.1[4-5]{0,1}\.[0-9]{0,3}.*?deviceOutboundInterface=DOI-Out

changes:

  • remove word boundaries
  • change the . to . in IP address

regex101 has the original efficiency at 383 steps, new is 301 so a potential savings of ~21%. Not terrible but you'll want to make sure any removals were OK.

to be honest, what you have looks pretty good to me.

0
user1016274 On

This RE reduces the number of steps on Reg101 from 383 to 270 (~ -29.5%):

src=1\.1\.1[45]?\.\d{0,3}.*?O[boundIter]*?face=DOI-Out

The original RE already is quite simple, only matching one pattern and one literal string which makes it difficult to optimize. But we can do if we know (from the documentation of the text in question, here the Log Message manual) that an even simpler pattern will not lead to ambiguities.

Changes:

  • matching literal text whereever possible
  • replacing range '4-5' with simple elements
  • instead of matching the long 'deviceOutboundInterface=', use a pattern which will just barely match this string but would possibly match other words if they ever occurred in log messages - but we know they don't.