Custom pattern to drop bots when importing logs into InfluxDB


I import Apache logs into InfluxDB with Telegraf and the logparser plugin.

I want to filter out all the logs from bots, so I set up a custom pattern with a regex that only matches user agents that don't contain the words "bot" or "crawl":

NOBOT ((?!bot|crawl).)*
CUSTOM_LOG_FORMAT %{CLIENT:client_ip} %{NOTSPACE:ident} %{NOTSPACE:auth} \[%{HTTPDATE:ts:ts-httpd}\] "(?:%{WORD:verb:tag} %{NOTSPACE:request}(?: HTTP/%{NUMBER:http_version:float})?|%{DATA})" %{NUMBER:resp_code:tag} (?:%{NUMBER:resp_bytes:int}|-) %{QS:referrer} "%{NOBOT:agent}"

But it doesn't work: zero metrics are being imported into InfluxDB.

The regex seems OK, and it works fine when I test it here: http://grokconstructor.appspot.com/do/match

Just to be sure, I tried a simpler regex:

BOT .*?bot.*?
CUSTOM_LOG_FORMAT %{CLIENT:client_ip} %{NOTSPACE:ident} %{NOTSPACE:auth} \[%{HTTPDATE:ts:ts-httpd}\] "(?:%{WORD:verb:tag} %{NOTSPACE:request}(?: HTTP/%{NUMBER:http_version:float})?|%{DATA})" %{NUMBER:resp_code:tag} (?:%{NUMBER:resp_bytes:int}|-) %{QS:referrer} "%{BOT:agent}"

And it works: Telegraf imports only the logs from bots. But I want the opposite, and I don't see what's wrong with ((?!bot|crawl).)*

Best answer

I'm not sure why you are not getting an error message, but unfortunately Go's regexp package (which uses RE2 syntax) doesn't support negative lookaheads:

https://play.golang.org/p/Kq5N2FgG6_

Hello, playground
panic: regexp: Compile(`((?!bot|crawl).)*`): error parsing regexp: invalid or unsupported Perl syntax: `(?!`

goroutine 1 [running]:
panic(0x133400, 0x1050a140)
    /usr/local/go/src/runtime/panic.go:500 +0x720
regexp.MustCompile(0x149f1b, 0x11, 0x1, 0xb)
    /usr/local/go/src/regexp/regexp.go:237 +0x1a0
main.main()
    /tmp/sandbox426143344/main.go:10 +0xe0

I'd recommend opening an issue on the Telegraf GitHub repo asking for an error message to be reported in these cases.

As for getting the match you're after, this might help: Negative Look Ahead Go regular expressions
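One common workaround when lookaheads aren't available is to write the positive match for what you want to exclude and negate the result in code. The Go sketch below does that for single Apache combined-format lines. `agentRe`, `botRe` and `keepLine` are hypothetical names. The sketch runs outside Telegraf, so it illustrates the technique rather than a logparser configuration.

```go
package main

import (
	"fmt"
	"regexp"
)

// Hypothetical helper, not a Telegraf grok pattern: captures the last
// double-quoted field on the line, which in the Apache combined log
// format is the user agent.
var agentRe = regexp.MustCompile(`"([^"]*)"\s*$`)

// Hypothetical helper: the positive form of the match the question wanted
// to negate. Case-sensitive like the original; prefix (?i) to ignore case.
var botRe = regexp.MustCompile(`bot|crawl`)

// keepLine reports whether a log line should be kept. It inverts the
// positive botRe match in code instead of using a negative lookahead.
func keepLine(line string) bool {
	m := agentRe.FindStringSubmatch(line)
	if m == nil {
		return true // no quoted agent field found; keep the line
	}
	return !botRe.MatchString(m[1])
}

func main() {
	lines := []string{
		`1.2.3.4 - - [10/Oct/2016:13:55:36 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (X11; Linux x86_64)"`,
		`5.6.7.8 - - [10/Oct/2016:13:55:37 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; Googlebot/2.1)"`,
		`9.9.9.9 - - [10/Oct/2016:13:55:38 +0000] "GET / HTTP/1.1" 200 512 "-" "example-crawler/1.0"`,
	}
	for _, l := range lines {
		fmt.Println(keepLine(l)) // prints true, then false, then false
	}
}
```

Matching the unwanted words positively keeps the regex within RE2 syntax. The "not" moves into ordinary code, where it costs nothing.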