Grok/Oniguruma pattern to match first IP from X-Forwarded-For header


I'm trying to create a grok pattern that matches the first IP from the X-Forwarded-For header in an nginx log. A log line typically looks like this:

68.75.44.178, 172.68.146.54, 127.0.0.1 - - [15/May/2017:12:16:27 +0200] "GET /jobs/24237/it-back-end HTTP/1.1" 301 5 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

The first IP is the client's actual IP, which is the one I want to retrieve. The other two come from proxies, in our case Cloudflare and Varnish.

My pattern, which I tried on https://grokconstructor.appspot.com, looks like this:

FIRSTIPORHOST (^%{IPORHOST})(?:,\s%{IPORHOST})*

Unfortunately, it matches all IPs despite the non-capturing group. What am I doing wrong? Or is there a better pattern?

Clarification:

I want to read the whole log file into Elasticsearch using Filebeat. I therefore need to match the IPs somehow; otherwise I won't be able to match the rest of the line, such as the date or the user agent.


There are 2 best solutions below

Best answer

You need to add (?:,\s[\d.]+)* right after %{IPORHOST:nginx.access.remote_ip} at the start of the pattern. See the fixed expression:

"%{IPORHOST:nginx.access.remote_ip}(?:,\\s[\\d.]+)* - %{DATA:nginx.access.user_name} \\[%{HTTPDATE:nginx.access.time}\\] \"%{WORD:nginx.access.method} %{DATA:nginx.access.url} HTTP/%{NUMBER:nginx.access.http_version}\" %{NUMBER:nginx.access.response_code} %{NUMBER:nginx.access.body_sent.bytes} \"%{DATA:nginx.access.referrer}\" \"%{DATA:nginx.access.agent}\""

The (?:,\s[\d.]+)* non-capturing repeated group matches 0+ occurrences of:

  • , - a comma
  • \s - a whitespace
  • [\d.]+ - 1+ digits or dots.

This way, the additional IPs are consumed by the match but not captured into any field, so only the first IP ends up in nginx.access.remote_ip.
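To check the idea outside of grok, here is a minimal Python sketch of the same technique: capture the first address in a named group and let a non-capturing repeated group consume the rest. It is not the grok pattern itself. IPORHOST is approximated by [\d.]+ (IPv4 only), and the names LINE_RE, parse_line and the shortened field names are made up for this illustration.

```python
import re

# Simplified stand-in for the grok pattern above (an assumption for this sketch):
# IPORHOST is approximated by [\d.]+, so hostnames and IPv6 are not handled.
LINE_RE = re.compile(
    r'^(?P<remote_ip>[\d.]+)'   # first address: the only one captured
    r'(?:,\s[\d.]+)*'           # further proxy addresses: matched, never captured
    r' - (?P<user_name>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\w+) (?P<url>\S+) HTTP/(?P<http_version>[\d.]+)" '
    r'(?P<response_code>\d+) (?P<bytes>\d+)'
)


def parse_line(line):
    """Return the named fields as a dict, or None if the line does not match."""
    m = LINE_RE.match(line)
    return m.groupdict() if m else None


sample = (
    '68.75.44.178, 172.68.146.54, 127.0.0.1 - - [15/May/2017:12:16:27 +0200] '
    '"GET /jobs/24237/it-back-end HTTP/1.1" 301 5 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"'
)

fields = parse_line(sample)
print(fields["remote_ip"])
```

Because the repeated group matches zero or more times, a line with a single address (no proxies) is parsed the same way.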

1
On

The given filter did not work for me when parsing X-Forwarded-For, but the solution mentioned on another page did: https://serverfault.com/questions/725186/grok-issue-with-multiple-ips-in-nginx-logstash