RegEx for matching GET request in Apache2 access logs

I want to extract the URL inside a GET request found in the Apache2 access logs.

This is my code:

import re

x = '192.168.1.137 - - [07/Oct/2020:00:46:13 +0800] "GET /index1.php?command=CON4,0088888 HTTP/1.1" 200 454 "-" "Mozilla/5.0 (Windows NT 10.0; Win 64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.121 Safari/537.36 Edg/75.0.564.63"'

url = re.search("\/index1\.php\?command=....\,.....", x)
if url:
   print(url.group())
else:
   print("No match found")

When I run this code, it tells me no match was found. Is there something wrong with my regex? I am new to regex, so I would really appreciate some help. The exact URL that I want to get is: /index1.php?command=CON4,0088888

There are 2 best solutions below

Best answer:

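The regex you have used leaves no room for variation. Each . matches exactly one character, so ....\,..... only fits four characters, a comma, and exactly five more characters. The value 0088888 has seven characters, so the pattern cannot capture the full URL, and any change in length makes the check inaccurate.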
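
To make the effect of the fixed-width dots concrete, here is a small sketch run against the request from the question. The names `line`, `fixed` and `flexible` are only illustrative, the user-agent string is shortened for readability, and replacing the dots with `[^\s]+` is one possible fix rather than the only one.

```python
import re

# Request from the question; the user agent is shortened (an assumption made for brevity).
line = ('192.168.1.137 - - [07/Oct/2020:00:46:13 +0800] '
        '"GET /index1.php?command=CON4,0088888 HTTP/1.1" 200 454 "-" "Mozilla/5.0"')

# Fixed width: four characters, a comma, then exactly five characters.
fixed = re.search(r"/index1\.php\?command=....,.....", line)

# Variable width: everything up to the first whitespace.
flexible = re.search(r"/index1\.php\?command=[^\s]+", line)

print(fixed.group())     # /index1.php?command=CON4,00888
print(flexible.group())  # /index1.php?command=CON4,0088888
```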


If every URI uses the same script (index1.php in your log line), you can use

\/index1\.php([^\s]+)
  • \/index1\.php - matches /index1.php literally (the dot is escaped so that it only matches a period)
  • ([^\s]+) - captures all characters up to the first whitespace
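
As a rough illustration of what the capture group adds, the sketch below contrasts the whole match with group 1. It assumes the script is named index1.php, as in the question's log line, and uses a shortened user-agent string; `line` and `m` are illustrative names.

```python
import re

# Request from the question; the user agent is shortened (an assumption made for brevity).
line = ('192.168.1.137 - - [07/Oct/2020:00:46:13 +0800] '
        '"GET /index1.php?command=CON4,0088888 HTTP/1.1" 200 454 "-" "Mozilla/5.0"')

m = re.search(r"/index1\.php([^\s]+)", line)

print(m.group(0))  # whole match: /index1.php?command=CON4,0088888
print(m.group(1))  # captured part after the file name: ?command=CON4,0088888
```

Note that `[^\s]+` needs at least one character after the file name, so a bare `/index1.php` with no query string would not match this pattern.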

However, if the script name can change, you can use the following to match a .php URI of any length.

(?<=GET )\/(.*)\.php([^\s]+)
  • (?<=GET ) - a positive lookbehind that confirms GET appears immediately before the URI
  • \/(.*)\.php([^\s]+) - matches any path ending in .php, regardless of length, and then all characters up to the first whitespace

import re

x = '192.168.1.137 - - [07/Oct/2020:00:46:13 +0800] "GET /index1.php?command=CON4,0088888 HTTP/1.1" 200 454 "-" "Mozilla/5.0 (Windows NT 10.0; Win 64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.121 Safari/537.36 Edg/75.0.564.63"'

# Raw string so the backslashes reach the regex engine unchanged
url = re.search(r'(?<=GET )\/(.*)\.php([^\s]+)', x)
if url:
   print(url.group())
else:
   print("No match found")

Outputs:

#/index1.php?command=CON4,0088888
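
One caveat with the greedy `(.*)`: it can run past the URL if ".php" shows up again later in the line. The sketch below uses a hypothetical log line (the referer `http://example.com/page.php?x=1` and the shortened user agent are made up for illustration) and compares the pattern above, with the dot escaped, against a bounded variant built from `\S`. The variant is a suggestion under those assumptions, not part of the original answer.

```python
import re

# Hypothetical log line: the request from the question plus an invented
# referer that also contains ".php".
line = ('192.168.1.137 - - [07/Oct/2020:00:46:13 +0800] '
        '"GET /index1.php?command=CON4,0088888 HTTP/1.1" 200 454 '
        '"http://example.com/page.php?x=1" "Mozilla/5.0"')

# Greedy: (.*) backtracks only as far as the LAST ".php" on the line.
greedy = re.search(r'(?<=GET )\/(.*)\.php([^\s]+)', line)

# Bounded: \S never crosses whitespace, so the match stays inside the URL.
bounded = re.search(r'(?<=GET )/\S*?\.php\S*', line)

print(greedy.group())   # runs on into the referer field
print(bounded.group())  # /index1.php?command=CON4,0088888
```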

Second answer:

import re

x = '192.168.1.137 - - [07/Oct/2020:00:46:13 +0800] "PUT /index.php?command=CON4,0088888 HTTP/1.1" 200 454 "-" "Mozilla/5.0 (Windows NT 10.0; Win 64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.121 Safari/537.36 Edg/75.0.564.63"'

# Raw string avoids invalid-escape warnings for \s; group 1 is the token just before " HTTP"
url = re.search(r'.*\s(.*)\sHTTP', x)
if url:
   print(url.group(1))
else:
   print("No match found")

output:

/index.php?command=CON4,0088888
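
Because this pattern does not anchor on GET, it works for any HTTP method. A somewhat more explicit sketch of the same idea uses named groups for the quoted request field. The names `REQUEST_RE`, `parse_request`, `method`, `path` and `protocol` are hypothetical, the user agent is shortened, and the pattern assumes the log layout shown in the question.

```python
import re

# Quoted request field: "<METHOD> <path> HTTP/<version>"
REQUEST_RE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) (?P<protocol>HTTP/[\d.]+)"'
)

def parse_request(line):
    """Return (method, path, protocol), or None if no request field is found."""
    m = REQUEST_RE.search(line)
    return m.group("method", "path", "protocol") if m else None

# Request from the answer above; the user agent is shortened (an assumption).
line = ('192.168.1.137 - - [07/Oct/2020:00:46:13 +0800] '
        '"PUT /index.php?command=CON4,0088888 HTTP/1.1" 200 454 "-" "Mozilla/5.0"')

print(parse_request(line))
# ('PUT', '/index.php?command=CON4,0088888', 'HTTP/1.1')
```

Keeping the method as its own group makes it easy to filter for GET afterwards, for example with `result[0] == "GET"`.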