I want to extract the URL inside a GET request found in the Apache2 access logs.
This is my code:
import re
x = '192.168.1.137 - - [07/Oct/2020:00:46:13 +0800] "GET /index1.php?command=CON4,0088888 HTTP/1.1" 200 454 "-" "Mozilla/5.0 (Windows NT 10.0; Win 64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.121 Safari/537.36 Edg/75.0.564.63"'
url = re.search(r"\/index1\.php\?command=....\,.....", x)
if url:
    print(url.group())
else:
    print("No match found")
When I run this code, it tells me no match found. Is there something wrong with my regex? I am new to regex, so I would really appreciate some help. The exact URL that I want to get is: /index1.php?command=CON4,0088888
The regex you have used does not allow for any variance in your check, because you are using "." which matches exactly one of any character. If the length were to change, your regex check would no longer be accurate.

If every URI will be index.php, you can use:

\/index.php([^\s]+)

\/index.php - will find /index.php exactly
([^\s]+) - will match all characters until the first whitespace

However, if this will change, you can use the following to match a URI of varying length:

(?<=GET )\/(.*).php([^\s]+)

(?<=GET ) - does a positive lookbehind to confirm that GET exists prior to our URI
\/(.*).php([^\s]+) - will match any URI that ends in .php regardless of length, and then all characters until the first whitespace

Outputs:
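As a minimal sketch, both approaches can be tried against the log line from the question. The names log_line, FIXED_SCRIPT, ANY_PHP_URI and extract_uri are illustrative only. The script name is set to index1.php to match the question, and the patterns escape the dot and use \S in place of [^\s]; these are assumptions, not the patterns exactly as written above.

```python
import re

# Sample access-log line taken from the question.
log_line = (
    '192.168.1.137 - - [07/Oct/2020:00:46:13 +0800] '
    '"GET /index1.php?command=CON4,0088888 HTTP/1.1" 200 454 "-" '
    '"Mozilla/5.0 (Windows NT 10.0; Win 64; x64) AppleWebKit/537.36 '
    '(KHTML, like Gecko) Chrome/85.0.4183.121 Safari/537.36 Edg/75.0.564.63"'
)

# Fixed script name (assumed to be index1.php, as in the question),
# followed by every character up to the next whitespace.
FIXED_SCRIPT = re.compile(r"/index1\.php\S+")

# Any URI containing ".php" that directly follows "GET ".
# The lookbehind checks for "GET " without including it in the match.
ANY_PHP_URI = re.compile(r"(?<=GET )/\S*\.php\S*")


def extract_uri(pattern, line):
    """Return the first match of pattern in line, or None if there is none."""
    match = pattern.search(line)
    return match.group() if match else None


print(extract_uri(FIXED_SCRIPT, log_line))  # /index1.php?command=CON4,0088888
print(extract_uri(ANY_PHP_URI, log_line))   # /index1.php?command=CON4,0088888
```

Using \S instead of a greedy .* keeps the match inside a single whitespace-delimited token, so it cannot run on into the user agent string if ".php" happens to appear later in the line.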