Requests - determine parameterised url prior to issuing request, for inclusion in Referer header

1.7k Views Asked by At

I am writing a Python 2.7 script using Requests to automate access to a particular website. The website has a requirement that a Referer header matching the request URL is provided, for "security reasons". The URL is built up from a number of items in a params dict, passed to requests.post().

Is there a way to determine what the URL that Requests will use is, prior to making the request, so that the Referer header can be set to this correct value? Let's assume that I have a lot of parameters:

params = { 'param1' : value1, 'param2' : value2, # ... etc
  }

base_url = "http://example.com"
headers = { 'Referer' : url }   # but what is 'url' to be?
requests.post(base_url, params=params, headers=headers)  # fails as Referer does not match final url

I suppose one workaround is to issue the request and see what the URL is, after the fact. However there are two problems with this - 1. it adds significant overhead to the execution time of the script, as there will be a lot of such requests, and 2. it's not actually a useful workaround because the server actually redirects the request to another URL, so reading it afterwards doesn't give the correct Referer value.

I'd like to note that I have this script working with urllib/urllib2, and I am attempting to write it with Requests to see whether it is possible and perhaps simpler. It's not a complicated process the script has to follow, but it may perhaps be slightly beyond the scope of Requests. That's fine, I'd just like to confirm that this is the case.

2

There are 2 best solutions below

0
On BEST ANSWER

I think I found a solution, based on Prepared Requests. The concept is that Session.prepare_request() will do everything to prepare the request except send it, which allows my script to then read the prepared request's url, which now includes the parameters whose order are determined by the dict order. Then it can set the Referer header appropriately and then issue the original request.

params = {'param1' : value1, 'param2' : value2, # ... etc
         }
url = "http://example.com"

# Referer must be correct
# To determine correct Referer url, prepare a request without actually sending it
req = requests.Request('POST', url, params=params)
prepped = session.prepare_request(req)
#r = session.send(prepped)   # don't actually send it

# add the Referer header by examining the prepared url
headers = { 'Referer': prepped.url }

# now send normally
r = session.post(url, params=params, data=data, headers=headers)
5
On

it looks like you've found correctly the prepare_request feature in Requests.

However, if you still wanted to use your initial method, I believe you could use your base_url as your Referer:

base_url = "http://example.com"
headers = { 'Referer' : base_url }
requests.post(base_url, params=params, headers=headers)

I suspect this because your POST has the PARAMS directly attached to the base_url. If, for example, you were on:

http://www.example.com/trying-to-send-upload/

adding some params to this POST, you would then use:

referer = "http://www.example.com/trying-to-send-something/"
headers = { 'Referer' : referer, 'Host' : 'example.com' }
requests.post(referer, params=params, headers=headers)

ADDED

I would check my URL visually by using a simple statement after you've created the URL string:

print(post_url)

If this is good, you should print out the details of the reply from the server you're posting to, as it might also give you some hints as to why your query was rejected:

s = requests.post(referer, params=params, headers=headers)
print(s.status_code)
print(s.text)

Love to hear if this works as well for you.