Web scraping with Python Requests: can't find the generated request-id needed for the request


I can scrape this URL for JSON content successfully, but the request-id in the GET request expires quickly. How can I obtain the generated request-id from this website? The request works without cookies; it only needs a fresh request-id in the requests.get call below.

import requests

headers = {
    'authority': 'www.jewelosco.com',
    'accept': 'application/json, text/plain, */*',
    'accept-language': 'en-US,en;q=0.9',
    'dnt': '1',
    'ocp-apim-subscription-key': '5e790236c84e46338f4290aa1050cdd4',
    'referer': 'https://www.jewelosco.com/shop/search-results.html?q=fig%20bars',
    'sec-ch-ua': '"Chromium";v="116", "Not)A;Brand";v="24", "Brave";v="116"',
    'sec-ch-ua-mobile': '?0',
    'sec-ch-ua-platform': '"macOS"',
    'sec-fetch-dest': 'empty',
    'sec-fetch-mode': 'cors',
    'sec-fetch-site': 'same-origin',
    'sec-gpc': '1',
    'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36',
}

response = requests.get(
    'https://www.jewelosco.com/abs/pub/xapi/pgmsearch/v1/search/products?request-id=2561697127627216628&url=https://www.jewelosco.com&pageurl=https://www.jewelosco.com&pagename=search&rows=30&start=0&search-type=keyword&storeid=3455&featured=true&search-uid=uid%253D6696364499362%253Av%253D12.0%253Ats%253D1696961074175%253Ahc%253D12&q=fig%20bars&sort=&featuredsessionid=&screenwidth=149&dvid=web-4.1search&channel=pickup&banner=jewelosco',
    headers=headers,
)
response.json()
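Since only the request-id changes between calls, it may help to keep the rest of the URL fixed and swap in a new id each time. This is a minimal sketch: the helper name `with_request_id` is hypothetical, the URL is shortened for readability, and the replacement id is a placeholder rather than a value known to be valid.

```python
import re


def with_request_id(url: str, request_id: str) -> str:
    """Return url with the value of its request-id query parameter replaced."""
    # The lookbehind matches only the digits that follow "?request-id=" or
    # "&request-id=", so the other query parameters are left untouched.
    return re.sub(r'(?<=[?&]request-id=)\d+', lambda m: request_id, url, count=1)


# Shortened version of the search URL from the question.
url = ('https://www.jewelosco.com/abs/pub/xapi/pgmsearch/v1/search/products'
       '?request-id=2561697127627216628&url=https://www.jewelosco.com&rows=30')

# Placeholder id, only to show the substitution.
print(with_request_id(url, '1234567890123456789'))
```

A regex substitution is used instead of re-encoding the query string so that the already-encoded search-uid and the empty parameters stay byte-for-byte as captured.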

So far I've been able to identify the 169712 in 2561697127627216628 as a server time code from another response, but that still leaves 13 digits I can't account for. The request that returns the server time is below:

headers = {
    'authority': 'albertsons.inq.com',
    'accept': '*/*',
    'accept-language': 'en-US,en;q=0.9',
    'content-type': 'application/x-www-form-urlencoded',
    'origin': 'https://chat.jewelosco.com',
    'referer': 'https://chat.jewelosco.com/',
    'sec-ch-ua': '"Google Chrome";v="117", "Not;A=Brand";v="8", "Chromium";v="117"',
    'sec-ch-ua-mobile': '?0',
    'sec-ch-ua-platform': '"macOS"',
    'sec-fetch-dest': 'empty',
    'sec-fetch-mode': 'cors',
    'sec-fetch-site': 'cross-site',
    'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36',
}

data = {
    'checksum': '1640249083',
    '_rand': 'lrvnwar',
    'rid': 'r310947',
    'd': '{"INQ":{"siteID":10006484,"custID":"-6153494727971827858","scheduleTZs":{}}}',
}

response = requests.post('https://albertsons.inq.com/tagserver/init/initFramework', headers=headers, data=data)

r = response.json()['INQ']['serverTime'][0:6]
r  # first 6 characters of serverTime (e.g. 169703); they correspond to the second chunk (digits 4-9) of the request-id
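Splitting the captured request-id into chunks makes the unknown part easier to reason about. The sketch below works only on the single sample value from this question. The chunk boundaries, and the reading of digits 4 to 16 as a millisecond Unix timestamp, are assumptions based on that one sample, not confirmed behavior of the site.

```python
from datetime import datetime, timezone

request_id = '2561697127627216628'  # sample value from the search URL

prefix = request_id[:3]          # '256', origin unknown
server_chunk = request_id[3:9]   # '169712', matches the serverTime prefix
remainder = request_id[9:]       # '7627216628', the 10 digits after it

# ASSUMPTION: digits 4-16 taken together are a millisecond Unix timestamp.
ms_candidate = int(request_id[3:16])  # 1697127627216
suffix = request_id[16:]              # '628', origin unknown

# Interpreted that way, the value falls on 2023-10-12 (UTC).
as_utc = datetime.fromtimestamp(ms_candidate / 1000, tz=timezone.utc)

print(prefix, server_chunk, remainder)
print(ms_candidate, suffix, as_utc.date().isoformat())
```

If that reading holds, only the leading three and trailing three digits remain unexplained, which would need checking against more captured ids.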

I'd prefer to do this with just requests and not loading js content.

Any help would be greatly appreciated.

I've tried randomly generating the other 13 digits around the server time code, which didn't work. I also tried searching for request-id and didn't find anything that matched the number.
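One variation on the random approach that might be worth trying: in the sample id 2561697127627216628, digits 4 to 16 (1697127627216) have the length and magnitude of a millisecond Unix timestamp. The sketch below builds candidate ids on that basis. The layout (three random digits, a 13-digit millisecond timestamp, three random digits) is purely a guess from one sample, the function name `candidate_request_id` is hypothetical, and nothing here is confirmed to be accepted by the server.

```python
import random
import time
from typing import Optional


def candidate_request_id(now_ms: Optional[int] = None) -> str:
    """Build a 19-digit candidate id under the guessed layout described above."""
    if now_ms is None:
        # Local clock in milliseconds; the site may expect server time instead.
        now_ms = int(time.time() * 1000)
    prefix = random.randint(100, 999)  # ASSUMPTION: three arbitrary leading digits
    suffix = random.randint(0, 999)    # ASSUMPTION: three arbitrary trailing digits
    return f'{prefix}{now_ms:013d}{suffix:03d}'


print(candidate_request_id())
```

The candidate can then be placed in the request-id query parameter of the search URL to see whether the response still returns JSON.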
