How to authorize a request properly using requests and MechanicalSoup (web scraping)


My goal is to read the content of a table from a web page with the Python library MechanicalSoup.
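Once a page is actually returned, the table-reading step could look roughly like the minimal sketch below. It uses the standard library's html.parser instead of BeautifulSoup (which MechanicalSoup exposes as browser.page) so it runs without extra packages. TableExtractor and extract_rows are hypothetical names. The sketch also assumes the data sits in a plain, non-nested <table> in the served HTML, which may not hold for pages that build their tables with JavaScript.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []    # finished rows, each a list of cell strings
        self._row = None  # row currently being filled, or None
        self._cell = None  # text fragments of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def extract_rows(html):
    """Return the table rows found in `html` as lists of cell strings."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.rows
```

If the first row holds the column names, the result can go straight into pandas with pd.DataFrame(rows[1:], columns=rows[0]).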

My problem is that I am not able to authorize my requests properly: every request results in <Response [403]>.

Example website: https://opensea.io/rankings?sortBy=one_day_volume

>>> import mechanicalsoup
>>> browser = mechanicalsoup.StatefulBrowser()
>>> browser.open("https://opensea.io/rankings?sortBy=one_day_volume")
<Response [403]>

In browser.open(url).content I can see that I am being blocked by the site's access policies:

The access policies of a site define which visits are allowed. Your current visit is not allowed according to those policies. Only the site owner can change site access policies.
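Before changing cookies or headers, it may help to confirm where the 403 comes from. The header dump later in this question includes 'Server': 'cloudflare' and a CF-RAY id, which suggests the block is issued by Cloudflare in front of the site rather than by the site's own application. The helper below is a hypothetical, heuristic sketch that lists those signals; it is not a definitive test.

```python
def block_signals(status_code, headers, text):
    """Return human-readable hints that a response came from a blocking layer.

    Heuristics only. `headers` may be a plain dict or the case-insensitive
    response.headers object that requests provides.
    """
    # Normalize header names, since HTTP header names are case-insensitive.
    lowered = {name.lower(): value for name, value in headers.items()}
    signals = []
    if status_code == 403:
        signals.append("HTTP 403 Forbidden")
    if lowered.get("server", "").lower() == "cloudflare":
        signals.append("Server header says cloudflare")
    if "cf-ray" in lowered:
        signals.append("CF-RAY header present (request passed through Cloudflare)")
    if "access policies" in text.lower():
        signals.append("body mentions the site's access policies")
    return signals
```

With a real response this would be called as block_signals(r.status_code, r.headers, r.text).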

I think it has something to do with cookies or other parameters of my requests, which is why I tried passing the cookies, without success:

import mechanicalsoup
import pandas as pd 
import sqlite3
import requests


response = requests.get('https://opensea.io/rankings?sortBy=one_day_volume')

responsecookies = response.cookies

print(response.headers)
# {'Date': 'Fri, 15 Apr 2022 15:50:21 GMT',
#  'Content-Type': 'text/html; charset=UTF-8',
#  'Transfer-Encoding': 'chunked',
#  'Connection': 'keep-alive',
#  'X-Frame-Options': 'SAMEORIGIN',
#  'Referrer-Policy': 'same-origin',
#  'Cache-Control': 'private, max-age=0, no-store, no-cache, must-revalidate, post-check=0, pre-check=0',
#  'Expires': 'Thu, 01 Jan 1970 00:00:01 GMT',
#  'Expect-CT': 'max-age=604800, report-uri="https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct"',
#  'Set-Cookie': '__cf_bm=tumOZlW184DvFKZxjESM4RmsFtWcSWCsULENv42SGPE-1650037821-0-AbEEcLHRXFdyj8qFXj3yD6tHIjU0MLj5Sjq8dITWab+S7w8kxgOW38ZODJqp9mwl3WuLK+ub4Yu1W3kxcvX2C3Q=; path=/; expires=Fri, 15-Apr-22 16:20:21 GMT; domain=.opensea.io; HttpOnly; Secure; SameSite=None',
#  'Vary': 'Accept-Encoding',
#  'Strict-Transport-Security': 'max-age=0; includeSubDomains; preload',
#  'X-Content-Type-Options': 'nosniff',
#  'Server': 'cloudflare',
#  'CF-RAY': '6fc5d61fab7a9b9b-FRA',
#  'Content-Encoding': 'gzip'}

print(responsecookies)
#<RequestsCookieJar[<Cookie __cf_bm=tumOZlW184DvFKZxjESM4RmsFtWcSWCsULENv42SGPE-1650037821-0-AbEEcLHRXFdyj8qFXj3yD6tHIjU0MLj5Sjq8dITWab+S7w8kxgOW38ZODJqp9mwl3WuLK+ub4Yu1W3kxcvX2C3Q= for .opensea.io/>]>


browser = mechanicalsoup.StatefulBrowser()
browser.open("https://opensea.io/rankings?sortBy=one_day_volume", cookies=responsecookies)
# <Response [403]>
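A StatefulBrowser already wraps a requests.Session (browser.session) that stores cookies the server sets and sends them back on later open() calls. Fetching cookies with a separate requests.get and passing them in is therefore largely redundant. A more likely difference from a real browser is the request headers. The sketch below builds a session with browser-like headers. The header values are hypothetical placeholders to be replaced with the ones your own browser sends. Even matching headers may not be enough, because Cloudflare can apply checks that a plain HTTP client cannot satisfy.

```python
import requests

RANKINGS_URL = "https://opensea.io/rankings?sortBy=one_day_volume"

# Hypothetical placeholder values: copy the real ones from your browser
# (DevTools > Network > the page request > Request Headers).
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:99.0) Gecko/20100101 Firefox/99.0",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.5",
}


def make_session(headers):
    """Return a requests.Session that sends `headers` on every request.

    Cookies the server sets (such as __cf_bm) are kept in session.cookies
    and sent back automatically on later requests to the same domain.
    """
    session = requests.Session()
    session.headers.update(headers)
    return session


session = make_session(BROWSER_HEADERS)
# Build, but do not send, a request to see exactly which headers would go out.
prepared = session.prepare_request(requests.Request("GET", RANKINGS_URL))
print(prepared.headers["User-Agent"])
```

To use it with MechanicalSoup, pass it as mechanicalsoup.StatefulBrowser(session=session, user_agent=BROWSER_HEADERS["User-Agent"]). Passing user_agent explicitly matters because MechanicalSoup otherwise sets its own default User-Agent on the session.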

How do I best find out which parameters my request must contain, and how do I pass them correctly?
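One way to narrow down the parameters is to compare what the script sends with what the browser sends. requests records the headers it actually sent in response.request.headers, and the browser's headers can be copied from the DevTools Network tab. The hypothetical helpers below diff the two sets, comparing names case-insensitively as HTTP requires. If the response stays 403 after the headers match, the check is probably something headers alone cannot reproduce.

```python
def missing_headers(sent, browser):
    """Names of headers the browser sent but the script did not (sorted)."""
    sent_names = {name.lower() for name in sent}
    return sorted(name for name in browser if name.lower() not in sent_names)


def differing_headers(sent, browser):
    """{name: (script_value, browser_value)} for headers both sent with different values."""
    sent_by_name = {name.lower(): value for name, value in sent.items()}
    return {
        name: (sent_by_name[name.lower()], value)
        for name, value in browser.items()
        if name.lower() in sent_by_name and sent_by_name[name.lower()] != value
    }
```

A typical call would be missing_headers(response.request.headers, browser_headers), after which the missing headers can be added one at a time to see which, if any, changes the response.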

Thank you for reading this.
