I am trying to scrape this webpage using Python and the requests library.
I open the page in a browser, solve the captcha manually, copy the cookies, and attach them to my requests. But after some requests the captcha reappears and the program stops working.
I think the cookies should not expire that fast. Any thoughts on how to use the cookies to scrape this type of page?
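One thing I noticed: the portalbnmp cookie is a JWT, so its payload can be base64-decoded to read the exp claim and see when the server will consider the token expired. This is just a quick check I run on the same token that appears in the code below (standard library only):

import base64
import json
from datetime import datetime, timezone

# Value of the portalbnmp cookie (the JWT), without the "portalbnmp=" prefix.
token = 'eyJhbGciOiJIUzUxMiJ9.eyJzdWIiOiJndWVzdF9wb3J0YWxibm1wIiwiYXV0aCI6IlJPTEVfQU5PTllNT1VTIiwiZXhwIjoxNzEwNjY2NDIxfQ.OA2voTGmab-PUk5Zn0zDnVJfxAlOmsxyRVmyjEinj_bS9Zr8DYxcjrPHpFGUUdkOd-_et2AFEwyxwj7VN6Eobw'

payload_b64 = token.split('.')[1]
payload_b64 += '=' * (-len(payload_b64) % 4)          # restore base64url padding
claims = json.loads(base64.urlsafe_b64decode(payload_b64))

print(claims)                                          # includes "exp", a Unix timestamp
print(datetime.fromtimestamp(claims['exp'], tz=timezone.utc))  # when the token expires

That at least tells me when the token itself is set to expire, separate from whatever server-side check brings the captcha back.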
Here is an example of a p value used in a request: 0500548192019805008801000119
Below is a shortened version of the code I am running; what I do is loop over different values of p (the loop itself is sketched after the snippet).
import json
import requests

# Cookie copied from the browser after solving the captcha manually.
cookie = 'portalbnmp=eyJhbGciOiJIUzUxMiJ9.eyJzdWIiOiJndWVzdF9wb3J0YWxibm1wIiwiYXV0aCI6IlJPTEVfQU5PTllNT1VTIiwiZXhwIjoxNzEwNjY2NDIxfQ.OA2voTGmab-PUk5Zn0zDnVJfxAlOmsxyRVmyjEinj_bS9Zr8DYxcjrPHpFGUUdkOd-_et2AFEwyxwj7VN6Eobw'

p = '0500548192019805008801000119'  # example value; in the real script this comes from the loop

request_headers_short = {
    'accept': 'application/json',
    'accept-encoding': 'gzip, deflate, br, zstd',
    'accept-language': 'pt-PT,pt;q=0.9,en-US;q=0.8,en;q=0.7',
    'origin': 'https://portalbnmp.cnj.jus.br',
    'referer': 'https://portalbnmp.cnj.jus.br/',
    'content-type': 'application/json;charset=UTF-8',
    'cookie': cookie,
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36',
}

request_url_short = 'https://portalbnmp.cnj.jus.br/bnmpportal/api/pesquisa-pecas/filter?page=0&size=10&sort='

payload = {"buscaOrgaoRecursivo": "false",
           "numeroPeca": p,
           "orgaoExpeditor": {}}

resp_short = requests.post(url=request_url_short,
                           headers=request_headers_short,
                           data=json.dumps(payload))
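And this is roughly the loop around it (simplified; p_values stands in for my real list, and I am assuming here that a non-200 response means the cookie is no longer accepted and the captcha is back, which is what I observe):

import time

p_values = ['0500548192019805008801000119']  # placeholder for my real list of numeroPeca values

results = {}
for p in p_values:
    payload = {"buscaOrgaoRecursivo": "false",
               "numeroPeca": p,
               "orgaoExpeditor": {}}
    resp = requests.post(request_url_short,
                         headers=request_headers_short,
                         data=json.dumps(payload))
    if resp.status_code != 200:
        # At some point the server stops accepting the cookie and the captcha
        # reappears; I have to stop, solve it again and paste in a fresh cookie.
        print(f'Blocked at p={p}: HTTP {resp.status_code}')
        break
    results[p] = resp.json()
    time.sleep(1)  # small pause between requests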