Captcha cookies

I am trying to scrape this webpage using Python and the requests library.

I go to the page, solve the captcha manually, copy the cookies, and send them with my requests. But after a few requests the captcha reappears and the program stops.

I think the cookies should not expire that fast. Any thoughts on how to use the cookies to scrape this type of page?
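
To check the expiry instead of guessing, here is a minimal sketch, assuming the `portalbnmp` cookie value is a JWT (it has three dot-separated base64url segments, which is what a JWT looks like). The helper name `jwt_expiry` is my own, and the signature is not verified; it only reads the `exp` claim.

```python
import base64
import json
from datetime import datetime, timezone


def jwt_expiry(token):
    """Return the UTC expiry time from a JWT's `exp` claim (signature is NOT verified)."""
    # A JWT is header.payload.signature; the claims live in the middle segment.
    payload_b64 = token.split(".")[1]
    # JWT segments drop their '=' padding, so restore it before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    claims = json.loads(base64.urlsafe_b64decode(payload_b64))
    # `exp` is seconds since the Unix epoch.
    return datetime.fromtimestamp(claims["exp"], tz=timezone.utc)
```

Calling it on the part of the cookie string after `portalbnmp=` and comparing the result with `datetime.now(timezone.utc)` would show whether the token itself has expired. That only covers the token lifetime; the captcha may also come back for other reasons, such as request rate, which I cannot confirm.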

Here is an example value of p for the request: 0500548192019805008801000119

This is a shorter version of the code I am trying. I loop over different values of p.

import json
import requests

cookie = 'portalbnmp=eyJhbGciOiJIUzUxMiJ9.eyJzdWIiOiJndWVzdF9wb3J0YWxibm1wIiwiYXV0aCI6IlJPTEVfQU5PTllNT1VTIiwiZXhwIjoxNzEwNjY2NDIxfQ.OA2voTGmab-PUk5Zn0zDnVJfxAlOmsxyRVmyjEinj_bS9Zr8DYxcjrPHpFGUUdkOd-_et2AFEwyxwj7VN6Eobw'

request_headers_short = {
    'accept': 'application/json',
    'accept-encoding': 'gzip, deflate, br, zstd',
    'accept-language': 'pt-PT,pt;q=0.9,en-US;q=0.8,en;q=0.7',
    'origin': 'https://portalbnmp.cnj.jus.br',
    'referer': 'https://portalbnmp.cnj.jus.br/',
    'content-type': 'application/json;charset=UTF-8',
    'cookie': cookie,
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36'
}

request_url_short = 'https://portalbnmp.cnj.jus.br/bnmpportal/api/pesquisa-pecas/filter?page=0&size=10&sort='

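# p is the current numeroPeca value from the loop (see the example value above)
payload = {"buscaOrgaoRecursivo": "false",
           "numeroPeca": p,
           "orgaoExpeditor": {}}

# Send the payload as a JSON string, matching the content-type header
resp_short = requests.post(url=request_url_short,
                           headers=request_headers_short,
                           data=json.dumps(payload))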
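
For the loop itself, this is a rough sketch of what I could do so the run can be resumed after the captcha comes back. It assumes the captcha challenge shows up as a non-200 status or a body that is not JSON, which I have not confirmed for this site. `fetch_all` and `delay_s` are names I made up; `session` is expected to be a `requests.Session()` (or anything with a compatible `post` method).

```python
import json
import time


def fetch_all(session, url, headers, values, delay_s=2.0):
    """POST one payload per value of p.

    Stops at the first response that is not usable JSON and returns
    (results, remaining) so the remaining values can be retried later
    with a fresh cookie.
    """
    values = list(values)
    results = {}
    for i, p in enumerate(values):
        payload = {"buscaOrgaoRecursivo": "false",
                   "numeroPeca": p,
                   "orgaoExpeditor": {}}
        resp = session.post(url, headers=headers, data=json.dumps(payload))
        try:
            if resp.status_code != 200:
                raise ValueError(f"HTTP {resp.status_code}")
            # resp.json() raises a ValueError subclass when the body is not JSON.
            results[p] = resp.json()
        except ValueError:
            # Assumption: this is the captcha page; stop and report what is left.
            return results, values[i:]
        # Pause between requests; whether this delays the captcha is untested.
        time.sleep(delay_s)
    return results, []
```

With the names from my code it would be called as `fetch_all(requests.Session(), request_url_short, request_headers_short, list_of_p)`, where `list_of_p` is a hypothetical list of p values. When `remaining` comes back non-empty, I would solve the captcha again, update the `cookie` header, and call it again with `remaining`.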