Getting 403 error while accessing "https://www.sec.gov/" from AWS even after specifying user agent

Response from SEC.GOV for GET request:

Your request has been identified as part of a network of automated tools outside of the acceptable policy and will be managed until action is taken to declare your traffic. Please declare your traffic by updating your user agent to include company specific information.

I'm getting a 403 error even after adding a User-Agent header to the GET request. I can access sec.gov from my local machine and from Azure without any issues; this happens only on AWS, and it started 4-5 days ago. I'm not sure why. Any help is appreciated!

Here's what I'm doing:

import requests

url_1 = 'https://www.sec.gov'
url_2 = 'https://www.sec.gov/Archives/edgar/data/0001781258/000178125821000028/0001781258-21-000028-index.html'

HEADERS = {'User-Agent': 'TEST'}

# Both of the GET requests below return a 403 error
print(requests.get(url_1, headers=HEADERS))
print(requests.get(url_2, headers=HEADERS))

There is 1 best solution below

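Check the SEC's webmaster FAQ on declaring a user agent: https://www.sec.gov/os/webmaster-faq#user-agent. As the error message says, the user agent needs to include company-specific information, so a bare string such as 'TEST' is unlikely to count as declared traffic.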

I ran into the same problem using PHP and cURL, and resolved it with the following code:

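// Page to fetch; the original snippet left $url undefined, so the question's URL is used here.
$url            =   'https://www.sec.gov';
// Declare the traffic: the User-Agent carries a domain name and a contact email
// (the email address is redacted in the source).
$curl_headers   = array(    'User-Agent: MyDomainName.com [email protected]',
                            'Accept-Encoding: gzip, deflate',
                            'Host: www.sec.gov');
$ch             =   curl_init();
curl_setopt($ch, CURLOPT_HTTPHEADER, $curl_headers);
curl_setopt($ch, CURLOPT_HEADER, true);          // include the response headers in the output
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return the response instead of printing it
curl_setopt($ch, CURLOPT_USERAGENT, "MyDomainName.com [email protected]");
curl_setopt($ch, CURLOPT_ENCODING, '');          // accept every supported encoding and decode automatically
curl_setopt($ch, CURLOPT_URL, $url);
$html           =   curl_exec($ch);
curl_close($ch);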
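
Since the question is in Python, here is a rough equivalent of the headers above. It is only a sketch: the helper names (build_sec_headers, build_request, fetch_status) and the "Example Corp" / "admin@example.com" values are made up for illustration, and it uses the standard library's urllib instead of requests so it runs without extra packages. Whether a declared user agent is enough to clear the 403 from AWS specifically is an assumption, not something confirmed here.

```python
import urllib.error
import urllib.request


def build_sec_headers(company: str, contact_email: str) -> dict:
    """Build headers mirroring the PHP answer: a User-Agent that names
    a company (or domain) and a contact email, plus the two extra headers."""
    if "@" not in contact_email:
        raise ValueError("contact_email should be an email address")
    return {
        "User-Agent": f"{company} {contact_email}",
        "Accept-Encoding": "gzip, deflate",
        "Host": "www.sec.gov",
    }


def build_request(url: str, company: str, contact_email: str) -> urllib.request.Request:
    """Attach the declared-traffic headers to a GET request (nothing is sent yet)."""
    return urllib.request.Request(url, headers=build_sec_headers(company, contact_email))


def fetch_status(url: str, company: str, contact_email: str, timeout: float = 10.0) -> int:
    """Send the request and return the HTTP status code, including error codes like 403."""
    request = build_request(url, company, contact_email)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urlopen raises on 4xx/5xx; the status code is still available on the exception.
        return error.code


# Example call (performs a real network request, so it is left commented out):
# print(fetch_status("https://www.sec.gov", "Example Corp", "admin@example.com"))
```

If you would rather stay with requests, the same dictionary can be passed directly: requests.get(url, headers=build_sec_headers("Example Corp", "admin@example.com")).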