I am looking for help web scraping the SEC's EDGAR database using BeautifulSoup. I have a list of investment firm names that I want to iterate through so that I can ultimately access each firm's 13F filings.
So far, using BeautifulSoup, I can locate a specific entry, but I am having trouble combining the SEC's base URL with a specific file path to actually access the data.
My code so far looks like:
import requests
from bs4 import BeautifulSoup

headers = {"user-agent": 'Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Firefox/91.0'}
for i in firms:  # pre-determined list, but using IFP Advisors for this example as 'i'
    # Atom search feed of 13F-HR filings for this company name, 2020-2021
    edgar_url = r'https://www.sec.gov/cgi-bin/srch-edgar?text=form-type%3D13F-HR+and+company-name+%3D+%22' + i + '%22&first=2020&last=2021&output=atom'
    response = requests.get(url=edgar_url, headers=headers)
    soup = BeautifulSoup(response.content, 'lxml')
    entries = soup.find_all('entry')
This gets me a list of specific 13F filing entries, such as:
<entry>
<title>13F-HR - IFP Advisors, Inc</title>
<link rel="alternate" type="text/html" href="/Archives/edgar/data/1641866/000164186621000007/0001641866-21-000001-index.htm"/>
<summary type="html"><b>Filed Date:</b> 01/25/2021 <b>Accession Number:</b> 0001641866-21-000001 <b>Size:</b> 4 MB</summary>
<updated>01/25/2021</updated>
<category scheme="http://www.sec.gov/" label="form type" term="4"/>
<id>urn:tag:sec.gov,2008:accession-number=0001641866-21-000001</id>
</entry>
Eventually, what I would like to do is pull out the href shown above
/Archives/edgar/data/1641866/000164186621000007/0001641866-21-000007-index
and pair it with the scheme in the entry to access the 13F filing's text file, which can be found here: https://www.sec.gov/Archives/edgar/data/1641866/000164186620000007/0001641866-20-000007.txt
I already have the scheme; what I need is a way to pull the link href out of each entry and use it to build a new URL for accessing more data.
Any help or suggestions would be appreciated. Thank you in advance!
To get URLs for complete submissions you can use this example:
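The original example is not reproduced here. As a minimal sketch of the URL-building step, assuming the index hrefs always end in `-index.htm` and that the complete submission sits next to the index page with a `.txt` extension (the pattern in the question's target URL), something like the following could work. `BASE_URL`, `INDEX_SUFFIX` and `submission_url` are hypothetical names chosen for illustration, and the sketch uses only the standard library's `urljoin` rather than anything specific to BeautifulSoup.

```python
from urllib.parse import urljoin

BASE_URL = "https://www.sec.gov"  # assumed base for the relative hrefs in the feed
INDEX_SUFFIX = "-index.htm"       # assumed suffix of every filing index href


def submission_url(href: str) -> str:
    """Turn a filing index href into the URL of the complete submission .txt file."""
    if not href.endswith(INDEX_SUFFIX):
        raise ValueError(f"unexpected index href: {href!r}")
    # Swap the index page suffix for .txt, then resolve against the base URL.
    txt_path = href[: -len(INDEX_SUFFIX)] + ".txt"
    return urljoin(BASE_URL, txt_path)


# href copied from the <entry> shown in the question
href = "/Archives/edgar/data/1641866/000164186621000007/0001641866-21-000001-index.htm"
print(submission_url(href))
```

In the loop from the question, the href for each entry could presumably be read with `entry.find('link')['href']` and passed to this helper before requesting the resulting URL with the same headers.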
Prints: