SEC Edgar filings extraction master.idx question


I encountered an issue while using the code from https://codingandfun.com/scraping-sec-edgar-python/

I tried to contact the authors of the website, but that didn't work out. I am hoping to get some help here; thank you in advance.

It seems that when I get to the print(download) step, the output is a string of strange special characters instead of an organized list of firm URLs. Is there something wrong with the SEC master.idx? Could someone help me identify the issue?

Here is the code:

import bs4 as bs
import requests
import pandas as pd
import re

company = 'Facebook Inc'
filing = '10-Q'
year = 2020
quarter = 'QTR3'
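# download the quarterly master index, a plain-text list of every filing
download = requests.get(f'https://www.sec.gov/Archives/edgar/full-index/{year}/{quarter}/master.idx').content
# decode the raw bytes and split them into one line per filing
download = download.decode("utf-8").split('\n')
print(download)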
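For reference, once the real file comes back, each data row of master.idx is pipe-delimited as CIK|Company Name|Form Type|Date Filed|Filename, and the Filename field is a path relative to https://www.sec.gov/Archives/. Below is a minimal sketch, assuming the index has already been downloaded and decoded into a list of lines, of turning those rows into filing URLs and filtering them with the question's company and filing values. The names parse_master_idx, find_filings and IndexEntry are hypothetical helpers, not part of any library. The company match is case-insensitive because the capitalization in the index may differ from 'Facebook Inc'.

```python
from __future__ import annotations

from typing import NamedTuple

# Filename paths in master.idx are relative to this root.
ARCHIVES_ROOT = "https://www.sec.gov/Archives/"


class IndexEntry(NamedTuple):
    cik: str
    company: str
    form_type: str
    date_filed: str
    url: str


def parse_master_idx(lines: list[str]) -> list[IndexEntry]:
    """Turn master.idx lines into entries, skipping the free-text header."""
    entries = []
    for line in lines:
        parts = line.split("|")
        # Data rows have exactly five fields and start with a numeric CIK;
        # this also skips the "CIK|Company Name|..." column header and the dashes.
        if len(parts) != 5 or not parts[0].isdigit():
            continue
        cik, company, form_type, date_filed, filename = parts
        entries.append(
            IndexEntry(cik, company, form_type, date_filed, ARCHIVES_ROOT + filename.strip())
        )
    return entries


def find_filings(entries: list[IndexEntry], company: str, form_type: str) -> list[IndexEntry]:
    """Case-insensitive match on company name plus an exact form-type match."""
    wanted = company.casefold()
    return [e for e in entries if e.company.casefold() == wanted and e.form_type == form_type]
```

With the question's variables, find_filings(parse_master_idx(download), company, filing) would return the matching 10-Q rows, each carrying a ready-to-use url field.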
Answer

You need to declare your User-Agent in the request headers, as the SEC's EDGAR access guidelines require; otherwise you will download an HTML page prompting you to do so instead of master.idx.
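A minimal sketch of the fix, assuming the requests library as in the question. The User-Agent string is a placeholder: the SEC asks for something that identifies you, such as a name or company plus a contact email. The helper names build_index_url, looks_like_master_idx and fetch_master_idx are illustrative, not part of any library. As an extra heuristic check, the sketch verifies that the body contains master.idx's column header before decoding, so a block page is reported as an error instead of being printed.

```python
from __future__ import annotations

INDEX_URL = "https://www.sec.gov/Archives/edgar/full-index/{year}/{quarter}/master.idx"

# Placeholder only: replace with your own name/company and contact email.
USER_AGENT = "Your Name your.email@example.com"

# Column header line that master.idx contains above its data rows.
MASTER_IDX_HEADER = b"CIK|Company Name|Form Type|Date Filed|Filename"


def build_index_url(year: int, quarter: str) -> str:
    """Return the full-index URL for one quarter, e.g. quarter='QTR3'."""
    return INDEX_URL.format(year=year, quarter=quarter)


def looks_like_master_idx(body: bytes) -> bool:
    """Heuristic: the real index contains its column header; an HTML page does not."""
    return MASTER_IDX_HEADER in body


def fetch_master_idx(year: int, quarter: str, user_agent: str = USER_AGENT) -> list[str]:
    """Download one quarter's master.idx with a declared User-Agent; return its lines."""
    import requests  # third-party; imported here so the helpers above need only the stdlib

    response = requests.get(
        build_index_url(year, quarter),
        headers={"User-Agent": user_agent},  # the header the answer says is required
        timeout=30,
    )
    response.raise_for_status()  # a refused request may come back as an HTTP error status
    if not looks_like_master_idx(response.content):
        raise RuntimeError("EDGAR did not return master.idx; check the User-Agent header.")
    # errors="replace" keeps a stray non-UTF-8 byte from aborting the whole decode.
    return response.content.decode("utf-8", errors="replace").splitlines()
```

Usage would look like download = fetch_master_idx(2020, 'QTR3', 'Jane Doe jane.doe@example.com'), where the name and email are hypothetical stand-ins for your own details.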