I am trying to parse and extract information from XBRL files, and there seem to be a number of open source packages that can parse XBRL files in Python. However, documentation on using them seems to be lacking. The packages I have investigated are:
- Arelle: https://arelle.readthedocs.io/en/2.25.2/
- Py-xbrl: https://py-xbrl.readthedocs.io/en/latest/usage.html#offline
- Brel*: https://brellibrary.github.io/brel/
*For reasons that I won't get into, I'm currently unable to use Brel (essentially because it requires Python >= 3.10).
However, the other packages don't seem to allow me to parse downloaded XBRL files offline. The structure of my downloaded XBRL files is as follows (taking the example of Apple from the SEC filings):
aapl-20200926
|
+-- aapl-20200926.xsd
+-- aapl-20200926_cal.xml
+-- aapl-20200926_def.xml
+-- aapl-20200926_lab.xml
+-- aapl-20200926_pre.xml
+-- aapl-20200926_htm.xml
I'm aware this isn't the typical purpose of Stack Overflow, but does anyone know how I could start parsing XBRL files in the above structure (rather than from a link on the internet) in a Python script? I expect there is some way to do this using Arelle and its Python API, or in py-xbrl, but I haven't been able to crack it so far.
The easiest way to get started with Arelle is to download the complete ZIP of the filing from the SEC. Annoyingly, it's not directly linked from the page you linked to, but you can find it by opening the iXBRL file and going to Menu -> Save XBRL Zip file. Or you can just replace `-index.htm` with `-xbrl.zip` in the filing index URL:
https://www.sec.gov/Archives/edgar/data/320193/000032019320000096/0000320193-20-000096-xbrl.zip
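The URL substitution can be sketched in a few lines of Python. The function name `xbrl_zip_url` is hypothetical, and the index URL in the comment is simply the ZIP URL above with the suffix swapped back:

```python
def xbrl_zip_url(index_url: str) -> str:
    """Turn an EDGAR filing index URL into the URL of its XBRL ZIP.

    Assumes the index URL ends in "-index.htm", as described above.
    """
    suffix = "-index.htm"
    if not index_url.endswith(suffix):
        raise ValueError(f"expected a URL ending in {suffix!r}: {index_url}")
    # Drop the index suffix and append the ZIP suffix instead.
    return index_url[: -len(suffix)] + "-xbrl.zip"


# Example input:
# https://www.sec.gov/Archives/edgar/data/320193/000032019320000096/0000320193-20-000096-index.htm
```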
To get started, try this command line:
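A sketch of what such an invocation can look like, wrapped in `subprocess` so it can be run from a Python script. The executable name `arelleCmdLine` and the options `--plugins`, `--file` and `--saveLoadableOIM` are assumptions about the installed Arelle version (check `arelleCmdLine --help`); the helper names are hypothetical.

```python
import subprocess


def build_arelle_command(zip_path: str, json_path: str) -> list:
    """Build the argument list for an Arelle command-line run.

    Assumption: the installed Arelle exposes `arelleCmdLine` with these options.
    """
    return [
        "arelleCmdLine",
        # Plugins are separated by "|": EFM for SEC transforms, OIM for JSON output.
        "--plugins", "validate/EFM|saveLoadableOIM",
        "--file", zip_path,
        "--saveLoadableOIM", json_path,
    ]


def convert_to_xbrl_json(zip_path: str, json_path: str) -> None:
    """Run Arelle and raise CalledProcessError if it exits with an error."""
    subprocess.run(build_arelle_command(zip_path, json_path), check=True)
```

Calling `convert_to_xbrl_json("0000320193-20-000096-xbrl.zip", "aapl.json")` would then run the conversion on the downloaded ZIP.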
This should convert your downloaded file into xBRL-JSON format, saved as `aapl.json`.
The `validate/EFM` plugin is needed for SEC filings, as they use some custom transforms.
The `saveLoadableOIM` plugin enables the xBRL-JSON functionality.
To get started with Python, here's a fairly minimal script that dumps out all facts in the report:
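A minimal sketch of such a script, assuming Arelle is installed as the `arelle` Python package. `dump_facts` and `fact_row` are hypothetical helper names; the sketch does not load the `validate/EFM` plugin, so facts that rely on SEC custom transforms may come back without a usable value.

```python
def fact_row(fact):
    """Pick a few ModelFact attributes: concept name, context ID, value."""
    return (str(fact.qname), fact.contextID, fact.value)


def dump_facts(entry_point: str) -> None:
    """Load a local XBRL or Inline XBRL file with Arelle and print its facts."""
    # Imported here so the helper above can be used without Arelle installed.
    from arelle import Cntlr

    cntlr = Cntlr.Cntlr(logFileName="logToPrint")  # send Arelle's log to stdout
    try:
        model_xbrl = cntlr.modelManager.load(entry_point)
        if model_xbrl is None:
            raise RuntimeError(f"Arelle could not load {entry_point}")
        for fact in model_xbrl.facts:
            print(*fact_row(fact), sep="\t")
    finally:
        cntlr.close()
```

With the filing unzipped locally, `dump_facts("aapl-20200926/aapl-20200926.htm")` would print one tab-separated line per fact.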
To process this, you will need to unzip the filing, and then feed it the `.htm` file:
The `ModelFact` object has quite a few properties and methods. You might find this code from the Inline XBRL viewer plugin useful to see what's possible.
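The unzip step mentioned above needs only the standard library. This sketch extracts the filing ZIP and returns the `.htm` files it finds, which are the candidates to hand to Arelle; the function name `extract_filing` is hypothetical.

```python
import zipfile
from pathlib import Path


def extract_filing(zip_path, dest_dir):
    """Unzip a filing and return the paths of the .htm files it contains."""
    dest = Path(dest_dir)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
    # The Inline XBRL report is the .htm file; the .xsd and *_cal/_def/_lab/_pre
    # files are the taxonomy and linkbases that Arelle picks up from it.
    return sorted(dest.rglob("*.htm"))
```

A filing may contain more than one `.htm` file (exhibits, for example), so it is worth checking the returned list rather than blindly taking the first entry.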