I'm trying my first steps in ML using Jupter's IPython, I was advised to start with Nasdaq's order book ITCH dataset to create models. I'm following the same steps in this tutorial on github.
I can't seem to unzip/expand files from the ITCH dataset, when executing the function may_be_download(url)
and the following code (code cell nr.5 in tutorial):
file_name = may_be_download(urljoin(FTP_URL, SOURCE_FILE))
date = file_name.name.split('.')[0]
I get the following error; EOFError: Compressed file ended before the end-of-stream marker was reached
Nor am I able to simply unzip the file by clicking on it in Finder or using gzip
& gunzip
methods in Terminal.
I took the following steps:
- Executed all previous code cells (1-4)
- Copied the file
03272019.NASDAQ_ITCH50.gz
to a folder nameddata
in the relative path- First I went clicked on the sample link in the notebook
- Then logged in as a guest and navigated to the folder
Nasdaq ITCH
- Then located the file
03272019.NASDAQ_ITCH50.gz
and copyed it a local folder.
- Executed code cell nr.5 listed above.
I've search and tried numerous solutions to similar issues listed here on Stack and Github, but none seem to solve this particular problem. I would deeply appreciate any help and thoughts on what may be occurring and how I might go about solving this.
I'll leave you with a picture of the error logs, assuming it may be of some help
Thanks for reading.
I downloaded that file and one other from that site. They both appear to be corrupted, both failing with incomplete deflate data.
What's more, there are MD5 signatures for the files there, and what is downloaded has MD5 signatures that do not match.
This is not being caused by the ftp server doing end-of-line conversions, because the lengths of the file in bytes match exactly the lengths on the server. Also a histogram of the byte values shows no bias.