Nasdaq ITCH order book build -> EOF error when expanding '.gz' file


I'm taking my first steps in ML using Jupyter's IPython, and I was advised to start with Nasdaq's ITCH order book dataset to create models. I'm following the same steps as this tutorial on GitHub.

I can't seem to unzip/expand files from the ITCH dataset. The failure occurs when executing the function may_be_download(url) in the following code (code cell nr.5 in the tutorial):

file_name = may_be_download(urljoin(FTP_URL, SOURCE_FILE))
date = file_name.name.split('.')[0]

I get the following error: EOFError: Compressed file ended before the end-of-stream marker was reached

I'm also unable to unzip the file by double-clicking it in Finder or by running gzip or gunzip in Terminal.

I took the following steps:

  • Executed all previous code cells (1-4)
  • Copied the file 03272019.NASDAQ_ITCH50.gz to a folder named data in the relative path
    • First I clicked on the sample link in the notebook
    • Then logged in as a guest and navigated to the folder Nasdaq ITCH
    • Then located the file 03272019.NASDAQ_ITCH50.gz and copied it to a local folder.
  • Executed code cell nr.5 listed above.

I've searched for and tried numerous solutions to similar issues listed here on Stack Overflow and on GitHub, but none seem to solve this particular problem. I would deeply appreciate any help and thoughts on what may be occurring and how I might go about solving this.

I'll leave you with a picture of the error log in case it is of some help.

[Image: screenshot of the error log]

Thanks for reading.

1 Answer

I downloaded that file and one other from that site. Both appear to be corrupted: decompression of each fails with incomplete deflate data.
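To reproduce this kind of check without loading a multi-gigabyte file into memory, the archive can be decompressed in chunks and any failure reported. This is only a minimal sketch using Python's standard gzip module; the function name gzip_is_complete and the chunk size are illustrative assumptions, not part of the tutorial.

```python
import gzip
import zlib


def gzip_is_complete(path, chunk_size=1 << 20):
    """Stream-decompress a .gz file and report whether it ends cleanly.

    Returns (ok, bytes_decompressed, error_message). The name and the
    return shape are hypothetical, chosen only for this illustration.
    """
    total = 0
    try:
        with gzip.open(path, "rb") as f:
            while True:
                chunk = f.read(chunk_size)
                if not chunk:
                    break
                total += len(chunk)
    except (EOFError, OSError, zlib.error) as exc:
        # EOFError: the stream was cut off before its end marker.
        # OSError (incl. gzip.BadGzipFile) / zlib.error: damaged data.
        return False, total, f"{type(exc).__name__}: {exc}"
    return True, total, ""
```

Calling gzip_is_complete("data/03272019.NASDAQ_ITCH50.gz") on a truncated download should return False along with the same EOFError message shown in the question.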

What's more, the site publishes MD5 checksums for the files, and the MD5 checksums of the downloaded files do not match them.
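The MD5 comparison can be done from the notebook with hashlib. The sketch below assumes the published checksum has been copied by hand as a hex string; the helper names md5_of_file and matches_published_md5 are made up for this example.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_md5(path, published_hex):
    """Compare a local file against a checksum copied from the server.

    published_hex is assumed to be the hex string listed next to the
    file on the download site (whitespace and case are ignored).
    """
    return md5_of_file(path) == published_hex.strip().lower()
```

A mismatch means the local bytes differ from whatever the checksum was computed over, so retrying the download or fetching the file from another source is the next step.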

This is not caused by the FTP server doing end-of-line conversions, because the lengths of the downloaded files in bytes exactly match the lengths on the server. Also, a histogram of the byte values shows no bias.
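The histogram check can be sketched as follows. Compressed data should look roughly uniform across all 256 byte values, whereas a text-mode transfer that rewrote line endings would change the file size and push the counts of LF (0x0A) or CR (0x0D) away from the rest. The function names here are illustrative assumptions, and this is not necessarily how the check above was performed.

```python
from collections import Counter


def byte_histogram(path, chunk_size=1 << 20):
    """Return (file_size, Counter mapping byte value -> occurrences)."""
    counts = Counter()
    size = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            counts.update(chunk)  # iterating bytes yields ints 0..255
            size += len(chunk)
    return size, counts


def newline_bias(path):
    """Summarise how often LF and CR occur versus a uniform spread."""
    size, counts = byte_histogram(path)
    return {
        "size": size,
        "expected_per_value": size / 256,
        "lf_0x0a": counts[0x0A],
        "cr_0x0d": counts[0x0D],
    }
```

If lf_0x0a and cr_0x0d are both close to expected_per_value and size equals the length listed on the server, line-ending conversion is an unlikely explanation.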