I have daily TAQ data for a month and am trying to unzip the files using SAS, without success. The folder is named EQY_US_ALL_TRADE_202107. It contains one gzipped (.gz) file per trading day, named EQY_US_ALL_TRADE_202210701, EQY_US_ALL_TRADE_202210702, EQY_US_ALL_TRADE_202210703, ..., EQY_US_ALL_TRADE_202210729.
I tried the following code, starting with just two files (hence do _n_ = 1 to 2 in line 4). It does not work at all.
data "D:\EQY_US_ALL_TRADES_202107\MainDataset";
  rc=filename("folderef","D:\EQY_US_ALL_TRADES_202107");
  did = dopen("folderef");
  do _n_ = 1 to 2;
    filename = dread(did,_n_);
    if scan(filename,-1,'.') ne 'gz' then continue;
    fullname = pathname("folderef") || '/' || filename;
    do while(1);
      infile archive zip filevar=fullname gzip dlm='|' firstobs=2 eof=nextfile;
      OUTPUT;
    end;
    nextfile:
  end;
  stop;
run;
Proc contents data = "D:\EQY_US_ALL_TRADES_202107\MainDataset";
run;
So you have three problems.
The primary one is understanding how to read ONE of the files. If you downloaded these from NYSE then they should be pipe-delimited text files, and the variable definitions are published. So first work on code that can read one of the files.
To read a pipe-delimited text file, just use a simple data step. Say, for example, you have the daily quotes file. The documentation says that file has 23 variables. Reading delimited files is simple: define the variables and then input them. Make sure to remove the summary line at the bottom.
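As a rough sketch of that single-file step: the file name and the three variables below are placeholders I made up for illustration, so take the real variable list, order, and lengths from the published NYSE specification. It also assumes the trailing summary line starts with END and that your SAS release supports the GZIP option on the ZIP engine.

```sas
/* Sketch only: read ONE gzipped, pipe-delimited daily file.        */
/* The path and the variable list are hypothetical placeholders.    */
data one_day;
  infile "D:\EQY_US_ALL_TRADES_202107\one_daily_file.gz" zip gzip
         dsd dlm='|' truncover firstobs=2;  /* firstobs=2 skips the header line */
  length time $15 exchange $1 symbol $17;   /* define the variables first */
  input time exchange symbol;               /* then input them in file order */
  if time =: 'END' then delete;             /* assumed trailer: drop the summary line */
run;
```

Run PROC CONTENTS and PROC PRINT on a few observations of this one dataset before going any further. If one file does not read correctly, looping over the whole folder will not help.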
The second problem is how to get the list of files to be read.
Getting the list of files in a directory is a common question here and on SAS Communities. Your current code is close to doing that using the DOPEN() and DREAD() functions.
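One way to do that, as a sketch: build a dataset with one observation per .gz file in the folder. The dataset name FILES, the variable names, and the fileref dir are my own choices here, and the folder path is copied from your code.

```sas
/* Sketch only: list the .gz files in one folder into dataset FILES. */
data files;
  length fullname $256 fname $200;
  rc  = filename('dir', "D:\EQY_US_ALL_TRADES_202107");  /* assign a fileref to the folder */
  did = dopen('dir');                                    /* open the directory */
  if did > 0 then do;
    do i = 1 to dnum(did);                               /* loop over every member */
      fname = dread(did, i);
      if lowcase(scan(fname, -1, '.')) = 'gz' then do;   /* keep only .gz members */
        fullname = catx('\', pathname('dir'), fname);
        output;
      end;
    end;
    rc = dclose(did);                                    /* close the directory */
  end;
  rc = filename('dir');                                  /* clear the fileref */
  keep fullname;
run;
```

Compared with your code, this loops to DNUM(did) instead of a fixed count and only writes out the file names. It does not try to read the files in the same step.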
Once you have solved those two problems you can then move on to how to read ALL of the files into one dataset. You could do that by driving the data step that reads the TAQ files with the data that has the list of files. You can use the FILEVAR= option of the INFILE statement to do that. So if you have a dataset named FILES with a variable named FULLNAME that holds the names of the GZIP files you want to read, the basic structure would look like this:
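A minimal sketch of that structure, assuming the FILES dataset described above. The fileref name taq is just a placeholder (FILEVAR= supplies the real file name), and the three variables are hypothetical stand-ins for the real layout from the NYSE specification.

```sas
/* Sketch only: read every file listed in FILES into one dataset.   */
data want;
  set files;                                /* one observation per file to read */
  infile taq zip gzip filevar=fullname      /* FULLNAME says which file to open */
         dsd dlm='|' truncover firstobs=2 end=eof;
  length time $15 exchange $1 symbol $17;   /* hypothetical subset of variables */
  do while (not eof);                       /* read this file to its end */
    input time exchange symbol;
    if not (time =: 'END') then output;     /* assumed trailer: skip the summary line */
  end;
run;
```

The key difference from your attempt is that there is an INPUT statement inside the loop. Without it nothing is ever read from the file, so the loop never reaches the end of the file. Note that the FILEVAR= variable is not written to the output dataset, so copy it to another variable if you want to keep track of which file each observation came from.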