How to get the list of files in an archive file without downloading it

I am trying to get the list of file names in large archive files (zip, 7z, tar, rar, etc.) located on a remote server. I want to avoid downloading the files because of the network cost.

One option is to use HTTP range requests (1, 2, 3); however, each archive format stores its directory of entries in a different place and with a different layout. The Apache commons-compress library supports most of these formats, so I would like to use it to handle this. How can I use it on remote archive files without downloading them?

There are Python libraries that do this (1, 2); do you have any advice for Java?

There is 1 best solution below

If you can't run something on the server side, then you can do a range request on just the end of the zip file and reconstruct a zip file locally, with zeros written in place of the entry data, on which you can use unzip -l to list the contents.

I just tried zeroing out everything before the central directory on a large zip file, and unzip listed the contents just fine.
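Since the question asks about Java, here is a minimal sketch of the same idea using only the JDK. It assumes that java.util.zip.ZipFile, like unzip -l, needs only the end of the file to enumerate entries. The class and method names (ZeroPaddedZip, listNames) are made up for illustration, and the tail bytes are assumed to have been fetched already.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

/** Hypothetical helper: lists entry names given only the tail of a zip file. */
final class ZeroPaddedZip {

    /**
     * @param totalSize  size of the remote zip file in bytes
     * @param tailOffset offset in the remote file at which {@code tail} starts
     * @param tail       bytes fetched from the end of the remote file
     */
    static List<String> listNames(long totalSize, long tailOffset, byte[] tail) {
        Path tmp = null;
        try {
            tmp = Files.createTempFile("tail-only", ".zip");
            try (RandomAccessFile raf = new RandomAccessFile(tmp.toFile(), "rw")) {
                // Extend the file to the remote length without writing the entry data.
                // The Javadoc leaves the padding contents unspecified; common
                // filesystems zero-fill it, and it is not needed for listing.
                raf.setLength(totalSize);
                raf.seek(tailOffset);
                raf.write(tail);
            }
            List<String> names = new ArrayList<>();
            try (ZipFile zip = new ZipFile(tmp.toFile())) {
                Enumeration<? extends ZipEntry> entries = zip.entries();
                while (entries.hasMoreElements()) {
                    names.add(entries.nextElement().getName());
                }
            }
            return names;
        } catch (IOException e) {
            // Includes ZipException, e.g. when the tail does not cover the central directory.
            throw new UncheckedIOException(e);
        } finally {
            if (tmp != null) {
                try { Files.deleteIfExists(tmp); } catch (IOException ignored) { }
            }
        }
    }
}
```

Only the names (and other central directory metadata) are usable on such a file; trying to read an entry's data would hit the padding and fail.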

To do this you could either a) search for the end of central directory record, and then possibly the zip64 end of central directory locator and zip64 end of central directory record, in order to determine the offset of the central directory, and read from there, or b) read larger and larger portions of the end of the zip file, say doubling each time, until unzip -l works. If you have not captured the entire central directory, then unzip -l will report "start of central directory not found".
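For option a), a rough Java sketch of the first step might look like the following. It scans a buffer holding the last bytes of the file for the end of central directory signature and reads the central directory size and offset from it. The names EocdLocator and Eocd are hypothetical, the sketch assumes nothing follows the archive comment, and it only flags zip64 archives rather than following the zip64 locator.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical helper for option a): find the end of central directory (EOCD) record. */
final class EocdLocator {
    static final int EOCD_SIGNATURE = 0x06054b50; // bytes "PK\5\6" read as a little-endian int
    static final int EOCD_MIN_SIZE = 22;          // fixed part; an archive comment may follow

    /** Values read from the EOCD record. */
    static final class Eocd {
        final long cdOffset;      // offset of the central directory in the full file
        final long cdSize;        // size of the central directory in bytes
        final int entryCount;     // total number of entries
        final boolean needsZip64; // true if a field holds the zip64 placeholder value

        Eocd(long cdOffset, long cdSize, int entryCount, boolean needsZip64) {
            this.cdOffset = cdOffset;
            this.cdSize = cdSize;
            this.entryCount = entryCount;
            this.needsZip64 = needsZip64;
        }
    }

    /** Scans backwards through the tail of the file; returns null if no EOCD is found. */
    static Eocd find(byte[] tail) {
        ByteBuffer buf = ByteBuffer.wrap(tail).order(ByteOrder.LITTLE_ENDIAN);
        for (int pos = tail.length - EOCD_MIN_SIZE; pos >= 0; pos--) {
            if (buf.getInt(pos) != EOCD_SIGNATURE) {
                continue;
            }
            int commentLen = buf.getShort(pos + 20) & 0xFFFF;
            // Assumption: the record plus its comment ends exactly at end of file.
            // This rejects signature bytes that merely occur inside data or a comment.
            if (pos + EOCD_MIN_SIZE + commentLen != tail.length) {
                continue;
            }
            int entries = buf.getShort(pos + 10) & 0xFFFF;
            long size = buf.getInt(pos + 12) & 0xFFFFFFFFL;
            long offset = buf.getInt(pos + 16) & 0xFFFFFFFFL;
            boolean zip64 = entries == 0xFFFF || size == 0xFFFFFFFFL || offset == 0xFFFFFFFFL;
            return new Eocd(offset, size, entries, zip64);
        }
        return null;
    }
}
```

Because the comment can be up to 65535 bytes long, fetching the last 22 + 65535 bytes is enough to guarantee the record is inside the buffer. If needsZip64 is true, the real offset has to be taken from the zip64 records instead.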

To use range requests, you will need to know the size of the zip file. Then for b), you can read, say, the last 1K, the 1K before that, the 2K before that, the 4K before that, and so on, until unzip -l works. Each time you would update a file with zeros up to what you have accumulated from the end of the zip file so far, followed by what you have accumulated. To do this efficiently, you would start with a file of all zeros with the length of the zip file on the server. Then as you accumulate more data from the end, write over the end of that file, repeating unzip -l each time.
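The read schedule for b) can be sketched in Java as below. The class TailRanges and its methods are hypothetical names; plan computes the byte ranges (1K, 1K, 2K, 4K, ...) working backwards from the end, and the two HTTP helpers use the JDK 11+ java.net.http client, assuming the server reports Content-Length on HEAD and answers range requests with 206 Partial Content.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for option b): fetch growing chunks from the end of a remote file. */
final class TailRanges {

    /**
     * Inclusive {start, end} byte ranges, ordered from the end of the file backwards.
     * The first range covers {@code first} bytes; each later range is as large as
     * everything fetched so far, so the accumulated tail doubles every step.
     */
    static List<long[]> plan(long size, long first) {
        if (first <= 0) {
            throw new IllegalArgumentException("first must be positive");
        }
        List<long[]> ranges = new ArrayList<>();
        long have = 0; // bytes accumulated from the end so far
        while (have < size) {
            long chunk = Math.min(have == 0 ? first : have, size - have);
            long start = size - have - chunk;
            ranges.add(new long[] {start, start + chunk - 1});
            have += chunk;
        }
        return ranges;
    }

    /** Value for the HTTP Range header, e.g. "bytes=8976-9999". */
    static String header(long[] range) {
        return "bytes=" + range[0] + "-" + range[1];
    }

    /** Size of the remote file, taken from the Content-Length of a HEAD response. */
    static long remoteSize(HttpClient client, URI uri) throws IOException, InterruptedException {
        HttpRequest head = HttpRequest.newBuilder(uri)
                .method("HEAD", HttpRequest.BodyPublishers.noBody())
                .build();
        HttpResponse<Void> resp = client.send(head, HttpResponse.BodyHandlers.discarding());
        return resp.headers().firstValueAsLong("Content-Length").orElseThrow();
    }

    /** Fetches one range; fails if the server did not honor the Range header. */
    static byte[] fetch(HttpClient client, URI uri, long[] range) throws IOException, InterruptedException {
        HttpRequest get = HttpRequest.newBuilder(uri).header("Range", header(range)).build();
        HttpResponse<byte[]> resp = client.send(get, HttpResponse.BodyHandlers.ofByteArray());
        if (resp.statusCode() != 206) {
            throw new IllegalStateException("expected 206 Partial Content, got " + resp.statusCode());
        }
        return resp.body();
    }
}
```

Each fetched chunk would be written into the zero-filled local file at its start offset, and the listing retried after every step.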

If you want to try a), then you'll need to read and understand the zip file format specification (PKWARE's APPNOTE.TXT).
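As an illustration of what that involves, once the bytes of the central directory are in hand, the file names can be read straight out of the central directory file headers without rebuilding a zip file at all. The following Java sketch uses the field offsets given in the appnote; the class name CentralDirectoryNames is hypothetical, and names are decoded as UTF-8 for simplicity, which is an assumption (entries without general purpose flag bit 11 set may use code page 437).

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: reads file names from raw central directory bytes. */
final class CentralDirectoryNames {
    static final int CEN_SIGNATURE = 0x02014b50; // bytes "PK\1\2" read as a little-endian int
    static final int CEN_FIXED_SIZE = 46;        // fixed part of each central directory header

    /**
     * @param cd bytes starting at the first central directory header
     * @return entry names in central directory order; stops at the first
     *         non-matching signature or truncated header
     */
    static List<String> parse(byte[] cd) {
        ByteBuffer buf = ByteBuffer.wrap(cd).order(ByteOrder.LITTLE_ENDIAN);
        List<String> names = new ArrayList<>();
        int pos = 0;
        while (pos + CEN_FIXED_SIZE <= cd.length && buf.getInt(pos) == CEN_SIGNATURE) {
            int nameLen = buf.getShort(pos + 28) & 0xFFFF;    // file name length
            int extraLen = buf.getShort(pos + 30) & 0xFFFF;   // extra field length
            int commentLen = buf.getShort(pos + 32) & 0xFFFF; // file comment length
            if (pos + CEN_FIXED_SIZE + nameLen > cd.length) {
                break; // header is cut off; more bytes are needed
            }
            // Assumption: UTF-8 names (plain ASCII names decode identically).
            names.add(new String(cd, pos + CEN_FIXED_SIZE, nameLen, StandardCharsets.UTF_8));
            pos += CEN_FIXED_SIZE + nameLen + extraLen + commentLen;
        }
        return names;
    }
}
```

Comparing the number of names returned with the entry count in the end of central directory record is a cheap check that the whole directory was captured.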