How can I compress many files in a solid archive, but quickly extract only one?

I have 14,000 files of about 25 MB each that I am trying to compress to the minimum size for storage. At runtime I will only need to decompress one or two of them. There is enough intra-file redundancy that the files compress reasonably well individually, but there is also enough inter-file redundancy that tarring them first doubles the compression ratio:

Individual files, compressed with "xz -9": 65 GB total
Single tar blob, compressed with "xz -9": 33 GB
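
For reference, those numbers came from roughly the following two approaches (a sketch; the directory name and file extension are placeholders, not the actual paths):

    # Per-file compression: each file is an independent xz stream,
    # so no inter-file redundancy can be exploited.
    xz -9 --keep files/*.dat

    # Solid compression: concatenate everything into one tar stream first,
    # then compress it as a single xz stream.
    tar -cf - files/ | xz -9 > files.tar.xz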

Is there a way to compress a set of files such that the compressor can take advantage of inter-file redundancy, but the decompressor does not need to decompress all of the data? Extracting a 33 GB tar at runtime would be untenable. I can use a compressor library API directly, but would prefer not to heavily modify the library itself.
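
To make the runtime constraint concrete: with a solid .tar.xz, pulling out a single member still forces sequential decompression of the stream up to that member (sketch; the member path is a placeholder):

    # tar has to decompress and scan from the start of the archive
    # until it reaches the requested member.
    tar -xJf files.tar.xz files/file_01234.dat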

I tried using zstd, training a dictionary on the full file set and then compressing with that dictionary, but it showed no improvement in compression ratio whatsoever (zstd dictionaries seem to help only with very small files).
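
The dictionary attempt looked roughly like this (a sketch with placeholder paths and an arbitrary compression level):

    # Train a shared dictionary on the whole file set...
    zstd --train files/*.dat -o files.dict

    # ...then compress each file individually against that dictionary.
    zstd -19 -D files.dict files/*.dat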
