Compression without dictionary

I have been testing various compression algorithms with Parquet files and have settled on Zstd.

As far as I understand, Zstd uses an adaptive dictionary unless one is explicitly specified, so it begins with an empty one. However, with a dictionary enabled, both the compressed size and the execution time are noticeably worse.

The files are considerably smaller without a dictionary than with the adaptive one (the number at the end of each name is the compression level):

  • Name: C:\ParquetFiles\Zstd1 Execution time: 279 ms Size: 13738134
  • Name: C:\ParquetFiles\Zstd2 Execution time: 140 ms Size: 13207017
  • Name: C:\ParquetFiles\Zstd9 Execution time: 511 ms Size: 12701030

For comparison, here is the log with the adaptive dictionary enabled:

  • Name: C:\ParquetFiles\ZstdDictZstd1 Execution time: 487 ms Size: 19462825
  • Name: C:\ParquetFiles\ZstdDictZstd2 Execution time: 402 ms Size: 19292513
  • Name: C:\ParquetFiles\ZstdDictZstd9 Execution time: 614 ms Size: 19072779
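
For anyone who wants to reproduce this kind of measurement, here is a minimal sketch of a timing harness that prints lines in the same format as the logs above. It is not the code I used: `measure`, `make_zlib_writer` and `run_benchmark` are hypothetical names, and the writer uses the standard library's zlib purely as a stand-in so the example runs without extra packages. To test Parquet, replace the writer with a call into whatever Parquet library you use.

```python
import os
import tempfile
import time
import zlib


def measure(name, path, write_fn):
    """Run write_fn(path), then report elapsed milliseconds and file size."""
    start = time.perf_counter()
    write_fn(path)
    elapsed_ms = (time.perf_counter() - start) * 1000
    size = os.path.getsize(path)
    print(f"Name: {name} Execution time: {elapsed_ms:.0f} ms Size: {size}")
    return elapsed_ms, size


def make_zlib_writer(payload, level):
    """Stand-in writer (assumption): compresses payload with zlib, not Zstd."""
    def write(path):
        with open(path, "wb") as f:
            f.write(zlib.compress(payload, level))
    return write


def run_benchmark(payload, levels=(1, 2, 9)):
    """Measure every compression level and return {level: (elapsed_ms, size)}."""
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        for level in levels:
            name = f"Zlib{level}"
            path = os.path.join(tmp, name)
            results[level] = measure(name, path, make_zlib_writer(payload, level))
    return results


if __name__ == "__main__":
    # Synthetic, repetitive sample data; real Parquet data will behave differently.
    rows = b"".join(b"%d,row,%d\n" % (i, i % 7) for i in range(50_000))
    run_benchmark(b"id,name,value\n" + rows)
```

With pyarrow, for example, the writer could instead call `pyarrow.parquet.write_table(table, path, compression="zstd", compression_level=level, use_dictionary=flag)` and run once with `flag=True` and once with `flag=False`. Note that `use_dictionary` there controls Parquet's own dictionary encoding of column values, which may or may not correspond to the dictionary setting I toggled.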

Can you help me understand the significance of this? Shouldn't compression that starts from an empty dictionary perform at least as well as Zstd compression with the dictionary disabled?
