Compression without dictionary

Asked by SomewhatInterested on 11 May 2022 at 09:37

I have been testing the various compression algorithms with parquet files, and have settled on Zstd.

Now, as far as I understand, Zstd uses an adaptive dictionary unless one is explicitly specified, so it starts with an empty one. However, with a dictionary enabled, both the compressed size and the execution time are quite unsatisfactory.

The file size without a dictionary is considerably smaller than with the adaptive one (the number at the end of each name is the compression level):

Name: C:\ParquetFiles\Zstd1 Execution time: 279 ms Size: 13738134
Name: C:\ParquetFiles\Zstd2 Execution time: 140 ms Size: 13207017
Name: C:\ParquetFiles\Zstd9 Execution time: 511 ms Size: 12701030

And for comparison the log from using the adaptive dictionary:

Name: C:\ParquetFiles\ZstdDictZstd1 Execution time: 487 ms Size: 19462825
Name: C:\ParquetFiles\ZstdDictZstd2 Execution time: 402 ms Size: 19292513
Name: C:\ParquetFiles\ZstdDictZstd9 Execution time: 614 ms Size: 19072779

Can you help me understand the significance of this? Shouldn't the output with an empty dictionary perform at least as well as Zstd compression with the dictionary disabled?
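
A minimal sketch of how timings and sizes like those in the logs above could be collected, assuming Python and only the standard library. The names `measure` and `report` and the `write` callback are hypothetical and are not the code that produced the logs in this post. The callback stands in for whatever Parquet writer is under test. Because single-run timings can be noisy, the sketch keeps the fastest of several runs.

```python
import os
import tempfile
import time
from typing import Callable, Tuple


def measure(name: str, write: Callable[[str], None], repeats: int = 3) -> Tuple[float, int]:
    """Call write(path) `repeats` times; return (fastest run in ms, final file size in bytes)."""
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, name)
        best_ms = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            write(path)  # hypothetical writer: must create or overwrite the file at `path`
            elapsed_ms = (time.perf_counter() - start) * 1000.0
            best_ms = min(best_ms, elapsed_ms)
        # Read the size before the temporary directory is removed.
        size = os.path.getsize(path)
    return best_ms, size


def report(name: str, elapsed_ms: float, size: int) -> str:
    """Format one result in the same shape as the log lines in this post."""
    return f"Name: {name} Execution time: {elapsed_ms:.0f} ms Size: {size}"
```

To compare the two configurations, the same table would be passed through `measure` twice per compression level, once with a writer that has the dictionary enabled and once with it disabled, so that the dictionary setting is the only thing that changes between the runs.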
