Create a Parquet file with ClickHouse and read it with DuckDB


Following this guide https://clickhouse.com/docs/knowledgebase/mysql-to-parquet-csv-json, I exported some tables from a MySQL server to Parquet.

But I'm not able to read these Parquet files with DuckDB.

I can inspect the structure:

DESCRIBE SELECT * FROM 'mytable.parquet';

but if I try to read the data:

SELECT ID FROM 'mytable.parquet';
Error: Invalid Error: Unsupported compression codec "7". Supported options are uncompressed, gzip, snappy or zstd

I guess ClickHouse is writing LZ4-compressed Parquet files and DuckDB doesn't support them. Can I change the compression format in clickhouse-local?
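For reference, the "7" in the error is a numeric codec id from the Parquet format's `CompressionCodec` enum (defined in parquet.thrift), where 5 is the older `LZ4` and 7 is `LZ4_RAW`. The small Python lookup below decodes such ids; `explain_codec` is a hypothetical helper, and `DUCKDB_SUPPORTED` is simply copied from the error message above rather than queried from DuckDB.

```python
# Parquet CompressionCodec enum values, as defined in the Parquet format's
# parquet.thrift. The number in DuckDB's error message is one of these ids.
PARQUET_CODECS = {
    0: "UNCOMPRESSED",
    1: "SNAPPY",
    2: "GZIP",
    3: "LZO",
    4: "BROTLI",
    5: "LZ4",      # older, deprecated LZ4 framing
    6: "ZSTD",
    7: "LZ4_RAW",  # raw LZ4 block format
}

# Assumption: codecs listed as supported in the DuckDB error message quoted
# in the question; other DuckDB versions may support a different set.
DUCKDB_SUPPORTED = {"UNCOMPRESSED", "GZIP", "SNAPPY", "ZSTD"}


def explain_codec(codec_id: int) -> str:
    """Translate a numeric Parquet codec id into a readable diagnosis."""
    name = PARQUET_CODECS.get(codec_id, f"UNKNOWN({codec_id})")
    status = "supported" if name in DUCKDB_SUPPORTED else "not supported"
    return f"codec {codec_id} = {name} ({status} by this DuckDB build)"


if __name__ == "__main__":
    # Decode the id from the error message in the question.
    print(explain_codec(7))
```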

Best answer:

To change the Parquet compression method in ClickHouse, use the setting output_format_parquet_compression_method (all Parquet settings are listed at https://clickhouse.com/docs/en/sql-reference/formats#parquet-format-settings).

For example:

select ... format Parquet settings output_format_parquet_compression_method='snappy'

Another answer:

--output_format_parquet_compression_method Compression method for Parquet output format. Supported codecs: snappy, lz4, brotli, zstd, gzip, none (uncompressed)

Try output_format_parquet_compression_method='snappy':

clickhouse-client -q "select * from numbers(1e6)  settings
 output_format_parquet_compression_method='snappy' format Parquet " > test.parquet
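A minimal Python sketch of the same export driven from a script, assuming the `clickhouse-local` binary is on `PATH` (the question asks about clickhouse-local; the command above uses clickhouse-client, and both accept query-level settings like this one). `build_export_command` and `export_parquet` are hypothetical helpers, and the codec sets are copied from the help text and the DuckDB error message in this thread, not queried from either tool.

```python
import shutil
import subprocess

# Codec names accepted by output_format_parquet_compression_method,
# as listed in the help text quoted above.
CLICKHOUSE_CODECS = {"snappy", "lz4", "brotli", "zstd", "gzip", "none"}
# Assumption: the subset the DuckDB build from the question says it can read.
DUCKDB_READABLE = {"snappy", "zstd", "gzip", "none"}


def build_export_command(query: str, codec: str = "snappy") -> list[str]:
    """Return a clickhouse-local argv that writes `query` as Parquet to stdout."""
    if codec not in CLICKHOUSE_CODECS:
        raise ValueError(f"unknown Parquet codec: {codec!r}")
    if codec not in DUCKDB_READABLE:
        raise ValueError(f"{codec!r} is not readable by this DuckDB build")
    sql = (
        f"{query} "
        f"SETTINGS output_format_parquet_compression_method='{codec}' "
        "FORMAT Parquet"
    )
    return ["clickhouse-local", "-q", sql]


def export_parquet(query: str, path: str, codec: str = "snappy") -> None:
    """Run clickhouse-local and redirect its Parquet output into `path`."""
    cmd = build_export_command(query, codec)
    with open(path, "wb") as out:
        subprocess.run(cmd, stdout=out, check=True)


if __name__ == "__main__":
    demo_query = "SELECT * FROM numbers(1000000)"
    if shutil.which("clickhouse-local"):
        export_parquet(demo_query, "test.parquet")
    else:
        # No binary available: just show the command that would run.
        print(build_export_command(demo_query))
```

Rejecting lz4 and brotli up front is a choice tied to the DuckDB version in this question; newer DuckDB releases may read more codecs, in which case `DUCKDB_READABLE` would need adjusting.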