How to read_csv a zstd-compressed file using python-polars


Unlike pandas, polars doesn't natively support reading zstd-compressed CSV files.

How can I get polars to read a compressed CSV file, for example using xopen?

I've tried this:

from xopen import xopen
import polars as pl

with xopen("data.csv.zst", "r") as f:
    d = pl.read_csv(f)

but this errors with:

pyo3_runtime.PanicException: Expecting to be able to downcast into bytes from read result.: 
   PyDowncastError

There are 2 best solutions below

0
On

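The file must be opened with xopen in binary mode ("rb"). As the error message indicates, polars expects the file object to return bytes, while the text mode "r" used in the question returns str. With that change it works: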

from xopen import xopen
import polars as pl

with xopen("data.csv.zst", "rb") as f:
    d = pl.read_csv(f)
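
To see why the mode matters, here is a minimal sketch that uses the standard library's gzip module as a stand-in for xopen (an assumption, chosen only so the snippet runs without extra packages). Unlike xopen, gzip.open needs "rt" spelled out to get text mode; the file name and contents are made up for illustration.

```python
import gzip
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical sample file, written only for this demonstration.
    path = Path(tmp) / "data.csv.gz"
    with gzip.open(path, "wb") as f:
        f.write(b"a,b\n1,2\n")

    # Text mode decodes the stream, so read() returns str.
    with gzip.open(path, "rt") as f:
        text_chunk = f.read()

    # Binary mode returns the raw decompressed bytes, which is what polars asks for.
    with gzip.open(path, "rb") as f:
        bytes_chunk = f.read()

print(type(text_chunk).__name__, type(bytes_chunk).__name__)
```

The first handle yields str and the second yields bytes; only the second kind of object matches what the error message says polars tried to obtain.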

Beware that the entire file will be read into memory before parsing, even if you only need a subset of the columns or rows.

1
On

"polars doesn't natively support reading compressed csv files."

This is not really true. We support decompression for zlib and gzip. You can make a feature request for zstd, and then we can look into supporting that as well.