R Arrow finds no compression codecs even when they are installed

I am currently trying to generate zstd-compressed Apache Parquet files in R in a Docker container.

Even though I install all dependencies and arrow itself works fine, it does not find the zstd (or brotli, for that matter) compression codec. This is the MWE version of my Dockerfile:

FROM r-base

RUN apt-get update
RUN apt-get -y install --no-install-recommends \
     libcurl4-openssl-dev \
     libssl-dev \
     libxml2-dev \
     libgit2-dev \
     libgsl0-dev \
     libfontconfig1-dev \
     libharfbuzz-dev \
     libfribidi-dev \
     libpng-dev \
     libtiff5-dev \
     git \
     curl \
     build-essential \
     libboost-system-dev \
     libboost-thread-dev \
     libboost-program-options-dev \
     libboost-test-dev \
     libboost-filesystem-dev \
     libsnappy-dev \
     libthrift-dev \
     libutf8proc-dev \
     rapidjson-dev \
     libxsimd-dev \
     liblz4-dev \
     libre2-dev \
     cmake \
     zstd \
     brotli

ARG ARROW_R_DEV=true
RUN R -e 'install.packages(Ncpus = 64, pkgs = c("arrow"))'

When I start the container and test which codecs are available, I see:

> arrow::codec_is_available("snappy")
[1] TRUE
> arrow::codec_is_available("zstd")
[1] FALSE
> arrow::codec_is_available("brotli")
[1] FALSE

which shows me that arrow itself is working fine (I tested that too), but it can't find zstd or brotli.

How can I write a zstd-compressed Parquet file in R in a Docker container?

Best answer

I had a deeper look at the documentation and found that it's not only about the system packages being installed, but also about how arrow itself is built. Setting the NOT_CRAN=true environment variable sets (among other things) LIBARROW_MINIMAL to false, which builds arrow with zstd and brotli support.

This MWE Dockerfile works for me:
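Outside of Docker, the same idea can be sketched from an R session. This is a minimal sketch, assuming arrow is being built from source on Linux and that the variable is set before the install starts; whether a given arrow version still maps NOT_CRAN to LIBARROW_MINIMAL this way is something to confirm in its installation docs.

```r
# Assumption: arrow is (re)installed from source in this same R session.
# NOT_CRAN must be set before install.packages() runs, because the
# arrow build reads it at install time.
Sys.setenv(NOT_CRAN = "true")

# Optional: ARROW_R_DEV=true (as in the question's Dockerfile) makes the
# build output more verbose, which helps when a codec goes missing.
Sys.setenv(ARROW_R_DEV = "true")

install.packages("arrow")
```

Setting the variable only for the current session keeps it from affecting how other packages are installed later.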

FROM r-base

RUN apt-get update
RUN apt-get -y install --no-install-recommends \
     libcurl4-openssl-dev \
     libssl-dev \
     libxml2-dev \
     libgit2-dev \
     libgsl0-dev \
     libfontconfig1-dev \
     libharfbuzz-dev \
     libfribidi-dev \
     libpng-dev \
     libtiff5-dev \
     git \
     curl \
     build-essential \
     libboost-system-dev \
     libboost-thread-dev \
     libboost-program-options-dev \
     libboost-test-dev \
     libboost-filesystem-dev \
     libsnappy-dev \
     libthrift-dev \
     libutf8proc-dev \
     rapidjson-dev \
     libxsimd-dev \
     liblz4-dev \
     libre2-dev \
     cmake \
     zstd \
     brotli

ARG ARROW_R_DEV=true
ARG NOT_CRAN=true
RUN R -e 'install.packages(Ncpus = 64, pkgs = c("arrow"))'
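To check the result inside the container, something along these lines can be run in R. It is a sketch rather than part of the original setup: the data frame and the temporary file path are made up for illustration, and the write is skipped if zstd is still unavailable.

```r
library(arrow)

# Report which compression codecs this arrow build supports.
codecs <- c("snappy", "zstd", "brotli")
available <- vapply(codecs, codec_is_available, logical(1))
print(available)

# Write a small zstd-compressed Parquet file, but only if the codec is present.
if (available[["zstd"]]) {
  df <- data.frame(id = 1:5, value = c(0.1, 0.2, 0.3, 0.4, 0.5))
  path <- tempfile(fileext = ".parquet")  # hypothetical output location
  write_parquet(df, path, compression = "zstd")

  # Read the file back to confirm the round trip preserves the data.
  df_back <- as.data.frame(read_parquet(path))
  stopifnot(identical(df_back$id, df$id))
}
```

If `available[["zstd"]]` is still FALSE after the rebuild, the build environment variables were most likely not visible to the install step.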

Second answer

I think you may need

  libzstd-dev \
  libbrotli-dev

Instead of

  zstd \
  brotli

...for the Arrow install to pick that up. If you update your question to include the full build output we might be able to help a little more! For example, I am surprised that the arrow package install does not download our pre-built static libraries (which include zstd and brotli to my knowledge).
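Besides the build output, a summary of what the installed package was built with is useful to include in the question. A small sketch, assuming arrow loads at all in the container:

```r
library(arrow)

# arrow_info() summarises the installed build: package and C++ library
# versions plus the optional features (including compression codecs)
# that were compiled in.
info <- arrow_info()
print(info)
```

The printed capabilities list shows at a glance whether zstd and brotli were compiled in, which is quicker than reading the whole build log.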