Trouble installing an R package: fatal error: libxml/globals.h: No such file or directory


I am analyzing a large data set in a Jupyter notebook with IRkernel. Until now, this all worked like a charm; however, for my next step, I had to install the clusterProfiler package from BiocManager, and that's where the problems began. The installation of clusterProfiler always has a "non-zero exit status".

As I dockerized the JupyterLab environment, the issue should, to my understanding, be reproducible, so let's look at my setup:

Setup

File structure

Jupyter-R/
├── jupyter-r/
│   ├── docker-compose.yml
│   └── Dockerfile
└── notebooks/

docker-compose.yml

version: '3'

services:
  jupyter:
    build:
      context: ..
      dockerfile: ./jupyter-r/Dockerfile
    ports:
      - "8888:8888"
    volumes:
      - ../notebooks:/app/notebooks  # Mount a local folder to store notebooks
    environment:
      - TZ=UTC+1  # Set the container timezone
    image: jupyter_r
    container_name: jupyter_r

Dockerfile (I removed some R packages that I am sure to be unrelated to the issue)

# Use the jupyter/r-notebook image as the base
FROM quay.io/jupyter/r-notebook

# Set the working directory to /app
WORKDIR /app

# Switch to the root user
USER root

# Install system dependencies
RUN apt-get update
RUN apt-get install -y --fix-missing \
    libssl-dev \
    libcurl4-openssl-dev \
    libxml2-dev \
    glpk-utils \
    libfontconfig1-dev \
    libharfbuzz-dev \
    libfribidi-dev \
    libproj-dev \
    libglpk-dev \
    libfreetype6-dev \
    libpng-dev \
    libtiff5-dev \
    libjpeg-dev \
    r-cran-igraph \
    mono-mcs \
    mono-xbuild \
    mono-runtime \
    libmono-system-data4.0-cil

# Install R packages
RUN mamba install --yes \
    'r-BiocManager'

# Install BiocManager packages
RUN R -e "BiocManager::install(c(\
    'clusterProfiler',\
    'AnnotationDbi',\
    'org.Hs.eg.db'\
))"

# Expose the Jupyter Notebook port
EXPOSE 8888

# Create JupyterLab configuration directory
RUN mkdir -p /etc/jupyter

# Add JupyterLab configuration to save notebooks in /app/notebooks
RUN echo "c.NotebookApp.notebook_dir = '/app/notebooks'" >> /etc/jupyter/jupyter_lab_config.py
RUN echo "c.MappingKernelManager.kernel_cmd_timeout = 3600" >> /home/jovyan/.jupyter/jupyter_notebook_config.py

# Command to run Jupyter Notebook
CMD ["jupyter", "lab", "--ip=0.0.0.0", "--port=8888", "--no-browser", "--allow-root"]

As you can see, my Dockerfile is based on the quay.io/jupyter/r-notebook image from the Jupyter Docker Stacks project, which is based on Ubuntu 22.04 (jammy).

Issue

When running docker-compose up --build, everything looks good. The docker image is built, and the container can be started. However, if you call installed.packages(), you will see that the required clusterProfiler package is missing.

I removed both the container and the image and rebuilt the image with the docker-compose --progress="plain" build command. This keeps the full build output, which makes it possible to debug the installation process. Searching that output for clusterProfiler shows the following:

#9 418.7 Warning messages:
#9 418.7 1: In install.packages(...) :
#9 418.7   installation of package ‘igraph’ had non-zero exit status
#9 418.7 2: In install.packages(...) :
#9 418.7   installation of package ‘tidygraph’ had non-zero exit status
#9 418.7 3: In install.packages(...) :
#9 418.7   installation of package ‘ggraph’ had non-zero exit status
#9 418.7 4: In install.packages(...) :
#9 418.7   installation of package ‘enrichplot’ had non-zero exit status
#9 418.7 5: In install.packages(...) :
#9 418.7   installation of package ‘clusterProfiler’ had non-zero exit status
#9 DONE 419.4s

Well, okay, that's a lot. Luckily, all those failed installations are related to each other. clusterProfiler requires enrichplot, which requires ggraph, and so on. Good. So, to resolve the issue, I have to fix the failing installation of igraph.
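
One way to find the root failure in a long build log is to list the earliest compiler and R build errors instead of reading the final warnings. The following is only a sketch: the file name build.log and the helper name first_failures are hypothetical, and the search patterns are taken from the messages quoted in this question.

```shell
#!/bin/sh
# Hypothetical helper: print (with line numbers) the first few lines of a
# saved build log that look like a compiler or R package build failure.
first_failures() {
    grep -n -E 'fatal error:|ERROR: compilation failed' "$1" | head -n 5
}

# Usage, assuming the build output was saved to a file first:
#   docker-compose --progress="plain" build 2>&1 | tee build.log
#   first_failures build.log
```

The earliest match is usually the one to fix; the later "non-zero exit status" warnings tend to follow from it.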

When searching for igraph in the terminal output, I came across the following:

#9 174.1 vendor/cigraph/src/io/graphml.c:46:10: fatal error: libxml/globals.h: No such file or directory
#9 174.1    46 | #include <libxml/globals.h>
#9 174.1       |          ^~~~~~~~~~~~~~~~~~
#9 174.1 compilation terminated.
#9 174.1 make: *** [/opt/conda/lib/R/etc/Makeconf:193: vendor/cigraph/src/io/graphml.o] Error 1
#9 174.1 ERROR: compilation failed for package ‘igraph’
#9 174.1 * removing ‘/opt/conda/lib/R/library/igraph’
#9 174.1 * restoring previous ‘/opt/conda/lib/R/library/igraph’

It seems that igraph cannot be compiled because libxml/globals.h is missing. This is unexpected, as I installed libxml2-dev, so I double-checked:

  • Running dpkg -l libxml2 confirms that the package is installed
  • In the Docker container, I opened /opt/conda/include/libxml2/libxml, and sure enough, I could find globals.h in that folder
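
The header existing somewhere on disk is not the same as the compiler finding it: #include <libxml/globals.h> only resolves if one of the compiler's include directories directly contains libxml/globals.h. A minimal sketch of that check follows. The helper name resolving_roots is hypothetical, and the four candidate directories are assumptions about where the conda and apt copies usually live, not paths read from the actual compiler command line.

```shell
#!/bin/sh
# Hypothetical helper: given a header path as written in the #include and a
# list of candidate include roots, print each root under which it resolves.
resolving_roots() {
    header=$1
    shift
    for root in "$@"; do
        if [ -f "$root/$header" ]; then
            printf '%s\n' "$root"
        fi
    done
}

# Candidate roots (assumed): conda's include directory, its libxml2
# subdirectory, and the corresponding apt locations.
resolving_roots libxml/globals.h \
    /opt/conda/include \
    /opt/conda/include/libxml2 \
    /usr/include \
    /usr/include/libxml2
```

If only a .../libxml2 root is printed, the header is reachable only when that directory is passed to the compiler with -I. The Makeconf path in the log shows that R runs from /opt/conda, so the apt copy under /usr may not be searched at all; that is worth verifying rather than assuming.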

What is happening here? I would be grateful for any help, however small.

Answer by datawookie

docker-compose.yml

version: '3'

services:
  jupyter:
    build:
      context: ..
      dockerfile: ./jupyter-r/Dockerfile
    ports:
      - "8888:8888"
    volumes:
      - ../notebooks:/app/notebooks  # Mount a local folder to store notebooks
    environment:
      - TZ=UTC+1  # Set the container timezone
    image: jupyter_r
    container_name: jupyter_r

Dockerfile

Key changes are:

  • use conda to install glpk and gmp;
  • create a link in /opt/conda/include/ that points to /opt/conda/include/libxml2/libxml, to get around a funky -I directive passed to the C compiler (this was the critical step in fixing the problem!); and
  • install {BiocManager} and {igraph} directly using R.

FROM quay.io/jupyter/r-notebook

USER root

RUN apt-get update
RUN apt-get install -y --fix-missing \
        libssl-dev \
        libcurl4-openssl-dev \
        libxml2-dev \
        glpk-utils \
        libfontconfig1-dev \
        libharfbuzz-dev \
        libfribidi-dev \
        libproj-dev \
        libfreetype6-dev \
        libpng-dev \
        libtiff5-dev \
        libjpeg-dev \
        mono-mcs \
        mono-xbuild \
        mono-runtime \
        libmono-system-data4.0-cil && \
    conda install -y -c conda-forge glpk gmp && \
    ln -s /opt/conda/include/libxml2/libxml /opt/conda/include/

RUN echo 'options(repos = c(CRAN = "https://cloud.r-project.org"))' >>/opt/conda/lib/R/etc/Rprofile.site

RUN R -e 'install.packages(c("BiocManager", "igraph"))'
RUN R -e 'BiocManager::install(c("clusterProfiler", "AnnotationDbi", "org.Hs.eg.db"), ask=FALSE, update=TRUE)'

EXPOSE 8888

RUN mkdir -p /etc/jupyter

RUN echo "c.NotebookApp.notebook_dir = '/app/notebooks'" >> /etc/jupyter/jupyter_lab_config.py
RUN echo "c.MappingKernelManager.kernel_cmd_timeout = 3600" >> /home/jovyan/.jupyter/jupyter_notebook_config.py

CMD ["jupyter", "lab", "--ip=0.0.0.0", "--port=8888", "--no-browser", "--allow-root"]
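
To see why the symbolic link matters, the lookup can be reproduced in a scratch directory. This is only an illustration: the directory layout mimics the conda paths named above, the helper resolves is hypothetical, and nothing under /opt/conda is touched.

```shell
#!/bin/sh
# Scratch tree that mimics conda's layout: include/libxml2/libxml/globals.h
demo=$(mktemp -d)
mkdir -p "$demo/include/libxml2/libxml"
: > "$demo/include/libxml2/libxml/globals.h"

# Hypothetical helper: would `#include <libxml/globals.h>` resolve if the
# compiler were given `-I "$1"`?
resolves() {
    [ -f "$1/libxml/globals.h" ]
}

if resolves "$demo/include"; then before=yes; else before=no; fi

# Same shape as the `ln -s` line in the Dockerfile, applied to the scratch tree.
ln -s "$demo/include/libxml2/libxml" "$demo/include/"

if resolves "$demo/include"; then after=yes; else after=no; fi

echo "before link: $before, after link: $after"
rm -rf "$demo"
```

Before the link exists, the header is reachable only through the libxml2 subdirectory; afterwards it also resolves from the top-level include directory, which appears to be the one the igraph build was searching.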
