I'm trying to query an Iceberg table in a Tabular catalog from a local Spark session running inside a Docker container. I've configured my Dockerfile and Spark environment, but the setup fails when I run it. Below are the details of my configuration and the commands I've executed.
The table I'm trying to access is tabular_iceberg.foundation_raw.mytablename_table.
My Dockerfile is as follows:
FROM amazon/aws-glue-libs:glue_libs_3.0.0_image_01
# Setting user ID and group ID
ARG USER_ID
ENV USER_ID ${USER_ID:-1001}
ARG GROUP_ID
ENV GROUP_ID ${GROUP_ID:-1001}
# Switching to root user to perform setup
USER root
RUN groupmod -g $GROUP_ID root
RUN usermod -u $USER_ID glue_user
RUN chown -R glue_user:root /tmp/spark-events
RUN chown -R glue_user:root /root
# Switching back to glue_user
USER glue_user
RUN chmod 755 /tmp/spark-events
RUN chmod 755 /root
# Creating directories and copying requirements
RUN mkdir -p /home/glue_user/code
ARG USER_HOME=/home/glue_user/code
COPY ./requirements.txt ${USER_HOME}/requirements.txt
COPY ./requirements-dev.txt ${USER_HOME}/requirements-dev.txt
# Installing Python dependencies
WORKDIR ${USER_HOME}
RUN pip3 install --upgrade pip \
&& pip3 install -r requirements-dev.txt \
&& pip3 install -r requirements.txt
# Setting environment variables
ENV PATH "/home/glue_user/.local/bin:${PATH}"
# Entrypoint
ENTRYPOINT [""]
I've executed the following commands to build and run the Docker container:
>docker build -t test:latest .
>docker run --rm -it \
-v $(pwd):/usr/local/user \
-v $HOME/.aws/credentials:/home/glue_user/.aws/credentials:rw \
-e SPARK_PUBLIC_DNS="localhost" \
-e AWS_REGION=eu-west-1 \
-p 4040:4040 \
-w /usr/local/user \
-e DISABLE_SSL="true" \
test:latest \
/bin/bash -c \
'export PATH=/home/spark-3.4.0-bin-spark-3.4.0-bin-hadoop2.8/bin:$PATH; \
export PYTHONPATH=/usr/local/user; \
bash'
Additionally, my Spark configuration includes the following settings:
"spark.sql.catalog.tabular_iceberg": "org.apache.iceberg.spark.SparkCatalog",
"spark.sql.catalog.tabular_iceberg.catalog-impl": "org.apache.iceberg.rest.RESTCatalog",
"spark.sql.catalog.tabular_iceberg.uri": "https://iam-gw.eu-west-1.tabular.io/ws",
"spark.sql.catalog.tabular_iceberg.rest.sigv4-enabled": "true",
"spark.sql.catalog.tabular_iceberg.warehouse": "prod"
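For reference, here is a minimal sketch, assuming the settings above are applied as ordinary Spark properties, that turns them into repeated `--conf key=value` arguments for `pyspark` or `spark-submit`. It also checks that the first part of the table identifier names a configured catalog, because Spark resolves `tabular_iceberg.foundation_raw.mytablename_table` as catalog `tabular_iceberg`, namespace `foundation_raw`, table `mytablename_table`. The names `TABULAR_CATALOG_CONF`, `to_conf_args`, `configured_catalogs` and `split_identifier` are hypothetical helpers, not part of Spark or Iceberg.

```python
import shlex
from typing import Dict, List, Tuple

# Catalog settings copied from the question (the dict name is hypothetical).
TABULAR_CATALOG_CONF: Dict[str, str] = {
    "spark.sql.catalog.tabular_iceberg": "org.apache.iceberg.spark.SparkCatalog",
    "spark.sql.catalog.tabular_iceberg.catalog-impl": "org.apache.iceberg.rest.RESTCatalog",
    "spark.sql.catalog.tabular_iceberg.uri": "https://iam-gw.eu-west-1.tabular.io/ws",
    "spark.sql.catalog.tabular_iceberg.rest.sigv4-enabled": "true",
    "spark.sql.catalog.tabular_iceberg.warehouse": "prod",
}

CATALOG_PREFIX = "spark.sql.catalog."


def to_conf_args(conf: Dict[str, str]) -> List[str]:
    """Render each setting as a `--conf key=value` argument pair."""
    args: List[str] = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


def configured_catalogs(conf: Dict[str, str]) -> List[str]:
    """A catalog is registered by a key of the form spark.sql.catalog.<name> (no further dots)."""
    names: List[str] = []
    for key in conf:
        if key.startswith(CATALOG_PREFIX):
            rest = key[len(CATALOG_PREFIX):]
            if "." not in rest:
                names.append(rest)
    return names


def split_identifier(identifier: str) -> Tuple[str, str, str]:
    """Split catalog.namespace.table; a multi-level namespace is kept dot-joined."""
    parts = identifier.split(".")
    if len(parts) < 3:
        raise ValueError(f"expected catalog.namespace.table, got {identifier!r}")
    return parts[0], ".".join(parts[1:-1]), parts[-1]


if __name__ == "__main__":
    catalog, namespace, table = split_identifier(
        "tabular_iceberg.foundation_raw.mytablename_table"
    )
    # The catalog part of the identifier must match a registered catalog name.
    assert catalog in configured_catalogs(TABULAR_CATALOG_CONF)
    # Print a shell-safe pyspark command line carrying the same settings.
    print(shlex.join(["pyspark", *to_conf_args(TABULAR_CATALOG_CONF)]))
```

Inside a PySpark script the same dict could instead be applied by calling `SparkSession.builder.config(key, value)` for each entry before `getOrCreate()`.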
This is the error I get:
Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.connector.catalog.ViewCatalog
at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
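A `ClassNotFoundException` means the class loader could not find the requested class among the JARs on its classpath. Because a JAR is a ZIP archive, one way to check which JARs (if any) provide `org.apache.spark.sql.connector.catalog.ViewCatalog` is to look for the entry `org/apache/spark/sql/connector/catalog/ViewCatalog.class` inside each one. The sketch below is a hypothetical helper; the directory it scans by default (`$SPARK_HOME/jars`) is an assumption about where Spark and the Iceberg runtime JAR live in this image, so pass the real directory as an argument if it differs.

```python
import os
import sys
import zipfile
from typing import List


def class_entry(class_name: str) -> str:
    """Convert a fully qualified class name to its path inside a JAR."""
    return class_name.replace(".", "/") + ".class"


def jars_containing(jar_dir: str, class_name: str) -> List[str]:
    """Return the names of the .jar files in jar_dir that contain class_name."""
    entry = class_entry(class_name)
    matches: List[str] = []
    for file_name in sorted(os.listdir(jar_dir)):
        if not file_name.endswith(".jar"):
            continue
        try:
            with zipfile.ZipFile(os.path.join(jar_dir, file_name)) as jar:
                if entry in jar.namelist():
                    matches.append(file_name)
        except zipfile.BadZipFile:
            # Skip files that are not valid ZIP/JAR archives.
            continue
    return matches


if __name__ == "__main__":
    # Hypothetical default: $SPARK_HOME/jars, unless a directory is given on the command line.
    default_dir = os.path.join(os.environ.get("SPARK_HOME", "."), "jars")
    jar_dir = sys.argv[1] if len(sys.argv) > 1 else default_dir
    if not os.path.isdir(jar_dir):
        print(f"{jar_dir} is not a directory")
    else:
        found = jars_containing(jar_dir, "org.apache.spark.sql.connector.catalog.ViewCatalog")
        print(found if found else f"ViewCatalog not found in any JAR under {jar_dir}")
```

An empty result means none of the scanned JARs provides the class, so whichever JAR references it (for example the Iceberg Spark runtime) is being loaded without a matching provider on the classpath.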
I also tried the docker-compose.yaml provided by Tabular (found here), but it appears to get stuck at the following point:
spark-iceberg | [I 2024-03-14 10:08:41.076 ServerApp] jupyter_lsp | extension was successfully loaded.
spark-iceberg | [I 2024-03-14 10:08:41.079 ServerApp] jupyter_server_terminals | extension was successfully loaded.
spark-iceberg | [I 2024-03-14 10:08:41.110 LabApp] JupyterLab extension loaded from /usr/local/lib/python3.9/site-packages/jupyterlab
spark-iceberg | [I 2024-03-14 10:08:41.110 LabApp] JupyterLab application directory is /usr/local/share/jupyter/lab
spark-iceberg | [I 2024-03-14 10:08:41.113 LabApp] Extension Manager is 'pypi'.
spark-iceberg | [I 2024-03-14 10:08:41.173 ServerApp] jupyterlab | extension was successfully loaded.
spark-iceberg | [I 2024-03-14 10:08:41.178 ServerApp] notebook | extension was successfully loaded.
spark-iceberg | [I 2024-03-14 10:08:41.182 ServerApp] Serving notebooks from local directory: /home/iceberg/notebooks
spark-iceberg | [I 2024-03-14 10:08:41.182 ServerApp] Jupyter Server 2.13.0 is running at:
spark-iceberg | [I 2024-03-14 10:08:41.182 ServerApp] http://localhost:8888/tree
spark-iceberg | [I 2024-03-14 10:08:41.182 ServerApp] http://127.0.0.1:8888/tree
spark-iceberg | [I 2024-03-14 10:08:41.182 ServerApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
spark-iceberg | [I 2024-03-14 10:08:41.321 ServerApp] Skipped non-installed server(s): bash-language-server, dockerfile-language-server-nodejs, javascript-typescript-langserver, jedi-language-server, julia-language-server, pyright, python-language-server, python-lsp-server, r-languageserver, sql-language-server, texlab, typescript-language-server, unified-language-server, vscode-css-languageserver-bin, vscode-html-languageserver-bin, vscode-json-languageserver-bin, yaml-language-server