#15 ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
#15 google-cloud-aiplatform 1.16.1 requires google-cloud-bigquery<3.0.0dev,>=1.15.0, but you have google-cloud-bigquery 3.10.0 which is incompatible.
#15 google-ads 18.0.0 requires protobuf!=3.18.*,!=3.19.*,<=3.20.0,>=3.12.0, but you have protobuf 3.20.3 which is incompatible.
We are receiving these errors in the logs of docker-compose build when building our Apache Airflow image. According to an LLM:
- The first conflict is between google-cloud-aiplatform and google-cloud-bigquery. The google-cloud-aiplatform library requires a version of google-cloud-bigquery that is less than 3.0.0dev and greater than or equal to 1.15.0, but you have google-cloud-bigquery version 3.10.0 installed which is incompatible.
- The second conflict is between google-ads and protobuf. The google-ads library requires a version of protobuf that is less than or equal to 3.20.0 and greater than or equal to 3.12.0, excluding versions 3.18.* and 3.19.*, but you have protobuf version 3.20.3 installed which is incompatible.
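Both readings can be checked mechanically. The sketch below assumes the third-party packaging library is importable; if it is not, it falls back to the copy bundled inside pip, which is not a public API. The specifier strings are copied from the error messages above, and the helper name satisfies is only illustrative.

```python
try:
    from packaging.specifiers import SpecifierSet
except ImportError:
    # Assumption: pip is installed. Its vendored copy is not a public API,
    # so treat this fallback as a convenience only.
    from pip._vendor.packaging.specifiers import SpecifierSet


def satisfies(version, specifier):
    """Return True if `version` falls inside the PEP 440 `specifier` set."""
    return version in SpecifierSet(specifier)


# Constraints copied from the pip error output.
AIPLATFORM_NEEDS_BIGQUERY = "<3.0.0dev,>=1.15.0"
GOOGLE_ADS_NEEDS_PROTOBUF = "!=3.18.*,!=3.19.*,<=3.20.0,>=3.12.0"

print(satisfies("3.10.0", AIPLATFORM_NEEDS_BIGQUERY))  # False: 3.10.0 is above the <3.0.0dev cap
print(satisfies("3.20.3", GOOGLE_ADS_NEEDS_PROTOBUF))  # False: 3.20.3 is above the <=3.20.0 cap
print(satisfies("3.20.0", GOOGLE_ADS_NEEDS_PROTOBUF))  # True: the protobuf version seen earlier in the logs
```

The last line matches the build log, where protobuf 3.20.0 was already installed before the conflict was reported with 3.20.3.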
It's worth noting that dbt-bigquery==1.5.0 is a new release from only a few weeks ago.
Here is our Dockerfile:
FROM --platform=linux/amd64 apache/airflow:2.5.3
# install mongodb-org-tools
USER root
RUN apt-get update && apt-get install -y gnupg software-properties-common && \
curl -fsSL https://www.mongodb.org/static/pgp/server-4.2.asc | apt-key add - && \
add-apt-repository 'deb https://repo.mongodb.org/apt/debian buster/mongodb-org/4.2 main' && \
apt-get update && apt-get install -y mongodb-org-tools
USER airflow
ADD requirements.txt /requirements.txt
RUN pip install -r /requirements.txt
and our requirements.txt:
gcsfs==0.6.1 # Google Cloud Storage file system interface
ndjson==0.3.1 # Newline delimited JSON parsing and serialization
pymongo==3.12.1 # MongoDB driver for Python
dbt-bigquery==1.5.0 # dbt adapter for Google BigQuery
numpy==1.21.1 # Numerical computing in Python
pandas==1.3.1 # Data manipulation and analysis library
billiard # Multiprocessing replacement, to avoid "daemonic processes are not allowed to have children" error using Pool
How can we resolve these dependency conflicts? How can we even tell which library dependencies belong to which libraries in our requirements.txt? My assumption is that google-cloud-aiplatform and google-cloud-bigquery are both dependencies of dbt-bigquery; however, if they were dependencies of the same library, I wouldn't expect a dependency conflict.
Edit: some useful logs from the build:
Requirement already satisfied: protobuf>=3.18.3 in /home/airflow/.local/lib/python3.7/site-packages (from dbt-core~=1.5.0->dbt-bigquery==1.5.0->-r /requirements.txt (line 5)) (3.20.0)
Collecting google-cloud-bigquery~=3.0
Downloading google_cloud_bigquery-3.10.0-py2.py3-none-any.whl (218 kB)
Requirement already satisfied: proto-plus<2.0.0dev,>=1.15.0 in /home/airflow/.local/lib/python3.7/site-packages (from google-cloud-bigquery~=3.0->dbt-bigquery==1.5.0->-r /requirements.txt (line 5)) (1.19.6)
Requirement already satisfied: grpcio<2.0dev,>=1.47.0 in /home/airflow/.local/lib/python3.7/site-packages (from google-cloud-bigquery~=3.0->dbt-bigquery==1.5.0->-r /requirements.txt (line 5)) (1.53.0)
Requirement already satisfied: google-resumable-media<3.0dev,>=0.6.0 in /home/airflow/.local/lib/python3.7/site-packages (from google-cloud-bigquery~=3.0->dbt-bigquery==1.5.0->-r /requirements.txt (line 5)) (2.4.1)
Requirement already satisfied: google-cloud-core<3.0.0dev,>=1.6.0 in /home/airflow/.local/lib/python3.7/site-packages (from google-cloud-bigquery~=3.0->dbt-bigquery==1.5.0->-r /requirements.txt (line 5)) (2.3.2)
Requirement already satisfied: google-api-core[grpc]!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.0,<3.0.0dev,>=1.31.5 in /home/airflow/.local/lib/python3.7/site-packages (from google-cloud-bigquery~=3.0->dbt-bigquery==1.5.0->-r /requirements.txt (line 5)) (2.8.2)
Requirement already satisfied: googleapis-common-protos<2.0dev,>=1.56.2 in /home/airflow/.local/lib/python3.7/site-packages (from google-api-core[grpc]!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.0,<3.0.0dev,>=1.31.5->google-cloud-bigquery~=3.0->dbt-bigquery==1.5.0->-r /requirements.txt (line 5)) (1.56.4)
Requirement already satisfied: grpcio-status<2.0dev,>=1.33.2 in /home/airflow/.local/lib/python3.7/site-packages (from google-api-core[grpc]!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.0,<3.0.0dev,>=1.31.5->google-cloud-bigquery~=3.0->dbt-bigquery==1.5.0->-r /requirements.txt (line 5)) (1.48.2)
Requirement already satisfied: google-crc32c<2.0dev,>=1.0 in /home/airflow/.local/lib/python3.7/site-packages (from google-resumable-media<3.0dev,>=0.6.0->google-cloud-bigquery~=3.0->dbt-bigquery==1.5.0->-r /requirements.txt (line 5))
google-cloud-aiplatform and google-ads do not appear a single time in the build logs other than in the error message.
The problem arises from conflicts between the Python packages the OS requests to install and the dependency graph of your project's packages.
The short answer is to use the same strategy as you often would with any Python project: venv
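As a rough illustration of the isolation a venv gives you, here is a minimal sketch using Python's standard venv module. The function name create_isolated_env and any directory you pass to it are hypothetical; in a Dockerfile the equivalent step would normally be a python -m venv call in a RUN instruction.

```python
import os
import venv
from pathlib import Path


def create_isolated_env(env_dir, with_pip=True):
    """Create a virtual environment that ignores system site-packages.

    Returns the path to the environment's own Python interpreter.
    """
    builder = venv.EnvBuilder(
        system_site_packages=False,  # do not inherit OS-level or image-level packages
        clear=True,                  # wipe the target directory if it already exists
        with_pip=with_pip,           # bootstrap pip inside the environment
    )
    builder.create(env_dir)
    if os.name == "nt":
        return Path(env_dir) / "Scripts" / "python.exe"
    return Path(env_dir) / "bin" / "python"
```

Installing with the returned interpreter (for example, running it with -m pip install -r requirements.txt) keeps the application's packages inside the environment rather than mixing them with packages installed elsewhere on the system.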
Solution
Below is a complete working Dockerfile:
Note the setup and use of venv here. Just like outside a container, this will partition your application dependencies from the system-installed ones inside the container.
Notes
- In this sample I have used the root user, as the permissions issue was getting annoying. In your production file you'll want to use COPY chown... and put things in place with appropriate USER permissions.
- /usr/local/app/ is just my paradigm. You can put the files anywhere.
- Because we are rewriting the $PATH instead of using activate for the venv, you have to tell pip --no-user.
- At first I tried --no-install-recommends in apt-get install to see if the affected dependency would be excluded. However, I left it in there as it's good practice and minimizes your image size.
Detail
When running apt-get install you can see a number of packages are installed:
I didn't track down the exact problem package, but you can see several python3-* packages requested to be installed. One of these conflicts with the dependency graph of your application.
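If you do want to track down which installed package pulls in a given requirement, one option is to read the metadata of everything installed in the environment. This is only a sketch: find_dependents and normalize are hypothetical helper names, it relies on importlib.metadata (Python 3.8+; on the image's Python 3.7 it assumes the importlib_metadata backport is installed), and it extracts requirement names with a simple regular expression rather than a full PEP 508 parser. The owner may also turn out to be a package already present in the base image, as the "Requirement already satisfied ... in /home/airflow/.local" log lines hint.

```python
import re

try:
    from importlib import metadata  # Python 3.8+
except ImportError:
    import importlib_metadata as metadata  # backport, assumed installed on Python 3.7


def normalize(name):
    """Normalize a project name (case and -, _, . are interchangeable)."""
    return re.sub(r"[-_.]+", "-", name).lower()


def find_dependents(target, dists=None):
    """Return sorted (distribution, requirement) pairs for packages requiring `target`."""
    target = normalize(target)
    if dists is None:
        dists = metadata.distributions()  # everything visible to this interpreter
    hits = []
    for dist in dists:
        owner = dist.metadata["Name"] or "<unknown>"
        for req in dist.requires or []:
            # The requirement string starts with the project name,
            # followed by extras, version specifiers, or markers.
            match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", req)
            if match and normalize(match.group(0)) == target:
                hits.append((owner, req))
    return sorted(hits)
```

Calling find_dependents("protobuf") or find_dependents("google-cloud-bigquery") with the container's interpreter lists every installed distribution that declares that requirement, together with its version constraint, which shows where google-ads and google-cloud-aiplatform enter the picture.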