AWS Lambda - Python - ModuleNotFoundError: No module named 'pandas'


I'm running into errors, no matter how many different configurations I try.

Activating Conda env - wfl
Conda env, wfl, activated
Setting up paths
/app/python
/opt/conda/lib/python3.9
/opt/conda/lib/python39.zip
/opt/conda/lib/python3.9/lib-dynload
/opt/conda/lib/python3.9/site-packages
/app/python/domain
/app/python/persistence
/app/python/processors
/app/python/sharedpython
/app/python/tools
/app/python/tests
Traceback (most recent call last):
File "/app/python/lambdas/test.py", line 3, in <module>
import pandas as pd
ModuleNotFoundError: No module named 'pandas'
Activating Conda env - wfl

I have an AWS Lambda function, written in Python, that points to a Docker image in ECR. The app uses conda, which adds some complexity (the full requirements.txt includes fbprophet, which needs conda). To activate the conda env, the entrypoint has to be a wrapper shell script, docker-entrypoint.sh. The files below are a simplified version, but they should contain enough information to diagnose the problem. When I run this locally or execute the Docker container directly, the script runs without issues; the errors only appear when it executes inside AWS Lambda. Is there something about the path or the Python path that I'm missing? Any help would be greatly appreciated!

Note: I'm a novice with Dockerfiles, Docker commands, and optimizing layers and images, as you can probably tell, so any advice on slimming down the image would be helpful as well. (I'm also fairly new to Python and conda environment management.)
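One way to narrow down the path question above is to print the interpreter details that decide where imports are searched, once locally and once inside Lambda, and compare the two. The sys.path in the log lists only /opt/conda and /app/python entries, while the Dockerfile installs requirements with pip install --user, which writes to the per-user site directory of the installing user (under /root/.local here). A minimal diagnostic sketch follows; the function name describe_environment is hypothetical, and the idea that Lambda runs the container with a different user or HOME is an assumption that this output would help confirm or rule out.

```python
import os
import site
import sys


def describe_environment():
    """Collect the interpreter details that decide where imports are searched."""
    user_site = site.getusersitepackages()
    return {
        "executable": sys.executable,
        "home": os.environ.get("HOME"),
        # os.getuid is not available on every platform, so guard the call.
        "uid": os.getuid() if hasattr(os, "getuid") else None,
        "user_site_enabled": site.ENABLE_USER_SITE,
        "user_site": user_site,
        "user_site_on_path": user_site in sys.path,
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If user_site differs between the two runs, or user_site_on_path is False inside Lambda, packages installed with --user would not be importable there even though they import fine locally.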

python/lambdas/test.py

import json
import time
import pandas as pd

def lambda_handler(event, context):
    print("event: " + json.dumps(event, indent=2))
    # The Lambda context object is not JSON serializable, so fall back to str().
    print("context: " + json.dumps(context, indent=2, default=str))
    x = pd.DataFrame()
    print(x)

print('test.py')
x = pd.DataFrame()
print(x)
time.sleep(200)  # used to allow time to search file contents of container

Dockerfile

####
# Base Image
####
FROM ubuntu:20.04 AS core
# Upgrade installed packages
RUN apt update -y  &&  apt upgrade -y && \
    apt install -y --no-install-recommends python3.8 unixodbc gnupg2 curl ca-certificates python3-distutils unixodbc-dev && \
    update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.8 1

# Add SQL ODBC Driver 17 for Ubuntu 20.04 -- https://learn.microsoft.com/en-us/sql/connect/odbc/linux-mac/installing-the-microsoft-odbc-driver-for-sql-server?view=sql-server-ver15#ubuntu17
RUN curl https://packages.microsoft.com/keys/microsoft.asc | apt-key add - && \
    curl https://packages.microsoft.com/config/ubuntu/20.04/prod.list > /etc/apt/sources.list.d/mssql-release.list && \
    apt update && \
    ACCEPT_EULA=Y apt install -y --no-install-recommends --allow-unauthenticated msodbcsql17 && \
    ACCEPT_EULA=Y apt install -y --no-install-recommends --allow-unauthenticated mssql-tools && \
    echo 'export PATH="$PATH:/opt/mssql-tools/bin"' >> ~/.bash_profile && \
    echo 'export PATH="$PATH:/opt/mssql-tools/bin"' >> ~/.bashrc

RUN apt clean && \
    rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/* && \
    apt remove -y --purge curl gnupg2

# Set entrypoint
ENTRYPOINT [ "/bin/bash" ]


####
# Developer / Compiler Image
####
FROM core AS dev
# Upgrade installed packages
RUN apt update -y  &&  apt upgrade -y && \
    apt install -y --no-install-recommends curl python3.8-dev git python3-pip unixodbc-dev build-essential

# Install miniconda
RUN apt install wget
ENV CONDA_DIR /opt/conda
RUN wget --quiet https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda.sh && \
    /bin/bash ~/miniconda.sh -b -p /opt/conda
# Put conda in path so we can use conda activate
ENV PATH=$CONDA_DIR/bin:$PATH

RUN conda init bash && \
    conda create --name wfl && \
    activate wfl

# Install Python dependencies
COPY requirements.txt .

# Install pip and dependencies
RUN python3 -m pip install -U pip setuptools wheel && \
    python3 -m pip install --user -r requirements.txt && \
    rm -f requirements.txt

RUN apt clean && \
    rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/* && \
    apt remove -y --purge git curl python3.8-dev python3-pip unixodbc-dev build-essential

# Set entrypoint
ENTRYPOINT [ "/bin/bash" ]


####
# Runtime Image
####

# This is the third and final image; it copies the compiled
# binary over but starts from the core ubuntu:20.04 image.
FROM core AS runtime

COPY --from=dev /root/.local /root/.local
# Make sure scripts in .local are usable:
ENV PATH=/root/.local/bin:$PATH
ENV CONDA_DIR /opt/conda
ENV PYTHON_APP_DIR /app/python
# copy conda to be available for env activation
COPY --from=dev $CONDA_DIR $CONDA_DIR
ENV PATH=$CONDA_DIR"/bin:$PATH"
RUN pip install pandas

# cleanup any non application files
RUN rm -rf ./python/tests
# Load application files
COPY ./python $PYTHON_APP_DIR
COPY ./docker-entrypoint.sh $PYTHON_APP_DIR

# Set work directory
WORKDIR $PYTHON_APP_DIR
ENV PYTHONPATH $PYTHON_APP_DIR

# Set entrypoint
ENTRYPOINT ["/app/python/docker-entrypoint.sh"]
CMD ["lambdas/test.py"]

setup_paths.py

import os
import sys

PROJECTS_ROOT = os.path.dirname(os.path.abspath(__file__))

def add_package_to_sys_path(base, package_relative_path):
    package_path = os.path.join(base, package_relative_path)
    if package_path not in sys.path:
        sys.path.append(package_path)


add_package_to_sys_path(PROJECTS_ROOT, 'lambdas')
add_package_to_sys_path(PROJECTS_ROOT, 'domain')
add_package_to_sys_path(PROJECTS_ROOT, 'persistence')
add_package_to_sys_path(PROJECTS_ROOT, 'processors')
add_package_to_sys_path(PROJECTS_ROOT, 'sharedpython')
add_package_to_sys_path(PROJECTS_ROOT, 'tools')
add_package_to_sys_path(PROJECTS_ROOT, 'tests')

for p in sys.path:
    print(p)
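Note that docker-entrypoint.sh runs python setup_paths.py and then python "$@" as two separate processes. Changes to sys.path live only in the process that made them, so the directories appended by setup_paths.py are not carried over to the interpreter that later runs the script. The sketch below illustrates this with two child interpreters; the MARKER path is purely illustrative and does not need to exist.

```python
import subprocess
import sys

MARKER = "/tmp/hypothetical_package_dir"  # illustrative path only

# First process: append MARKER to sys.path, then exit
# (comparable to `python setup_paths.py`).
first = subprocess.run(
    [sys.executable, "-c",
     f"import sys; sys.path.append({MARKER!r}); print({MARKER!r} in sys.path)"],
    capture_output=True, text=True, check=True,
)

# Second process: a fresh interpreter (comparable to `python "$@"`)
# builds its own sys.path from scratch.
second = subprocess.run(
    [sys.executable, "-c", f"import sys; print({MARKER!r} in sys.path)"],
    capture_output=True, text=True, check=True,
)

print("visible in first process: ", first.stdout.strip())
print("visible in second process:", second.stdout.strip())
```

On a typical setup the first line reports True and the second reports False. If the extra directories are needed at run time, one option is to import setup_paths at the top of the script being run, so that the changes happen in the same process; the Dockerfile's PYTHONPATH setting is another place to list them.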

docker-entrypoint.sh

#!/bin/bash
set -e

echo 'Activating Conda env - wfl'
activate wfl
echo 'Conda env, wfl, activated'

python setup_paths.py
python "$@"

requirements.txt

pandas==1.2.4

Project structure: [image not included]

There is 1 best solution below


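There are two possibilities:
A. Your virtual environment may not actually be activated, so make sure it is active in the process that runs your script.
B. pandas may not be installed for the interpreter that runs your script, in which case install it with pip install pandas.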
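Both possibilities can be checked from inside the failing process itself. The sketch below is a small helper (module_status is a hypothetical name) that reports whether a top-level module is importable by the current interpreter and, if so, where it was found; running it in the Lambda environment would show which Python is in use and whether it can see pandas.

```python
import importlib.util
import sys


def module_status(name):
    """Report whether the top-level module `name` is importable here."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return f"{name}: not found by {sys.executable}"
    return f"{name}: found at {spec.origin}"


if __name__ == "__main__":
    print(module_status("pandas"))
```

A "not found" result that names an unexpected interpreter points at possibility A; a "not found" result from the expected interpreter points at possibility B.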