I have created an EFS file system and mounted it to an EC2 instance so that I can install a few packages (numpy, scipy, sklearn, etc.). I was able to sudo pip3 install these into the mounted /efs/ directory in EC2 with no issues. I have mounted the same EFS to my lambda function at /mnt/efs/. However, when I try to import sklearn, I get the following error:
Response
{
"errorMessage": "Unable to import module 'lambda_function': No module named 'sklearn.__check_build._check_build'\n___________________________________________________________________________\nContents of /mnt/efs/sklearn/_check_build:\nsetup.py __pycache__ check_build.cpython-37m-x86_64-linux-gnu.so\n__init_.py\n___________________________________________________________________________\nIt seems that scikit-learn has not been built correctly.\n\nIf you have installed scikit-learn from source, please do not forget\nto build the package before using it: run `python setup.py install` or\n`make` in the source directory.\n\nIf you have used an installer, please check that it is suited for your\nPython version, your operating system and your platform.",
"errorType": "Runtime.ImportModuleError",
"stackTrace": []
}
I am not really sure what the issue is because all my other packages import just fine. In fact, I am actually able to import sklearn just fine within the EC2 instance. For reference, this is my lambda function code:
import boto3
import io
import sys
import json
import warnings
sys.path.append("/mnt/efs/")
import numpy
import scipy
import sklearn  # this line is the one that is failing
def lambda_handler(event, context):
    # lambda code here
    pass
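Because the import sits at module level, the failure surfaces as Runtime.ImportModuleError before the handler ever runs. A minimal sketch of one way to get more detail is to attempt the import inside a helper and report the interpreter version alongside the error. The helper name `try_import` and the shape of the returned dict are hypothetical, chosen only for illustration.

```python
import importlib
import sys


def try_import(module_name):
    """Try to import module_name and return a small diagnostic report.

    The report includes the running interpreter's version, which makes it
    easy to compare the Lambda runtime against the EC2 instance.
    """
    report = {
        "module": module_name,
        "python": "%d.%d.%d" % sys.version_info[:3],
        "ok": True,
        "error": None,
    }
    try:
        importlib.import_module(module_name)
    except ImportError as exc:
        # ModuleNotFoundError is a subclass of ImportError, so both land here.
        report["ok"] = False
        report["error"] = str(exc)
    return report
```

Calling `try_import("sklearn")` from inside `lambda_handler` and returning the result would show the error text together with the Python version the function is actually running on.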
In my /efs/ mounted directory in EC2, I have the following directories made after running sudo pip3 install --upgrade --target=. scikit-learn:
bin joblib numpy.libs scikit_learn.libs sklearn
Cython joblib-1.3.2.dist-info pycache scipy test
Cython-3.0.8.dist-info numpy pyximport scipy-1.7.3.dist-info threadpoolctl-3.1.0.dist-info
cython.py numpy-1.21.6.dist-info scikit_learn-1.0.2.dist-info scipy.libs threadpoolctl.py
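For comparison, pip can be asked to fetch wheels for an interpreter other than the one it is running under. The sketch below only builds the command line. It assumes the Lambda runtime is CPython 3.8 on x86_64 and that a matching wheel exists for the requested package. `build_pip_command` is a hypothetical helper, and the `manylinux2014_x86_64` platform tag is an assumption.

```python
import sys


def build_pip_command(target_dir, package, python_version="3.8",
                      platform="manylinux2014_x86_64"):
    """Build a pip invocation that downloads wheels for another interpreter.

    pip accepts --python-version / --platform only together with --target
    and either --only-binary=:all: or --no-deps, so those are always set.
    """
    return [
        sys.executable, "-m", "pip", "install",
        "--upgrade",
        "--target", target_dir,
        "--python-version", python_version,
        "--implementation", "cp",   # CPython wheels only
        "--platform", platform,
        "--only-binary=:all:",      # never fall back to building from source
        package,
    ]
```

The resulting list could be passed to `subprocess.run(..., check=True)` on the EC2 instance, for example with `build_pip_command("/efs", "scikit-learn")`.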
I initially tried to follow the advice in the error message and ran python setup.py install within /efs/sklearn/, but all I got was the following error:
$ python setup.py install
Traceback (most recent call last):
File "setup.py", line 4, in <module>
from sklearn._build_utils import cythonize_extensions
ImportError: No module named sklearn._build_utils
Then I tried deleting everything and reinstalling with pip3 in the hope that it would fix itself (it did not). I then added the --upgrade flag so it would reinstall anything it needed to, and that still did not work.
I then double-checked the VPC and security groups, and everything seems to be correct. I have an EFS-securitygroup that only allows SSH access from the EC2-securitygroup, of which both the EC2 instance and the lambda function are members, so they can access the EFS.
I was wondering if maybe my python version could be an issue so I checked that. My EC2 version is:
[ec2-user efs]$ python --version
Python 2.7.18
[ec2-user efs]$ python3 --version
Python 3.7.16
And my lambda function is running on Python 3.8. I figured that 3.7.16 and 3.8 should be close enough for there to be no issues.
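One way to test that assumption is to compare the ABI tag in the compiled filenames (the error above lists a file tagged cpython-37m) with the suffixes the running interpreter will import. CPython only loads extension modules whose filename suffix appears in `importlib.machinery.EXTENSION_SUFFIXES`, and Python 3.8 does not list a cpython-37m suffix. A minimal sketch follows. `find_mismatched_extensions` is a hypothetical helper, and it only looks at files carrying a `.cpython-` tag.

```python
import importlib.machinery
import os


def find_mismatched_extensions(root, suffixes=None):
    """List version-tagged extension modules under root that cannot be imported.

    suffixes defaults to the suffixes accepted by the interpreter running
    this function; pass an explicit tuple to check against another runtime.
    """
    if suffixes is None:
        suffixes = tuple(importlib.machinery.EXTENSION_SUFFIXES)
    mismatched = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            tag_start = name.find(".cpython-")
            if tag_start == -1 or not name.endswith(".so"):
                continue  # not a version-tagged CPython extension module
            if name[tag_start:] not in suffixes:
                mismatched.append(os.path.join(dirpath, name))
    return sorted(mismatched)
```

Running `find_mismatched_extensions("/mnt/efs")` inside the lambda function would list any compiled files built for a different Python version than the one the function runs on.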
I really have no idea what the issue is because all my other imports work just fine; it is only sklearn that is failing when I try to import it from the EFS and into my lambda function.