How can I import external Python libraries in an AWS Glue Python shell job?

I have been trying to import an external Python library in an AWS Glue Python shell job.

  1. I uploaded the whl file for pyodbc to S3.
  2. I referenced the S3 path in "Python library path" under the additional properties of the Glue job.
  3. I also tried setting the job parameter --extra-py-files to the S3 path of the whl file.
  4. Whenever I write "from pyodbc import pyodbc as db" or just "import pyodbc", the job fails with "ModuleNotFoundError: No module named 'pyodbc'".
  5. The logs are shown below:

Processing ./glue-python-libs-cq4p0rs8/pyodbc-4.0.32-cp310-cp310-win_amd64.whl
Installing collected packages: pyodbc
Successfully installed pyodbc-4.0.32

WARNING: The directory '/.cache/pip' or its parent directory is not owned or is not writable by the current user. The cache has been disabled. Check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.

File "/tmp/glue-python-scripts-g_mt5xzp/Glue-ETL-Dev.py", line 2, in <module>
ModuleNotFoundError: No module named 'pyodbc'

I am downloading the wheel files from here: https://pypi.org/project/pyodbc/#files

No matter which version of the whl file I reference in the Glue job, it always throws the same error.

Can anyone explain where this is going wrong?
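One detail in the log above may be worth checking: the wheel filename ends in win_amd64. Wheel filenames encode the Python version, ABI, and platform the file was built for, and a Windows build would not be expected to import on a Linux host, which is what Glue Python shell jobs are assumed to run on here. The following is a minimal sketch that splits a wheel filename into its tags. The helper names are hypothetical, and the Linux check is a rough heuristic, not pip's real compatibility logic.

```python
def parse_wheel_tags(filename):
    """Split a wheel filename into name, version, and compatibility tags.

    Wheel filenames follow the pattern
    {name}-{version}[-{build}]-{python tag}-{abi tag}-{platform tag}.whl
    """
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):  # 6 parts when an optional build tag is present
        raise ValueError(f"unexpected wheel filename layout: {filename!r}")
    python_tag, abi_tag, platform_tag = parts[-3:]
    return {
        "name": parts[0],
        "version": parts[1],
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }


def looks_installable_on_linux(filename):
    """Heuristic: accept pure-Python ('any') wheels and Linux/manylinux wheels."""
    platform_tag = parse_wheel_tags(filename)["platform"]
    return platform_tag == "any" or "linux" in platform_tag


if __name__ == "__main__":
    # The filename is taken from the log in the question.
    wheel = "pyodbc-4.0.32-cp310-cp310-win_amd64.whl"
    print(parse_wheel_tags(wheel))
    print(looks_installable_on_linux(wheel))
```

If the platform tag is not a Linux one, or the cpXY tag does not match the Python version of the job, trying a wheel whose tags match the job's runtime would be a reasonable next step.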

There is 1 best solution below

I tried following guides [1] and [2] in the official AWS documentation, but I ran into issues importing some libraries, such as psycopg2. In the end, I managed to import the libraries I needed by following the steps in this AWS blog tutorial [3]. The blog is in Spanish, but a translation tool should be enough to follow it.

Basically, they create a setup.py script that declares the required libraries. They then build a .whl file from it and upload that file to an S3 bucket, from which the Glue Python shell job picks up the required libraries.
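To make the idea concrete, here is a minimal sketch of a helper that writes out such a setup.py. This is not the blog's exact script: the package name glue_deps, the version, and the psycopg2-binary requirement are placeholders chosen for illustration, and the helper functions are hypothetical.

```python
from pathlib import Path

# Template for a dependency-only setup.py: it ships no code of its own and
# only declares the libraries the Glue job needs via install_requires.
SETUP_TEMPLATE = '''\
from setuptools import setup

setup(
    name={name!r},
    version={version!r},
    packages=[],
    install_requires={requirements!r},
)
'''


def render_setup_py(name, version, requirements):
    """Return the text of a setup.py that declares the given requirements."""
    return SETUP_TEMPLATE.format(
        name=name, version=version, requirements=list(requirements)
    )


def write_setup_py(target_dir, name, version, requirements):
    """Write setup.py into target_dir and return its path."""
    path = Path(target_dir) / "setup.py"
    path.write_text(render_setup_py(name, version, requirements), encoding="utf-8")
    return path


if __name__ == "__main__":
    # Placeholder values; replace them with the libraries your job needs.
    print(render_setup_py("glue_deps", "0.1", ["psycopg2-binary"]))
```

Running something like `python setup.py bdist_wheel` next to the generated file (with setuptools and wheel installed) should produce a .whl under dist/, which is the file to upload to S3 and reference from the job.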

[1] https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-python-libraries.html#aws-glue-programming-python-libraries-job

[2] https://docs.aws.amazon.com/glue/latest/dg/add-job-python.html#create-python-extra-library

[3] https://aws.amazon.com/es/blogs/aws-spanish/usando-python-shell-y-pandas-en-aws-glue-para-procesar-conjuntos-de-datos-pequenos-y-medianos/