AWS Glue --extra-py-files throwing ModuleNotFoundError: No module named 'pg8000'

I am trying to use pg8000 in my Glue script. The Glue job has the following parameter:

--extra-py-files    s3://mybucket/pg8000libs.zip  //NOTE: my zip contains __init__.py

The relevant part of the script:

import sys
import os
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
import boto3
from pyspark.sql import Row
from datetime import datetime, date

zip_path = os.path.join('/tmp', 'pg8000libs.zip')
sys.path.insert(0, zip_path)


def dump_python_path():
    print("python path:", sys.path)

    for path in sys.path:
        if os.path.isdir(path):
            print(f"dir: {path}")
            print("\t" + str(os.listdir(path)))
        print(path)

print(os.listdir('/tmp'))
dump_python_path()
# Import the library
import pg8000

Output in CloudWatch:

python path: ['/tmp/pg8000libs.zip', '/opt/amazon/bin', '/tmp/pg8000libs.zip', '/opt/amazon/spark/jars/spark-core_2.12-3.1.1-amzn-0.jar', '/opt/amazon/spark/python/lib/pyspark.zip', '/opt/amazon/spark/python/lib/py4j-0.10.9-src.zip', '/opt/amazon/lib/python3.6/site-packages', '/usr/lib64/python37.zip', '/usr/lib64/python3.7', '/usr/lib64/python3.7/lib-dynload', '/home/spark/.local/lib/python3.7/site-packages', '/usr/lib64/python3.7/site-packages', '/usr/lib/python3.7/site-packages']

There is 1 solution below

After exhausting the standard approaches, I found a workaround using sys.path. I added the script's own directory to the Python import search path, after which the Glue job could locate and import the additional .py file. Here's the code I used:

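import sys
import os

# Directory that contains the running script
current_dir = os.path.dirname(os.path.abspath(__file__))
# Make modules in that directory importable
sys.path.append(current_dir)

# utils.py sits next to the script
from utils import *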
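
For the zip case in the question, one thing worth checking (an assumption, since the archive contents are not shown) is the archive layout. Python imports a package from a zip on sys.path only if the package directory sits at the top level of the archive, for example pg8000/__init__.py, not a bare __init__.py at the archive root. The sketch below lists the top-level packages in an archive; top_level_packages is a hypothetical helper name, not part of Glue or pg8000.

```python
import zipfile


def top_level_packages(zip_path):
    """Return the sorted names of top-level packages inside a zip archive.

    A top-level package is a directory directly under the archive root
    that contains an __init__.py. Single-file modules at the root
    (e.g. foo.py) are not reported by this helper.
    """
    with zipfile.ZipFile(zip_path) as zf:
        names = zf.namelist()

    packages = set()
    for name in names:
        parts = name.split("/")
        # Match exactly "<package>/__init__.py"
        if len(parts) == 2 and parts[1] == "__init__.py":
            packages.add(parts[0])
    return sorted(packages)
```

With the path from the question, top_level_packages('/tmp/pg8000libs.zip') would need to include 'pg8000' for import pg8000 to resolve from that archive. If it returns an empty list, the zip was probably built from inside the package directory and not from its parent.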

Important Note:

Modify the import search path with care, since it can introduce module name conflicts or unintended imports. Organizing the files properly, so that no path manipulation is needed, is the more robust and maintainable solution.
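
One way to limit the risk of name conflicts is to touch sys.path only when the module cannot already be found, and to append so that existing entries keep precedence. This is a minimal sketch of that idea; add_import_path is a hypothetical helper name, and whether appending is the right precedence for a given job is an assumption.

```python
import importlib
import importlib.util
import sys


def add_import_path(path, module_name):
    """Append path to sys.path only if module_name is not importable yet.

    Returns True if sys.path was modified, False otherwise.
    """
    # If the module already resolves, leave the search path alone
    if importlib.util.find_spec(module_name) is not None:
        return False
    if path in sys.path:
        return False
    # Append (not insert at 0) so existing entries win on name clashes
    sys.path.append(path)
    # Drop cached finder state so the new entry is picked up
    importlib.invalidate_caches()
    return True
```

In the answer's code this would replace the sys.path.append call, for example add_import_path(current_dir, "utils") before the import.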