Issue
I am facing the below error after having set up the AWS Glue Library:
PS C:\Users\[user]\Documents\[company]\projects\code\data-lake\etl\tealium> python visitor.py
20/04/05 19:33:14 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Traceback (most recent call last):
File "visitor.py", line 9, in <module>
glueContext = GlueContext(sc.getOrCreate())
File "C:\Users\[user]\Documents\[company]\projects\code\aws-glue-libs-glue-1.0\PyGlue.zip\awsglue\context.py", line 45, in __init__
File "C:\Users\[user]\Documents\[company]\projects\code\aws-glue-libs-glue-1.0\PyGlue.zip\awsglue\context.py", line 66, in _get_glue_scala_context
TypeError: 'JavaPackage' object is not callable
Scenario
I am trying to install AWS GLue ETL Library in a virtual environment using PIPENV. So I've got the below .env file with the environment variables:
HADOOP_HOME="C:\Users\[user]\AppData\Local\Spark\winutils"
SPARK_HOME="C:\Users\[user]\AppData\Local\Spark\spark-2.4.3-bin-hadoop2.8\spark-2.4.3-bin-hadoop2.8\spark-2.4.3-bin-spark-2.4.3-bin-hadoop2.8"
JAVA_HOME="C:\Program Files\Java\jdk1.8.0_231"
PATH="${HADOOP_HOME}\bin"
PATH="${SPARK_HOME}\bin:${PATH}"
PATH="${JAVA_HOME}\bin:${PATH}"
SPARK_CONF_DIR="C:\Users\[user]\Documents\[company]\projects\code\aws-glue-libs-glue-1.0\conf"
PYTHONPATH="${SPARK_HOME}/python/:${PYTHONPATH}"
PYTHONPATH="${SPARK_HOME}/python/lib/py4j-0.10.7-src.zip:${PYTHONPATH}"
PYTHONPATH="C:/Users/[user]/Documents/[company]/projects/code/aws-glue-libs-glue-1.0/PyGlue.zip:${PYTHONPATH}"
My code initially is quite simple and I am only creating the Glue Context as bellow:
from awsglue.context import GlueContext
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from pyspark.conf import SparkConf
sc = SparkContext()
glueContext = GlueContext(sc.getOrCreate())
print(glueContext)
print(sc)
Do you guys know what it may be this issue?
Try this instead:
Also, if you create new glue job it'll give you boilerplate code which solve your problem...