Unable to view the pyspark job in Spline using ec2 instance


We created a sample pyspark job and gave the spark-submit commands as following in ec2 instance

sudo ./bin/spark-submit --packages za.co.absa.spline.agent.spark:spark-3.1-spline-agent-bundle_2.12:0.6.1 --conf "spark.sql.queryExecutionListeners=za.co.absa.spline.harvester.listener.SplineQueryExecutionListener" --conf "spark.spline.lineageDispatcher.http.producer.url=http://[ec2]:8080/producer" --conf "spark.spline.lineageDispatcher=logging" /home/ec2-user/spline-sandbox/mysparkjob.py

We are able to view the output in the console, but unable to view it in the Spline UI. What additional steps need to be done?

Also, how can we run the PySpark job through Docker on an EC2 instance?

There is 1 best solution below

You set spark.spline.lineageDispatcher=logging, which means that instead of being sent to the Spline server, the lineage is just written to the log. If you leave that setting out, or set spark.spline.lineageDispatcher=http (which is the default), the lineage data should be sent to the URL in spark.spline.lineageDispatcher.http.producer.url.
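With the logging dispatcher removed, the corrected spark-submit might look like the sketch below. The [ec2] placeholder comes from the question and would need to be replaced with the instance's real hostname or IP:

```shell
# Corrected invocation: no lineageDispatcher=logging, so the default
# http dispatcher sends lineage to the producer URL.
sudo ./bin/spark-submit \
  --packages za.co.absa.spline.agent.spark:spark-3.1-spline-agent-bundle_2.12:0.6.1 \
  --conf "spark.sql.queryExecutionListeners=za.co.absa.spline.harvester.listener.SplineQueryExecutionListener" \
  --conf "spark.spline.lineageDispatcher.http.producer.url=http://[ec2]:8080/producer" \
  /home/ec2-user/spline-sandbox/mysparkjob.py
```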

I would also recommend using the latest version of Spline, currently 0.7.12.

You can find documentation for dispatchers and other Spline Agent features here: https://github.com/AbsaOSS/spline-spark-agent/tree/develop. If you are using an older version, switch the branch/tag to see the corresponding version of the documentation.