Error enabling lineage in Spark using Spline?


I tried using Spline to track lineage in Spark with both of the approaches described here, but both failed with the same error:

ERROR QueryExecutionEventHandlerFactory: Spline Initialization Failed! Spark lineage tracking is disabled Spark Agent was not able to establish connection with spline gateway

Caused by: java.net.ConnectException: Connection refused

I can see the UIs on ports 8080 and 9090, and ArangoDB is up and running.

But no lineage is displayed.

I have tried both pyspark and spark-shell, with no luck. Any help is appreciated.

There are 2 best solutions below

Best answer

I resolved the issue by manually starting the REST server, ArangoDB, and the web client, and then passing the correct producer URL when launching spark-shell:

--conf "spark.spline.producer.url=http://localhost:8080/producer"
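Since the original error was "Connection refused", it may help to confirm that the producer endpoint is reachable from the machine running Spark before starting the shell. Below is a minimal sketch in plain Scala; `canReach` is a hypothetical helper written for this illustration, not part of Spline, and it assumes a plain `http` URL. It only tests that a TCP connection can be opened, not that the `/producer` path is served correctly.

```scala
import java.net.{InetSocketAddress, Socket, URI}
import scala.util.Try

// Hypothetical helper: returns true if a TCP connection to the URL's
// host and port can be opened within timeoutMs milliseconds.
def canReach(url: String, timeoutMs: Int = 2000): Boolean = {
  val uri = new URI(url)
  // Fall back to port 80 when the URL has no explicit port (assumes http).
  val port = if (uri.getPort != -1) uri.getPort else 80
  Try {
    val socket = new Socket()
    try socket.connect(new InetSocketAddress(uri.getHost, port), timeoutMs)
    finally socket.close()
  }.isSuccess
}

// Producer URL taken from the --conf flag above.
println(canReach("http://localhost:8080/producer"))
```

If this prints `false`, the agent would most likely hit the same connection error, so the host or port in `spark.spline.producer.url` is worth checking first.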

Even then, no lineage appeared in the web UI, despite running various actions and transformations.

Later I realized that lineage is only generated when a DataFrame is saved, so as soon as a write was triggered I could see the lineage graph.
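To illustrate the point about writes, here is a small sketch that can be pasted into a spark-shell session started with the Spline agent and producer URL configured. The app name, column name, and output path are placeholders chosen for this example, not values from the original setup.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// In spark-shell this returns the session that already exists.
val spark = SparkSession.builder()
  .appName("spline-write-demo")
  .master("local[*]")
  .getOrCreate()

val evens = spark.range(0, 100)
  .withColumn("doubled", col("id") * 2)
  .filter(col("id") % 2 === 0)

// An action without a write: per the answer above, no lineage showed up at this point.
val rowCount = evens.count()

// Writing the DataFrame is the step after which the lineage graph appeared.
// The output path is a hypothetical placeholder.
evens.write.mode("overwrite").parquet("/tmp/spline_demo_output")
```

After the write completes, refreshing the Spline web UI should be the moment to look for a new execution event, assuming the agent initialized without errors.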

Answer

Make sure that both ArangoDB and the Spline server are up and running. You can then paste the code below into your notebook and run it to check the lineage in the Spline UI.

%scala

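import za.co.absa.spline.harvester.SparkLineageInitializer._

// Set every property before enabling tracking; the configuration is read when tracking is enabled.
System.setProperty("spline.mode", "REQUIRED")
System.setProperty("spline.lineageDispatcher", "http")
// Replace vm-ip with the host running the Spline REST gateway.
System.setProperty("spline.lineageDispatcher.http.producer.url", "http://vm-ip:8080/producer")

// These persistence settings come from older, MongoDB-based Spline releases;
// an agent using the HTTP dispatcher most likely ignores them.
System.setProperty("spline.persistence.factory", "za.co.absa.spline.persistence.mongo.MongoPersistenceFactory")
System.setProperty("spline.mongodb.url", "arangodb://A5wqwd-xyezY@vm-ip/spline")
System.setProperty("spline.mongodb.name", "spline")

spark.enableLineageTracking()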