Using Spark to add Edges gremlin


I can't save my edge when I'm using Spark as follows. For reference, the edge can be saved via the Gremlin console:

val graph = DseGraphFrameBuilder.dseGraph("GRAPH_NAME", spark)
graph.V().has("vertex1","field1","value").as("a").V().has("vertex2","field1","value").addE("myEdgeLabel").to("a")

When I run graph.edges.show(), I get an empty table.


The addE() step is not yet implemented in DseGraphFrames; use the DGF-specific updateEdges() function instead. It is designed for bulk updates and takes a Spark DataFrame of new edges in DGF format:

scala> newEdges.printSchema
root
 |-- src: string (nullable = false)
 |-- dst: string (nullable = false)
 |-- ~label: string (nullable = true)

The src and dst columns are encoded vertex IDs. You can either construct them with the g.idColumn() helper function or select them from existing vertices. Usually you already know the IDs, so you use the helper function:

scala> val df = Seq((1, 2, "myEdgeLabel")).toDF("v1_id", "v2_id", "~label")
scala> val newEdges = df.select(g.idColumn("vertex2", $"v2_id") as "src", g.idColumn("vertex1", $"v1_id") as "dst", $"~label")
scala> g.updateEdges(newEdges)
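Outside a DSE session, the column mapping above can be sketched in plain Scala (no Spark needed). Note that `encodeId` here is only a stand-in for g.idColumn(); the real DSE ID encoding is opaque, so its output format is an assumption for illustration:

```scala
// Plain-Scala sketch of building DGF-style edge rows from raw ids.
// encodeId is a hypothetical stand-in for g.idColumn(label, id); the
// actual DSE encoding differs, this only illustrates the column mapping.
object BuildEdges {
  def encodeId(vertexLabel: String, id: Int): String = s"$vertexLabel:$id"

  def main(args: Array[String]): Unit = {
    val raw = Seq((1, 2, "myEdgeLabel")) // (v1_id, v2_id, ~label)
    // Note the swap: v2 becomes src and v1 becomes dst, as in the
    // newEdges select() above.
    val newEdges = raw.map { case (v1, v2, label) =>
      (encodeId("vertex2", v2), encodeId("vertex1", v1), label)
    }
    newEdges.foreach(println)
  }
}
```

The point is only the row shape: each tuple mirrors one row of the (src, dst, ~label) DataFrame that updateEdges() consumes.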

For your particular case, you can query the IDs first and then insert based on them. Never do this in production: this approach is slow and not bulk. On huge graphs, use the first method above:

val dst = g.V.has("vertex1","field1","value").id.first.getString(0)
val src = g.V.has("vertex2","field1","value").id.first.getString(0)
val newEdges = Seq((src, dst, "myEdgeLabel")).toDF("src", "dst", "~label")
g.updateEdges(newEdges)

See the documentation: https://docs.datastax.com/en/dse/5.1/dse-dev/datastax_enterprise/graph/graphAnalytics/dseGraphFrameImport.html