RDD Warning: Not enough space to cache rdd in memory


I am trying to run the PageRank algorithm on a GraphFrame using PySpark. However, when I execute it, the program keeps running endlessly and I get the following warnings:

(Screenshot of repeated "Not enough space to cache rdd in memory" warnings)

The code is as follows:

from graphframes import GraphFrame

vertices = sc.createDataFrame(lst_sent, ['id', 'Sentence'])
edges = sc.createDataFrame(final_rdd, ['src', 'dst', 'similarity'])
g = GraphFrame(vertices, edges)
g.vertices.show()
g.edges.show()
g.degrees.show()
pr = g.pageRank(tol=0.000001)
pr.vertices.show()

There is 1 best solution below


In case anybody else is facing the same issue, I found a solution. Using RDD persistence solves the problem:

from pyspark import StorageLevel

rdd.persist(StorageLevel.MEMORY_AND_DISK)
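Applied to the pipeline in the question, a minimal sketch might look like the following. It assumes spark is an active SparkSession and that lst_sent and final_rdd are the same inputs used in the question; the idea is to persist the vertex and edge DataFrames with MEMORY_AND_DISK before building the GraphFrame, so partitions that do not fit in executor memory spill to disk instead of being dropped and recomputed during the PageRank iterations.

from pyspark import StorageLevel
from graphframes import GraphFrame

# Build the vertex and edge DataFrames as in the question
vertices = spark.createDataFrame(lst_sent, ['id', 'Sentence'])
edges = spark.createDataFrame(final_rdd, ['src', 'dst', 'similarity'])

# Spill to disk when there is not enough memory to cache,
# rather than evicting and recomputing the partitions
vertices.persist(StorageLevel.MEMORY_AND_DISK)
edges.persist(StorageLevel.MEMORY_AND_DISK)

g = GraphFrame(vertices, edges)
pr = g.pageRank(tol=0.000001)
pr.vertices.show()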