Memory leak in GraphX even if checkpoint is called on the graph


I am facing OutOfMemory (OOM) errors within a Spark Streaming application that uses GraphX.

While trying to isolate and reproduce the issue in a simple application, I identified what appear to be two kinds of memory leak.
The details of those leaks and the steps to reproduce them can be found here:
https://issues.apache.org/jira/browse/SPARK-19023 (I am not sure that creating a JIRA issue was the best way to raise this and ask for input.)

Here is my current understanding of each of the two leaks:

  • For the 1st leak, I think the same kind of fix that was applied to ZippedPartitionsRDDx in https://github.com/apache/spark/pull/3545 should also be applied to MapPartitionsRDD (i.e. setting the reference to f to null in the clearDependencies method).
  • For the 2nd leak, the issue is that even after the graph has been checkpointed, a reference to a partition array is still held in the "partitions_" variable of the EdgeRDD.
    • This happens because the checkpoint is delegated to the "partitionsRDD" embedded within the EdgeRDD, so the partitions_ variable is correctly reset to null on the partitionsRDD during the checkpoint, but not on the EdgeRDD itself.
    • In a comment on the JIRA issue I created, I describe how I was able to work around this (calling getPartitions instead of partitions when defining the partitioner within EdgeRDDImpl), but I don't think this is a valid fix because it doesn't seem robust.
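
To make the two patterns concrete, here is a toy model in plain Scala. It is only a sketch of the mechanism as I understand it, not Spark's actual code: ToyRDD, ToySource, ToyMappedRDD, ToyWrapperRDD, markCheckpointed and hasCachedPartitions are hypothetical names made up for this illustration, and only partitions, getPartitions, partitions_, f and clearDependencies mirror names mentioned above.

```scala
// Toy model only: hypothetical stand-ins, not Spark's real RDD classes.
object LeakSketch {
  final case class Partition(index: Int)

  abstract class ToyRDD {
    // Cached partition array, playing the role of "partitions_" described above.
    private var partitions_ : Array[Partition] = null

    protected def getPartitions: Array[Partition]

    // Public accessor: computes the partitions once and caches them in this object.
    final def partitions: Array[Partition] = {
      if (partitions_ == null) partitions_ = getPartitions
      partitions_
    }

    def hasCachedPartitions: Boolean = partitions_ != null

    // Stand-in for what happens once a checkpoint is written:
    // lineage references are dropped and the cached array is reset.
    def markCheckpointed(): Unit = {
      clearDependencies()
      partitions_ = null
    }

    protected def clearDependencies(): Unit = ()
  }

  class ToySource(n: Int) extends ToyRDD {
    protected def getPartitions: Array[Partition] = Array.tabulate(n)(Partition(_))
  }

  // Leak 1 pattern: an RDD that keeps a closure alive unless it is cleared explicitly.
  class ToyMappedRDD(var prev: ToyRDD, var f: Int => Int) extends ToyRDD {
    protected def getPartitions: Array[Partition] = prev.partitions

    override protected def clearDependencies(): Unit = {
      super.clearDependencies()
      prev = null
      f = null // proposed addition: release the closure and whatever it captured
    }
  }

  // Leak 2 pattern: a wrapper that delegates the checkpoint to an inner RDD.
  class ToyWrapperRDD(val inner: ToyRDD) extends ToyRDD {
    protected def getPartitions: Array[Partition] = inner.partitions

    // Leaky variant: goes through `partitions`, which fills the wrapper's own cache.
    def numPartitionsViaCache: Int = partitions.length

    // Workaround variant: uses getPartitions, so the wrapper's cache stays empty.
    def numPartitionsDirect: Int = getPartitions.length

    // Only the inner RDD is reset; the wrapper's cached array is never cleared.
    def checkpoint(): Unit = inner.markCheckpointed()
  }

  def main(args: Array[String]): Unit = {
    val leaky = new ToyWrapperRDD(new ToySource(4))
    leaky.numPartitionsViaCache
    leaky.checkpoint()
    println(leaky.hasCachedPartitions)       // true: the wrapper still holds the array
    println(leaky.inner.hasCachedPartitions) // false: the inner RDD was reset

    val direct = new ToyWrapperRDD(new ToySource(4))
    direct.numPartitionsDirect
    direct.checkpoint()
    println(direct.hasCachedPartitions)      // false: nothing was cached in the wrapper

    val mapped = new ToyMappedRDD(new ToySource(2), _ + 1)
    mapped.markCheckpointed()
    println(mapped.f == null)                // true: the closure reference is gone
  }
}
```

In this toy version the workaround only helps as long as nothing else ever calls partitions on the wrapper, which is why it does not feel robust to me.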

Can anyone confirm my analysis of the two leaks described above and give an opinion on the proposed fixes?
