Memory profiling in DirectRunner spark mode

101 Views Asked by At

I am doing memory profiling using YourKit and to simplify the matters for a Spark application, I am running the app in DirectRunner mode. The machine I am testing on has 32 cores. The captured snapshot looks like:

enter image description here

The "direct-runner-worker" has 32 threads and it seems like I was under the false assumption that direct runner occupies just one thread. My question is - shouldn't there be a limit on the number of parallelization threads? In the snapshot a thread occupies between 250 and 350 MB and this will inevitably blow up.

Another question is I am not sure if I should follow http://spark.apache.org/developer-tools.html#profiling for my case, the documentation seems to be for an application running with a SparkCluster but since I am using DirectRunner (for debugging purpose) then maybe whatever I am doing is good enough - does anyone have experience with this?

Any pointers are appreciated! :)

PS: my mind is boggled by the creation of 215 million objects but that should go down with the thread count. However, ~6 million objects per thread seem like a lot.

0

There are 0 best solutions below