I have broadcasted a variable in spark(scala) but because of the size of data, it gives output as this
WARN TaskSetManager: Lost task 2.0 in stage 0.0 (TID 2, 10.240.0.33): java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.lang.StringCoding$StringDecoder.decode(StringCoding.java:149)
When run on smaller database, it works fine. I want to know the size of this broadcasted variable (in mb/gb). Is there a way to find this?
Assuming you are trying to broadcast obj, you can find its size as follows:
Note that this is an estimator which means it is not 100% correct