Spark tasks not starting to execute


I am running a job in the Spark shell with the following configuration:

```
--num-executors 15
--driver-memory 15G
--executor-memory 7G
--executor-cores 8
--conf spark.yarn.executor.memoryOverhead=2G
--conf spark.sql.shuffle.partitions=500
--conf spark.sql.autoBroadcastJoinThreshold=-1
--conf spark.executor.memoryOverhead=800
```

The job is stuck and never starts executing tasks. The code does a cross join with filter conditions between a large dataset of 270M rows and a small table of 100,000 rows. I have increased the partitions of the large table (270M rows) to 16,000, and I have converted the small table to a broadcast variable.
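For context, a minimal sketch of what the job does (the table names, column names, and filter condition below are placeholders, not the actual code):

```scala
import org.apache.spark.sql.functions.broadcast

// Hypothetical table and column names, for illustration only.
val large = spark.table("large_table").repartition(16000) // ~270M rows
val small = spark.table("small_table")                    // ~100,000 rows

// Cross join with the small table broadcast to every executor,
// then filter down to the matching rows.
val result = large.crossJoin(broadcast(small))
  .filter(large("value") >= small("lower") && large("value") < small("upper"))
```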

I have attached the Spark UI screenshots for the job.

Do I have to reduce the partitions or increase the executors? Any ideas?

Thanks for helping out.

![spark ui 1][1] ![spark ui 2][2] ![spark ui 3][3]

After 10 hours:

Status: tasks: 7341/16936 (16624 failed)

Checking the container error logs shows:

```
Failed while trying to construct the redirect url to the log server. Log Server url may not be configured
java.lang.Exception: Unknown container. Container either has not started or has already completed or doesn't belong to this node at all.
```
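Since the NodeManager UI cannot redirect to the log server, the aggregated container logs can still be pulled from the command line (the application id below is a placeholder):

```
yarn logs -applicationId application_1234567890123_0001
```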

![50% completed UI 1][4] ![50% completed UI 2][5]

  [1]: https://i.stack.imgur.com/nqcys.png
  [2]: https://i.stack.imgur.com/S2vwL.png
  [3]: https://i.stack.imgur.com/81FUn.png
  [4]: https://i.stack.imgur.com/h5MTa.png
  [5]: https://i.stack.imgur.com/yDfKF.png


There is 1 answer below.


It would be helpful if you could share your cluster configuration.

But since you mentioned that broadcasting the small table works with 1,000 rows but not with 100,000, you probably need to adjust your memory configuration.

Per your config, I assume you have a total of 15 × 7 GB = 105 GB of executor memory.

You can try with `--num-executors 7 --executor-memory 15G`.

This will give each executor more memory to hold the broadcast variable. Please adjust `--executor-cores` accordingly for proper utilization; a sketch of the full revised command follows.
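For example, a revised launch command along these lines, keeping the total executor memory at roughly 105 GB (the `--executor-cores 4` value and the carried-over `--conf` settings are illustrative assumptions, not tested values):

```
spark-shell \
  --num-executors 7 \
  --driver-memory 15G \
  --executor-memory 15G \
  --executor-cores 4 \
  --conf spark.executor.memoryOverhead=2G \
  --conf spark.sql.shuffle.partitions=500 \
  --conf spark.sql.autoBroadcastJoinThreshold=-1
```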