I have a Spark job. When I run it on a YARN cluster (HDP 3.1), it does nothing for a long time (about 1 hour), and during that time I get this message in the trace log. After that, the job creates executors and runs successfully. How can I fix this waiting time? Is it a bug or a misconfiguration on my cluster?
This is the message I get:
TRACE TransportClient: Sending RPC to datanode01.local/192.168.x.x/192.168.66.43:45564
TRACE TransportClient: Sending request 5477612075831031235 to datanode01.local/192.168.x.x:45564 took 1 ms
TRACE MessageDecoder: Received message RpcResponse: RpcResponse{requestId=5477612075831031235, body=NettyManagedBuffer{buf=PooledUnsafeDirectByteBuf(ridx: 21, widx: 102, cap: 128)}}
I tested it several times and found that when I run the job once a day, the behavior alternates: one day the job runs successfully without waiting, and the next day it waits a long time. This happens every day.