Is it possible to submit a Spark application with spark-submit without direct access to the DFS (data-transfer) port or the HTTPS port on the datanodes? spark-submit seems to want to upload files to HDFS in cluster mode, and it also tries to connect to the datanodes when --deploy-mode is set to client.
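
For context, a minimal sketch of the kind of submission I mean (the class name and jar are placeholders, not my real values):

    # Assumes HADOOP_CONF_DIR points at the cluster's client configs.
    export HADOOP_CONF_DIR=/etc/hadoop/conf

    # Cluster mode: the client stages the app jar and __spark_conf__.zip
    # into /user/<user>/.sparkStaging on HDFS before the driver starts.
    spark-submit --master yarn --deploy-mode cluster \
      --class com.example.Main app.jar

    # Client mode: the driver runs locally, but submission still writes
    # staging files to HDFS, which means connecting to the datanodes directly.
    spark-submit --master yarn --deploy-mode client \
      --class com.example.Main app.jar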

Is this the same for sparklyr (a wrapper around spark-submit)? And what about Jupyter notebooks? Can they be used without datanode access either?

I tried spark-submit with HADOOP_CONF_DIR set. Since port 9866 is blocked, we still get these errors:

    org.apache.hadoop.net.ConnectTimeoutException: 60000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=/10.117.110.19:9866]

    org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /user/users/.sparkStaging/application_1698216436656_0104/__spark_conf__.zip could only be written to 0 of the 1 minReplication nodes. There are 2 datanode(s) running and 2 node(s) are excluded in this operation.
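
For what it's worth, a quick connectivity check from the submitting host looks like this (assuming netcat is available; 10.117.110.19 is one of the datanodes from the trace above, and the namenode host/port are placeholders):

    # Datanode data-transfer port: blocked, the connection times out.
    nc -vz -w 5 10.117.110.19 9866

    # Namenode RPC port: reachable, so metadata operations (ls, mkdir) work.
    nc -vz -w 5 <namenode-host> 8020

This matches the errors above: the namenode RPC calls succeed (the RemoteException comes back from the namenode), but the block write to the datanodes never connects.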

How do I avoid uploading __spark_conf__.zip altogether, or avoid connecting to the datanode DFS port?
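
One direction I've tried is pre-staging the Spark jars on HDFS via spark.yarn.archive so the client has less to upload, but as far as I can tell the conf archive is still created and uploaded at submit time. A sketch (the archive path is a placeholder):

    # spark.yarn.archive points at an archive of Spark jars already on HDFS,
    # so the client does not upload the jars themselves.
    # spark.yarn.stagingDir only relocates where __spark_conf__.zip lands;
    # it does not seem to prevent the upload.
    spark-submit --master yarn --deploy-mode client \
      --conf spark.yarn.archive=hdfs:///apps/spark/spark-libs.zip \
      --conf spark.yarn.stagingDir=hdfs:///user/users \
      app.jar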
