Is it possible to submit a Spark application with spark-submit without direct access to the DFS (data-transfer) port or the HTTPS port on the datanodes? spark-submit seems to want to upload files to HDFS in cluster mode, and it also tries to connect to the datanodes when --deploy-mode is set to client.
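
For context, a minimal sketch of the kind of submission I mean (the class name and jar are placeholders, not my real values):

    # Assumes HADOOP_CONF_DIR points at the cluster's client configs.
    export HADOOP_CONF_DIR=/etc/hadoop/conf

    # Cluster mode: the client stages the app jar and __spark_conf__.zip
    # into /user/<user>/.sparkStaging on HDFS before the driver starts.
    spark-submit --master yarn --deploy-mode cluster \
      --class com.example.Main app.jar

    # Client mode: the driver runs locally, but submission still writes
    # staging files to HDFS, which means connecting to the datanodes directly.
    spark-submit --master yarn --deploy-mode client \
      --class com.example.Main app.jar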

Is this the same for sparklyr (a wrapper around spark-submit)? And what about Jupyter notebooks? Can they be used without datanode access either?

I tried spark-submit with HADOOP_CONF_DIR set. Since port 9866 is blocked, we still get these errors:

    org.apache.hadoop.net.ConnectTimeoutException: 60000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=/10.117.110.19:9866]

    org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /user/users/.sparkStaging/application_1698216436656_0104/__spark_conf__.zip could only be written to 0 of the 1 minReplication nodes. There are 2 datanode(s) running and 2 node(s) are excluded in this operation.
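
For what it's worth, a quick connectivity check from the submitting host looks like this (assuming netcat is available; 10.117.110.19 is one of the datanodes from the trace above, and the namenode host/port are placeholders):

    # Datanode data-transfer port: blocked, the connection times out.
    nc -vz -w 5 10.117.110.19 9866

    # Namenode RPC port: reachable, so metadata operations (ls, mkdir) work.
    nc -vz -w 5 <namenode-host> 8020

This matches the errors above: the namenode RPC calls succeed (the RemoteException comes back from the namenode), but the block write to the datanodes never connects.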

How do I avoid uploading __spark_conf__.zip altogether, or avoid connecting to the datanode DFS port?
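
One direction I've tried is pre-staging the Spark jars on HDFS via spark.yarn.archive so the client has less to upload, but as far as I can tell the conf archive is still created and uploaded at submit time. A sketch (the archive path is a placeholder):

    # spark.yarn.archive points at an archive of Spark jars already on HDFS,
    # so the client does not upload the jars themselves.
    # spark.yarn.stagingDir only relocates where __spark_conf__.zip lands;
    # it does not seem to prevent the upload.
    spark-submit --master yarn --deploy-mode client \
      --conf spark.yarn.archive=hdfs:///apps/spark/spark-libs.zip \
      --conf spark.yarn.stagingDir=hdfs:///user/users \
      app.jar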
