Methods for piggybacking on Spark's Py4J Gateway


I am working on a Python application that relies on Py4J to communicate with a Java backend. For the foreseeable future, this application will be executed exclusively in the context of a Spark job running in cluster mode. We've added our requisite JARs to the Spark classpath ($SPARK_HOME/jars), and would prefer to connect to the existing Py4J gateway that PySpark uses, rather than spin up a second JVM, for resource reasons. Connecting to this gateway is non-trivial because its secret and port are not constant.
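One approach we've considered (a sketch, not an official API) is to reuse the JVM view that an already-running SparkContext holds. `SparkContext._active_spark_context` and `sc._jvm` are private attributes, so this is an assumption that may break across Spark versions:

```python
def existing_jvm_view():
    """Return the Py4J JVM view of the driver JVM that spark-submit
    already started, or None when PySpark is unavailable or no
    SparkContext is active.

    Relies on SparkContext._active_spark_context and sc._jvm, which
    are private PySpark attributes (version-dependent assumption).
    """
    try:
        from pyspark import SparkContext
    except ImportError:
        return None
    sc = SparkContext._active_spark_context  # private attribute
    return None if sc is None else sc._jvm
```

With a live context, `existing_jvm_view().com.example.Backend` (hypothetical class name) would resolve through the same gateway PySpark uses, avoiding a second JVM entirely.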

Is there a recommended approach for this sort of need?

Re-executing the gateway-launch code (https://github.com/apache/spark/blob/839f0c98bd85a14eadad13f8aaac876275ded5a4/python/pyspark/java_gateway.py#L55) spawns a second JVM and, more importantly, causes a port conflict as it attempts to stand up a second gateway on the same port.
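For reference, when spark-submit launches the Python process itself, Spark exports the existing gateway's port and secret to that process as the environment variables `PYSPARK_GATEWAY_PORT` and `PYSPARK_GATEWAY_SECRET`, which is how PySpark itself avoids launching a second JVM in that mode. A minimal sketch of reading them (the variable names come from Spark's source; treat the exact mechanism as version-dependent):

```python
import os

def existing_gateway_params():
    """Return (port, secret) for the Py4J gateway that spark-submit
    started for this Python process, or None when the variables are
    absent (i.e. Python was not launched by spark-submit)."""
    port = os.environ.get("PYSPARK_GATEWAY_PORT")
    secret = os.environ.get("PYSPARK_GATEWAY_SECRET")
    if port is None or secret is None:
        return None
    return int(port), secret
```

If these parameters are present, they could in principle be handed to Py4J's `JavaGateway` with a matching auth token instead of re-running the launch code, though we haven't confirmed this is a supported pattern.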
