Code:
import pandas as pd
from pyspark.sql import SparkSession
from pysparkling import *
import h2o
from pysparkling.ml import H2OAutoML
spark = SparkSession.builder.appName('SparkApplication').getOrCreate()
hc = H2OContext.getOrCreate()
Spark-submit Command:
spark-submit --master spark://local:7077 --py-files sparkling-water-3.36.1.3-1-3.2/py/h2o_pysparkling_3.2-3.36.1.3-1-3.2.zip --conf "spark.ext.h2o.backend.cluster.mode=external" --conf spark.ext.h2o.external.start.mode="auto" --conf spark.ext.h2o.external.h2o.driver="/home/whiz/spark/h2odriver-3.36.1.3.jar" --conf spark.ext.h2o.external.cluster.size=2 spark_h20/h2o_script.py
Error Logs:
py4j.protocol.Py4JJavaError: An error occurred while calling o58.getOrCreate. : java.io.IOException: Cannot run program "hadoop": error=2, No such file or directory
Answer:
The automatic start of the Sparkling Water external backend is only supported in Hadoop or Kubernetes environments. With spark.ext.h2o.external.start.mode="auto", Sparkling Water tries to launch the external H2O cluster through the hadoop command, which is why the driver fails with 'Cannot run program "hadoop"' on a standalone Spark cluster. In a standalone deployment, you need to deploy the external backend manually, following the tutorial in the Sparkling Water documentation.
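A minimal sketch of what the manual setup could look like. The spark.ext.h2o.* configuration keys below (external.start.mode=manual, cloud.name, cloud.representative) are the documented options for the manual external backend; the extended H2O jar path, cluster name, and node IP are placeholders you would need to adapt to your own installation:

```shell
# 1) Start the external H2O cluster by hand on each H2O node, using the
#    extended H2O jar shipped with your Sparkling Water distribution
#    (jar path below is an assumption -- check your install):
java -jar sparkling-water-3.36.1.3-1-3.2/jars/h2o-extended.jar \
  -name external-cluster -port 54321

# 2) Submit the Spark job in manual mode, pointing Sparkling Water at
#    the already-running H2O cluster instead of letting it auto-start:
spark-submit --master spark://local:7077 \
  --py-files sparkling-water-3.36.1.3-1-3.2/py/h2o_pysparkling_3.2-3.36.1.3-1-3.2.zip \
  --conf spark.ext.h2o.backend.cluster.mode=external \
  --conf spark.ext.h2o.external.start.mode=manual \
  --conf spark.ext.h2o.cloud.name=external-cluster \
  --conf spark.ext.h2o.cloud.representative=<h2o-node-ip>:54321 \
  spark_h20/h2o_script.py
```

With this setup, H2OContext.getOrCreate() in the script connects to the named cluster rather than attempting to launch h2odriver via hadoop, so the original IOException should not occur.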