Running H2O model prediction on a PySpark DataFrame


I am getting an error in PySpark when running an H2O model prediction:

File "/usr/spark/python/pyspark/cloudpickle.py", line 562, in subimport
ModuleNotFoundError: No module named 'h2o'

I created a pandas UDF:

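`def predict_h2o_model(*cols):
    # Combine the input columns into a single pandas DataFrame
    x = pd.concat(cols, axis=1)
    h2odataframe = h2o.H2OFrame(x)
    # Score with the H2O model (model is defined elsewhere)
    scores = model.predict(h2odataframe)
    return pd.Series(scores)`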

I am scoring a PySpark DataFrame with it:

`df_scores = spark_df.select(F.col("cust_id"), predict_h2o_model(*cols).alias('model_score'))`

I was expecting the H2O model scores to appear as a column alongside cust_id from spark_df.

Wendy On

Please add

import h2o

before you call anything from the h2o package.
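
If the error persists after adding the import, one possible cause is that h2o is installed on the driver but not in the Python environment used by the Spark executors. The traceback points at `subimport` in cloudpickle, which re-imports modules when a pickled function is loaded, so the import may be failing on a worker rather than in your script. Below is a minimal sketch for checking this, assuming you already have a SparkSession. The helper names `missing_modules` and `missing_on_executors` are hypothetical and are not part of PySpark or h2o.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this Python environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def missing_on_executors(spark, names, num_tasks=8):
    """Run the same check inside Spark tasks (hypothetical helper).

    Assumes `spark` is an existing SparkSession. The tasks are spread over
    the cluster, but there is no guarantee that every executor runs one.
    """
    names = list(names)

    def check(_):
        # Imported inside the task so the function is self-contained on the worker
        import importlib.util
        return [n for n in names if importlib.util.find_spec(n) is None]

    rdd = spark.sparkContext.parallelize(range(num_tasks), num_tasks)
    return sorted(set(rdd.flatMap(check).collect()))


# Driver-side check:
#   missing_modules(["h2o"])
# Executor-side check, given a SparkSession called spark:
#   missing_on_executors(spark, ["h2o"])
```

If the executor-side check reports h2o as missing while the driver-side check does not, installing the same h2o version into the Python environment on every worker node would likely be the next thing to try.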