Hi everyone!
I have a list of discounts that I iterate over. In each iteration, I generate a few new columns and then predict the sales amount for each product at that discount. I want to compile the results from all iterations into a single dataframe so I have a comprehensive view of every discount on the list along with its predictions. I already have this working in native H2O with Pandas, but I am now trying to replicate it in Sparkling Water (Spark H2O) with PySpark.
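For context, the loop is structured roughly like the sketch below. The names `hc` (an H2OContext), `model` (a trained H2O model), `products_df` (a Spark DataFrame of products), and `discounts` (a Python list) are placeholders for my actual objects, and the feature columns are simplified:

```python
# Minimal sketch of the per-discount scoring loop. `hc`, `model`,
# `products_df`, and `discounts` are assumed to exist already.
from functools import reduce
from pyspark.sql import functions as F

result_frames = []
for discount in discounts:
    # Derive the per-discount feature columns (simplified example).
    features_df = (products_df
                   .withColumn("discount", F.lit(discount))
                   .withColumn("discounted_price",
                               F.col("price") * (1 - F.lit(discount))))

    # Score through H2O: Spark -> H2OFrame -> predict -> back to Spark.
    h2o_frame = hc.asH2OFrame(features_df)
    preds = model.predict(h2o_frame)
    result_frames.append(hc.asSparkFrame(h2o_frame.cbind(preds)))

# Stack every iteration into one comprehensive DataFrame.
all_predictions = reduce(lambda a, b: a.unionByName(b), result_frames)
```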
Unfortunately, when attempting to read the dataframe, I encounter the following error:
```
RestApiCommunicationException: H2O node http://10.159.20.11:54321 responded with org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 256.0 failed 4 times, most recent failure: Lost task 0.3 in stage 256.0 (TID 3451) (10.159.20.11 executor 0): ai.h2o.sparkling.backend.exceptions.RestApiCommunicationException: H2O node http://10.159.20.11:54321 responded with Status code: 400 : Bad Request
```
I am seeking a workaround or advice on whether I might be making a mistake.
Expected Behavior: I should be able to access and manipulate the dataframe without issues.
Observed Behavior: The error "RestApiCommunicationException: H2O node http://10.159.20.11:54321/ responded with" prevents further progress.
What I am doing now as a workaround is saving the predictions to a Delta table and then reading them back (see the sketch below), but that takes more time than I expected. I know loops are not recommended in PySpark, but I also don't see a better way to do this.
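For reference, the workaround looks roughly like this; the Delta path and the `score_discount` helper are hypothetical placeholders for my actual code:

```python
# Current workaround (sketch): append each iteration's predictions to a
# Delta table, then read the whole table back at the end.
delta_path = "/mnt/tmp/discount_predictions"  # placeholder path

for discount in discounts:
    pred_df = score_discount(discount)  # the per-discount scoring from above
    pred_df.write.format("delta").mode("append").save(delta_path)

# Read all predictions back as a single DataFrame.
all_predictions = spark.read.format("delta").load(delta_path)
```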
Thank you so much!