Error on EC2 while using fbprophet : Unrecognized token 'Initial': was expecting 'null', 'true', 'false' or NaN

m = Prophet()
m.fit(df)

The error below was encountered:

Unrecognized token 'Initial': was expecting 'null', 'true', 'false' or NaN
at [Source: Initial log joint probability = -13.932; line: 1, column: 8]

The error above keeps coming up. I tried downgrading numpy and reinstalling pystan and fbprophet, but the issue remains unresolved.

I ran into this exact error trying to use prophet on an AWS EMR Spark cluster (through a Jupyter notebook interface). After much troubleshooting, we realized the cause: Spark expects data back in a particular format (I believe JSON with particular fields), but prophet returns a pandas dataframe.
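To make that diagnosis a bit more concrete, here is a minimal sketch. It is not what Spark runs internally; it only uses Python's standard json module as a stand-in for whatever parser sits on the receiving end. The text quoted in the error, "Initial log joint probability = -13.932", appears to be a line of Stan's optimizer output rather than JSON, so any JSON parser will reject it. The helper name is_valid_json is hypothetical and used only for illustration.

```python
import json


def is_valid_json(text):
    """Return True if `text` parses as JSON, False otherwise."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


# The line quoted in the error message (plain console text, not JSON)
stan_line = "Initial log joint probability = -13.932"

# A payload of the kind a JSON consumer could accept (illustrative values only)
json_payload = '{"ds": "2020-01-01", "yhat": 1.0}'

print(is_valid_json(stan_line))     # the console line is rejected
print(is_valid_json(json_payload))  # the structured payload is accepted
```

If this is indeed what is happening, the fix is to control exactly what gets handed back, which is what the udf approach below does by declaring an output schema.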

I fixed this issue by writing a user-defined function (udf) in pyspark that allows me to use prophet on a Spark data frame and specify what data will be returned from this Spark function.

I based my own solution on the pandas_udf functions for prophet on Spark in this example and this example.

Below is a generalized version of the function I wrote. For context, I was fitting a time series model to my data in order to detect outliers, which is why I fit and then predict on the same data. You'll also need pyarrow installed for Spark to handle the pandas_udf properly:

# Import relevant packages
import pyspark.sql.functions as F
import pyspark.sql.types as types
import prophet

# Define output schema of prophet model
output_schema = types.StructType([
                                types.StructField('id', types.IntegerType(), True), #args: name (string), data type, nullable (boolean)
                                types.StructField('ds', types.TimestampType(), True),
                                types.StructField('yhat', types.DoubleType(), True),
                                types.StructField('yhat_lower', types.DoubleType(), True),
                                types.StructField('yhat_upper', types.DoubleType(), True)
                                ])

# Function to fit Prophet timeseries model
@F.pandas_udf(output_schema, F.PandasUDFType.GROUPED_MAP)
def fit_prophet_model(df):
    """
    :param df: pandas dataframe holding the rows of one 'id' group
               (a grouped-map pandas_udf receives each group as pandas, not Spark).
    :return: pandas dataframe whose columns match output_schema.
    """
    
    # Prep the dataframe for use in Prophet
    formatted_df = df[['timestamp', 'value_of_interest']] \
        .rename(columns = {'timestamp': 'ds', 'value_of_interest': 'y'}) \
        .sort_values(by = ['ds'])
    
    # Instantiate model
    model = prophet.Prophet(interval_width = 0.99,
                            growth = 'linear',
                            daily_seasonality = True,
                            weekly_seasonality = True,
                            yearly_seasonality = True,
                            seasonality_mode = 'multiplicative')
    
    # Fit model and get fitted values
    model.fit(formatted_df)
    model_results = model.predict(formatted_df)[['ds', 'yhat', 'yhat_lower', 'yhat_upper']] \
                         .sort_values(by = ['ds'])
    model_results['id'] = df['id'].iloc[0] #add grouping id (formatted_df has no 'id' column; the id is constant within a group)
    model_results = model_results[['id', 'ds', 'yhat', 'yhat_lower', 'yhat_upper']] #get columns in correct order
    
    return model_results

Then to run the function on your data simply do the following:

results = (my_data.groupBy('id') \
                  .apply(fit_prophet_model)
          )

results.show(10) #show first ten rows of the fitted model results
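Since the pandas_udf route depends on pyarrow being importable, it can help to check the environment before submitting the job. The following is only a rough sketch using the standard library. The helper name is_installed and the list of package names are my own assumptions for illustration, and running it in the notebook only tells you about the Python environment that cell executes in, not necessarily every worker node.

```python
import importlib.util


def is_installed(package_name):
    """Return True if a top-level package can be found on the import path."""
    return importlib.util.find_spec(package_name) is not None


# Packages the approach above relies on (adjust the names to your setup)
required = ["pyarrow", "pandas", "prophet"]

for name in required:
    status = "installed" if is_installed(name) else "MISSING"
    print(f"{name}: {status}")
```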