I keep getting the error 'Input contains NaN, infinity or a value too large for dtype('float32')' when trying to fit a random forest regressor model. I've searched my dataset and can't see any infinite or NaN values. Any suggestions?

    df.replace([np.inf, -np.inf], np.nan, inplace=True)

    df.fillna('mean', inplace=True)

    model = RandomForestRegressor()
    model.fit(X_train, y_train)

Any suggestions would be greatly appreciated; please ask if you need more details.

Thanks


4
guin0x On

Have you checked this answer? https://datascience.stackexchange.com/questions/11928/valueerror-input-contains-nan-infinity-or-a-value-too-large-for-dtypefloat32

Alternatively, the error says a value might be too large for dtype('float32'). You could try converting the data to dtype('float64'), which can represent much larger numbers.
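To narrow this down, it may help to inspect the arrays that are actually passed to `fit`, since cleaning `df` after `X_train` was split off would not change `X_train`. The following is only a minimal sketch under assumptions: the helper name `report_bad_values` and the toy frame standing in for `X_train` are hypothetical, not taken from the question. It counts, per column, NaNs, infinities, and finite values beyond the float32 range. As far as I know, scikit-learn's tree ensembles cast X to float32 internally, so casting the input to float64 may not be enough if a value really is that large. The sketch also fills NaNs with column means, because `df.fillna('mean')` inserts the literal string 'mean' rather than the mean.

```python
import numpy as np
import pandas as pd


def report_bad_values(frame: pd.DataFrame) -> pd.DataFrame:
    """Count NaN, infinite, and float32-overflowing entries per numeric column."""
    numeric = frame.select_dtypes(include="number").astype("float64")
    values = numeric.to_numpy()
    f32_max = np.finfo(np.float32).max
    # Finite values whose magnitude cannot be represented as float32
    too_large = np.isfinite(values) & (np.abs(values) > f32_max)
    return pd.DataFrame(
        {
            "nan": np.isnan(values).sum(axis=0),
            "inf": np.isinf(values).sum(axis=0),
            "too_large_for_float32": too_large.sum(axis=0),
        },
        index=numeric.columns,
    )


# Hypothetical toy data standing in for X_train
X_train = pd.DataFrame(
    {
        "a": [1.0, np.nan, 3.0],
        "b": [np.inf, 2.0, 4.0],
        "d": [1e39, 1.0, 2.0],
    }
)
report = report_bad_values(X_train)
print(report)

# Replace infinities with NaN, then fill NaN with each column's mean
cleaned = X_train.replace([np.inf, -np.inf], np.nan)
cleaned = cleaned.fillna(cleaned.mean(numeric_only=True))
print(report_bad_values(cleaned))
```

In this toy example, column `d` still shows one value that is too large for float32 after cleaning, which is the kind of entry that replacing NaN and inf alone would not catch.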

0
Adrian Ang On

If the largest number in the dataset is supposed to be 91, I would rebuild X_train and y_train to troubleshoot, printing the first value that falls outside the expected range.

    # Copy X_train, stopping at the first value outside the expected range
    X_train_new = []
    for num in X_train:
        if num > 91 or num < -100:
            print('X_train', str(num))
            break
        X_train_new.append(float(num))

    # Same check for y_train
    y_train_new = []
    for num in y_train:
        if num > 91 or num < -100:
            print('y_train', str(num))
            break
        y_train_new.append(float(num))

    # Only meaningful if neither loop stopped early
    model.fit(X_train_new, y_train_new)

Depending on the shape of X_train and y_train (I assume 1-dimensional for now), you may need to adapt the above code to their actual shape, but you get the idea.

You may also need to adjust the lower limit; the code above uses -100 as an example.
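For 2-dimensional data, the same range check can be done without explicit loops. This is a rough sketch assuming NumPy-compatible input; the function name `find_out_of_range` is hypothetical, and the bounds -100 and 91 are just the example limits used above. Unlike the loop version, it also flags NaN, because any comparison with NaN is False.

```python
import numpy as np


def find_out_of_range(arr, lower=-100, upper=91):
    """Return (row, col) index pairs whose value is NaN or outside [lower, upper]."""
    data = np.atleast_2d(np.asarray(arr, dtype=np.float64))
    # Negating the "in range" mask also flags NaN entries
    bad = ~((data >= lower) & (data <= upper))
    return np.argwhere(bad)


# Hypothetical toy array standing in for X_train
X = np.array([[1.0, 50.0], [np.nan, 91.0], [200.0, -101.0]])
bad_positions = find_out_of_range(X)
for row, col in bad_positions:
    print(f"X[{row}, {col}] = {X[row, col]}")
```

A 1-dimensional input such as y_train is treated as a single row here, so the reported row index would always be 0.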