I was trying to fit my dataset to a CART model, but I keep getting this error: ValueError: Input contains NaN, infinity or a value too large for dtype('float32').
I have already double-checked, even triple-checked, the dataset, and as far as I can see it does not contain any NaN, infinity, or anything that counts as such. I also checked for blanks, and there weren't any. I tried everything, including the most famous thread here, but to no avail. What could I be doing wrong?
Edit:
flood_tr=df.sample(frac=0.75,random_state=42)
flood_test=df.drop(flood_tr.index)
y = flood_tr['flood_height']
mar_np = np.array(flood_tr['precipitation'])
(mar_cat, mar_cat_dict) = stattools.categorical(mar_np, drop=True, dictnames=True)
mar_cat_pd = pd.DataFrame(mar_cat)
X = pd.concat((flood_tr[['elev']], mar_cat_pd), axis = 1)
rfy = np.ravel(y)
rf01 = RandomForestClassifier(n_estimators=100,
                              criterion="gini").fit(X, rfy)  # <--- this is where I got the error
Here is the dataset I used: https://www.kaggle.com/datasets/giologicx/aegisdataset
Your dataset has values too large for float32 (single precision). I would recommend doing the following,
where dec is the decimal precision, somewhere in the range of 2 to 5.
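Before changing the precision, it may help to confirm which of the three conditions in the error message actually applies to the X that reaches fit(). The sketch below is only an illustration: report_bad_values is a hypothetical helper name, and the small DataFrame is made-up data rather than the Kaggle dataset. It also shows one way NaN can appear without being present in the source file: df.sample keeps the original row labels, while a DataFrame built from a NumPy array gets a fresh 0..n-1 index, and pd.concat(..., axis=1) aligns rows on those labels.

```python
import numpy as np
import pandas as pd


def report_bad_values(X: pd.DataFrame) -> pd.DataFrame:
    """Count, per column, entries that are NaN, infinite, or beyond float32 range."""
    values = X.astype("float64")
    limit = np.finfo(np.float32).max  # largest finite float32 value
    finite = np.isfinite(values)
    return pd.DataFrame({
        "nan": values.isna().sum(),
        "inf": np.isinf(values).sum(),
        "too_large": (finite & (values.abs() > limit)).sum(),
    })


# Made-up data, used only to illustrate the index alignment effect.
df = pd.DataFrame({
    "elev": [10.0, 12.5, 9.0, 30.0, 4.5, 18.0],
    "precipitation": ["low", "high", "low", "high", "low", "high"],
})

# Stands in for df.sample(...): the selected rows keep their original labels.
train = df.loc[[3, 0, 5, 1]]

# Built from a NumPy array, so the index restarts at 0..3.
dummies = pd.get_dummies(train["precipitation"].to_numpy(), dtype=float)

# Row labels differ, so concat pads the non-matching rows with NaN.
misaligned = pd.concat([train[["elev"]], dummies], axis=1)

# Resetting the index first makes both frames line up row by row.
aligned = pd.concat([train[["elev"]].reset_index(drop=True), dummies], axis=1)

print(report_bad_values(misaligned))
print(report_bad_values(aligned))
```

If the "nan" column is non-zero for the concatenated X while the "too_large" column is zero, the problem is more likely row alignment than float32 range, and resetting the index (or passing index=flood_tr.index when building mar_cat_pd) would be worth trying first.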