I'm getting the following error:
return array(a, dtype, copy=False, order=order)
ValueError: could not convert string to float: 'BOX72'
('BOX72' is a value in column5.)
The error appears to be raised by the line impute_knn.fit_transform(X).
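A stripped-down reproduction suggests the error comes from the string values themselves, not from surrounding whitespace. KNNImputer validates its input as floating point, so any cell like 'BOX72' is rejected. The small mixed array below is a made-up stand-in for file.csv (an assumption, not the real data):

```python
import numpy as np
from sklearn.impute import KNNImputer

# Made-up mixed array: a numeric column plus a string column like column5
X = np.array([[1.0, "BOX72"], [np.nan, "BOX10"], [3.0, "BOX72"]], dtype=object)

try:
    KNNImputer(n_neighbors=2).fit_transform(X)
    error_message = None
except ValueError as exc:
    # The imputer converts its input to float, and 'BOX72' cannot be converted
    error_message = str(exc)

print(error_message)
```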
Here is the code so far:
import pandas as pd
import numpy as np
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.impute import KNNImputer

dataframe = pd.read_csv('file.csv', delimiter=',')

# Label-encode every column as strings (note: astype(str) turns NaN into the string 'nan')
le = LabelEncoder()
dfle = dataframe
dfle2 = dfle.apply(lambda col: le.fit_transform(col.astype(str)), axis=0, result_type='expand')
newdf = dfle2[['column1', 'column2', 'column3', 'column4', 'column5', 'column6', 'column7']]

# Features are taken from the raw dataframe, so column5 still holds strings such as 'BOX72'
X = dataframe[['column1', 'column2', 'column4', 'column5', 'column6', 'column7']].values
y = dfle.column3

# KNN imputation of missing values; this is the line that raises the ValueError
impute_knn = KNNImputer(n_neighbors=2)
X = impute_knn.fit_transform(X)

# One-hot encoding
ohe = OneHotEncoder()
ct = ColumnTransformer([('encoder', OneHotEncoder(), [0])], remainder='passthrough')
X = ohe.fit_transform(X).toarray()
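In the code above, X is built from `dataframe` rather than the encoded `dfle2`, so KNNImputer still receives raw strings. Also, encoding with `col.astype(str)` turns NaN into the string 'nan', which leaves nothing for the imputer to fill. The following is a minimal sketch of one alternative: encode only the non-numeric columns to integer codes while keeping NaN as NaN, then impute. The sample data and the helper `encode_keep_nan` are hypothetical, introduced only for illustration:

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer

# Hypothetical stand-in for file.csv; only two of the question's columns are mimicked
dataframe = pd.DataFrame({
    "column1": [1.0, 2.0, 3.0, np.nan],
    "column5": ["BOX72", " BOX72 ", np.nan, "BOX10"],
})


def encode_keep_nan(col: pd.Series) -> pd.Series:
    """Hypothetical helper: map each distinct string to an integer code, keeping NaN."""
    codes = col.astype("category").cat.codes.astype(float)  # missing values get code -1
    return codes.where(codes != -1, np.nan)  # turn -1 back into NaN for the imputer


# Strip surrounding whitespace so ' BOX72 ' and 'BOX72' share one code
dataframe["column5"] = dataframe["column5"].str.strip()

# Encode only non-numeric columns; numeric columns pass through unchanged
encoded = dataframe.apply(
    lambda col: col if pd.api.types.is_numeric_dtype(col) else encode_keep_nan(col)
)

X = encoded.values  # every column is numeric now, so KNNImputer can process it
X_imputed = KNNImputer(n_neighbors=2).fit_transform(X)
print(X_imputed)
```

One caveat with this design: KNNImputer averages neighbours, so an imputed code can be fractional and match no category. Rounding the result, or filling string columns with SimpleImputer(strategy='most_frequent') before encoding, are possible alternatives. One-hot encoding can then be applied to the imputed categorical columns.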
I know I can probably use something like strip(), but I can't figure out how to use it to remove leading and trailing whitespace from every cell (in case there are other, similar entries). I also don't know whether this is actually the solution. Any pointers or help would be appreciated. Thank you.
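For the stripping part, a minimal sketch is to apply pandas' Series.str.strip() to every non-numeric column. The sample frame below is made up; in the question it would come from pd.read_csv('file.csv'). This assumes the non-numeric columns hold strings, which is how read_csv parses text columns. Stripping normalizes variants such as ' BOX72' and 'BOX72 ', but on its own it does not make column5 numeric:

```python
import numpy as np
import pandas as pd

# Made-up sample data standing in for file.csv
df = pd.DataFrame({
    "column1": [1, 2, 3],
    "column5": ["BOX72", " BOX72", "BOX72 "],
    "column6": ["  a", "b  ", np.nan],
})

# Pick the non-numeric (string) columns; numeric columns are left untouched
str_cols = [c for c in df.columns if not pd.api.types.is_numeric_dtype(df[c])]

# Series.str.strip() removes leading/trailing whitespace and leaves NaN as NaN
df[str_cols] = df[str_cols].apply(lambda s: s.str.strip())

print(df)
```

If only leading spaces after the comma are the problem, read_csv's skipinitialspace=True option also exists, but it does not remove trailing spaces.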