My data is slightly imbalanced, so I am trying to apply SMOTE before fitting a logistic regression model. When I do, I get this error: KeyError: 'Only the Series name can be used for the key in Series dtype mappings.' Could someone help me figure out why? Here is the code:

import pandas as pd
from imblearn.over_sampling import SMOTE
from sklearn.model_selection import train_test_split

# Split the features from the target column 'Count'
X = dummies.loc[:, dummies.columns != 'Count']
y = dummies.loc[:, dummies.columns == 'Count']
os = SMOTE(random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
columns = X_train.columns
os_data_X, os_data_y = os.fit_sample(X_train, y_train)  # here is where it errors
os_data_X = pd.DataFrame(data=os_data_X, columns=columns)
os_data_y = pd.DataFrame(data=os_data_y, columns=['Count'])

Thank you!

There are 4 best solutions below

1
On

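I actually just fixed this problem! I converted both to NumPy arrays: os_data_X, os_data_y = os.fit_resample(X_train.to_numpy(), y_train.to_numpy().ravel()). On older versions the equivalent calls were as_matrix() and fit_sample; as_matrix() was removed in pandas 1.0, and newer imbalanced-learn releases only provide fit_resample.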
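If you still want labelled DataFrames afterwards, as in the question, the array workaround can be wrapped as in the sketch below. It is only an illustration: the helper name resample_as_arrays is made up, the default target name 'Count' is taken from the question, and the sampler is assumed to be any object with a fit_resample method, such as SMOTE(random_state=0).

```python
import pandas as pd


def resample_as_arrays(sampler, X_train, y_train, target_name="Count"):
    """Resample on plain NumPy arrays, then restore the column labels.

    `sampler` is assumed to expose fit_resample(X, y), for example
    imblearn.over_sampling.SMOTE(random_state=0).
    """
    columns = list(X_train.columns)
    X_arr = X_train.to_numpy()
    # A one-column DataFrame becomes shape (n, 1); flatten it to 1-D.
    y_arr = y_train.to_numpy().ravel()
    X_res, y_res = sampler.fit_resample(X_arr, y_arr)
    os_data_X = pd.DataFrame(X_res, columns=columns)
    os_data_y = pd.DataFrame({target_name: y_res})
    return os_data_X, os_data_y
```

Typical use would be os_data_X, os_data_y = resample_as_arrays(SMOTE(random_state=0), X_train, y_train). Keep in mind that this sidesteps the error rather than fixing whatever in the DataFrame triggers it.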

4
On

I just encountered this problem myself. As it turned out, I had a duplicate column in my dataset. Perhaps double-check that this is not the case for yours.
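A quick way to run that check is sketched below. The helper name duplicated_column_names and the toy frame (including its column names) are invented for illustration; substitute your own DataFrame.

```python
import pandas as pd


def duplicated_column_names(df):
    """Return the column labels that occur more than once, in order."""
    mask = df.columns.duplicated(keep=False)
    # dict.fromkeys removes repeats while preserving first-seen order.
    return list(dict.fromkeys(df.columns[mask]))


# Toy frame standing in for `dummies`; the column names are made up.
dummies = pd.DataFrame(
    [[1, 0, 1, 0], [0, 1, 0, 1]],
    columns=["Count", "color_red", "size_S", "color_red"],
)
print(duplicated_column_names(dummies))  # ['color_red']
```

An empty list means every column label is unique, so the cause lies elsewhere.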

0
On

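Try converting your X features to a NumPy array first, then pass them to SMOTE:

import numpy as np
from imblearn.over_sampling import SMOTE

sm = SMOTE()
X = np.array(X)
y = np.array(y).ravel()  # flatten the single-column target to 1-D
X, y = sm.fit_resample(X, y)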

0
On

This error is usually caused by duplicate column names in your data. To check for them, inspect the output of:

df.head()

or df.columns

df.columns[df.columns.duplicated()] lists only the repeated names.

To fix, drop the duplicated column(s) with:

df.drop('column_name', axis=1, inplace=True)

Note that this removes every column named 'column_name', not just the extra copies.
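Because df.drop with a label removes every column that carries that label, you may prefer to keep one copy instead. A minimal sketch, assuming the repeated columns hold the same data so that keeping the first one is acceptable (the helper name drop_duplicate_columns and the toy data are made up):

```python
import pandas as pd


def drop_duplicate_columns(df):
    """Keep the first column for each label and drop later repeats."""
    return df.loc[:, ~df.columns.duplicated()]


# Toy frame with a repeated 'flag' column.
df = pd.DataFrame([[1, 5, 1], [0, 7, 0]], columns=["flag", "Count", "flag"])
clean = drop_duplicate_columns(df)
print(list(clean.columns))  # ['flag', 'Count']
```

If the repeated columns actually differ, rename them rather than dropping one.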