I have a DataFrame with multiple columns. I am trying to normalize all the columns except for one, price.
I found some code that works perfectly on a sample DataFrame I created, but when I use it on my original DataFrame, it raises an error: ValueError: Columns must be same length as key
Here is the code I am using:
df_final_1d_normalized = df_final_1d.copy()
cols_to_norm = df_final_1d.columns[df_final_1d.columns!='price']
df_final_1d_normalized[cols_to_norm] = df_final_1d_normalized[cols_to_norm].apply(lambda x: (x - x.min()) / (x.max() - x.min()))
The issue is with reassigning the columns to themselves in the third line of code.
Specifically, this works: df_final_1d_normalized[cols_to_norm].apply(lambda x: (x - x.min()) / (x.max() - x.min()))
But this does not work: df_final_1d_normalized[cols_to_norm] = df_final_1d_normalized[cols_to_norm].apply(lambda x: (x - x.min()) / (x.max() - x.min()))
Here is a sample DataFrame, in case you want to test the code and see that it actually works on other DataFrames:
import numpy as np
import pandas as pd
df = pd.DataFrame()
df['A'] = [1,2,3,4, np.nan, np.nan]
df['B'] = [2,4,2,4,5,np.nan]
df['C'] = [np.nan, np.nan, 4,5,6,3]
df['D'] = [np.nan, np.nan, np.nan, 5,4,9]
df_norm = df.copy()
cols_to_norm = df.columns[df.columns!="D"]
df_norm[cols_to_norm] = df_norm[cols_to_norm].apply(lambda x: (x - x.min()) / (x.max() - x.min()))
What could the error be?
If I understand correctly, you don't need a lambda function. You can just write:
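One way to write this without a lambda is plain DataFrame arithmetic. This is a sketch and may not match the exact snippet originally intended. It assumes the variable names from the question, and the small `df_final_1d` built here is a hypothetical stand-in, since the real data is not shown.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df_final_1d; the real data is not shown in the question.
df_final_1d = pd.DataFrame({
    "price": [10.0, 12.5, 11.0, 13.0],
    "volume": [100.0, 300.0, 200.0, np.nan],
    "spread": [0.5, 0.1, 0.3, 0.2],
})

df_final_1d_normalized = df_final_1d.copy()
cols_to_norm = df_final_1d.columns[df_final_1d.columns != "price"]

# Column-wise min and max are Series indexed by column name; NaN is skipped by default.
col_min = df_final_1d[cols_to_norm].min()
col_max = df_final_1d[cols_to_norm].max()

# DataFrame minus Series aligns on column labels, so each column uses its own min and max.
df_final_1d_normalized[cols_to_norm] = (df_final_1d[cols_to_norm] - col_min) / (col_max - col_min)
```

Note that a column whose values are all identical has `max - min == 0`, so its normalized values come out as NaN.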
This will do the work.
Here is the example from the question:
The result is then:
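As a sketch, here is the lambda-free form applied to the sample DataFrame from the question, leaving column D untouched. The helper name `sub` is introduced only for this illustration.

```python
import numpy as np
import pandas as pd

# Sample DataFrame from the question
df = pd.DataFrame({
    "A": [1, 2, 3, 4, np.nan, np.nan],
    "B": [2, 4, 2, 4, 5, np.nan],
    "C": [np.nan, np.nan, 4, 5, 6, 3],
    "D": [np.nan, np.nan, np.nan, 5, 4, 9],
})

df_norm = df.copy()
cols_to_norm = df.columns[df.columns != "D"]

# Min-max scale every selected column at once; NaN values stay NaN.
sub = df[cols_to_norm]
df_norm[cols_to_norm] = (sub - sub.min()) / (sub.max() - sub.min())

print(df_norm.round(3))
# Values, rounded to 3 decimals:
#        A      B      C    D
# 0  0.000  0.000    NaN  NaN
# 1  0.333  0.667    NaN  NaN
# 2  0.667  0.000  0.333  NaN
# 3  1.000  0.667  0.667  5.0
# 4    NaN  1.000  1.000  4.0
# 5    NaN    NaN  0.000  9.0
```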