Suppose I have a dataframe like the following:
import pandas as pd
df = pd.DataFrame({'animal': ['Dog', 'Bird', 'Dog', 'Cat'],
                   'color': ['Black', 'Blue', 'Brown', 'Black'],
                   'age': [1, 10, 3, 6],
                   'pet': [1, 0, 1, 1],
                   'sex': ['m', 'm', 'f', 'f'],
                   'name': ['Rex', 'Gizmo', 'Suzy', 'Boo']})
I want to use LabelEncoder to encode "animal", "color", "sex" and "name", but I don't need to encode the other two columns. I also want to be able to inverse_transform the encoded columns afterwards.
I have tried the following, and although encoding works as I'd expect it to, reversing does not.
from sklearn.preprocessing import LabelEncoder
to_encode = ["animal", "color", "sex", "name"]
le = LabelEncoder()
for col in to_encode:
    df[col] = le.fit_transform(df[col])
# to inverse:
for col in to_encode:
    df[col] = le.inverse_transform(df[col])
The inverse_transform function results in the following dataframe:
animal | color | age | pet | sex | name |
---|---|---|---|---|---|
Rex | Boo | 1 | 1 | Gizmo | Rex |
Boo | Gizmo | 10 | 0 | Gizmo | Gizmo |
Rex | Rex | 3 | 1 | Boo | Suzy |
Gizmo | Boo | 6 | 1 | Boo | Boo |
It's obviously not right, but I'm not sure how else I'd accomplish this?
Any advice would be appreciated!
As you can see in your output, when you call inverse_transform, the code only uses the information it learned from the last column, "name". You can tell because every encoded column now holds values taken from the names. Each call to fit_transform refits the single encoder, so only the last fit survives.
The key here is to have one LabelEncoder fitted for each column. To do this, I recommend saving them in a dictionary keyed by column name. Printing that dictionary then shows one LabelEncoder() per column.
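A minimal sketch of that dictionary, assuming the df and to_encode from the question. The name encoders is only illustrative and does not come from the question.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.DataFrame({'animal': ['Dog', 'Bird', 'Dog', 'Cat'],
                   'color': ['Black', 'Blue', 'Brown', 'Black'],
                   'age': [1, 10, 3, 6],
                   'pet': [1, 0, 1, 1],
                   'sex': ['m', 'm', 'f', 'f'],
                   'name': ['Rex', 'Gizmo', 'Suzy', 'Boo']})
to_encode = ["animal", "color", "sex", "name"]

# One fitted LabelEncoder per column, keyed by column name.
# `encoders` is a hypothetical name chosen for this sketch.
encoders = {col: LabelEncoder().fit(df[col]) for col in to_encode}

print(encoders)

# Each encoder keeps its own sorted classes; the position is the code.
print(list(encoders["animal"].classes_))
```

The classes_ attribute is where each encoder stores its mapping, so inspecting it is a quick way to check that the encoders were fitted on different columns.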
In that dictionary, each column we want to transform has its own LabelEncoder(). This means, for example, that the animal LabelEncoder stores that 0 is Bird, 1 is Cat, ... and the same goes for each column.
Once every column is fitted, we can transform and later, if we want, inverse_transform. The only thing to be aware of is that every transform/inverse_transform must use the LabelEncoder fitted on that column. First we transform each column with its own encoder.
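The transform step could look roughly like this. It is a sketch that rebuilds the question's df so it runs on its own, and encoders is again an illustrative name.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.DataFrame({'animal': ['Dog', 'Bird', 'Dog', 'Cat'],
                   'color': ['Black', 'Blue', 'Brown', 'Black'],
                   'age': [1, 10, 3, 6],
                   'pet': [1, 0, 1, 1],
                   'sex': ['m', 'm', 'f', 'f'],
                   'name': ['Rex', 'Gizmo', 'Suzy', 'Boo']})
to_encode = ["animal", "color", "sex", "name"]

# Fit one encoder per column (hypothetical dictionary name).
encoders = {col: LabelEncoder().fit(df[col]) for col in to_encode}

# Transform each column with the encoder fitted on that same column.
for col in to_encode:
    df[col] = encoders[col].transform(df[col])

print(df)
```

The "age" and "pet" columns are left untouched because they are not listed in to_encode.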
And, once the df is transformed, we can inverse_transform each column with the same encoder that transformed it.
One interesting idea could be using ColumnTransformer, but unfortunately it doesn't support inverse_transform().
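For the per-column inverse step described above, here is a sketch of the full round trip. It assumes the question's df, and the names encoders and original are illustrative.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.DataFrame({'animal': ['Dog', 'Bird', 'Dog', 'Cat'],
                   'color': ['Black', 'Blue', 'Brown', 'Black'],
                   'age': [1, 10, 3, 6],
                   'pet': [1, 0, 1, 1],
                   'sex': ['m', 'm', 'f', 'f'],
                   'name': ['Rex', 'Gizmo', 'Suzy', 'Boo']})
to_encode = ["animal", "color", "sex", "name"]
original = df.copy()  # kept only to compare against at the end

# Fit and apply one encoder per column (hypothetical dictionary name).
encoders = {col: LabelEncoder().fit(df[col]) for col in to_encode}
for col in to_encode:
    df[col] = encoders[col].transform(df[col])

# Reverse each column with the encoder that produced its codes.
for col in to_encode:
    df[col] = encoders[col].inverse_transform(df[col])

print(df)
```

Because every column is decoded by its own encoder, the labels come back as they were in the original dataframe rather than as values from "name".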