Label Encoding of Categorical values for Future df

20 Views Asked by SM079 At 29 September 2023 at 16:40

I am building a model for which label encoding of 2 categorical columns is the better approach, so I applied it to train_df and finalized the model.

To predict on test_df, I fit the encoder on the 2 categorical columns of train_df and then transformed the values in test_df, something like this:

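from sklearn.preprocessing import LabelEncoder

# LabelEncoder handles one column at a time; "cat_col" is a placeholder column name
le = LabelEncoder()
le.fit(train_df["cat_col"])

# reuse the encoder fitted on train_df to encode the same column of test_df
test_encoded = le.transform(test_df["cat_col"])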

Now I have to save the model as a .pkl file and hand it to another team. If they want to use the model, do they have to fit the label encoding on train_df again and then transform their new data?


There is 1 best solution below

Ben Reiniger On 29 September 2023 at 18:31

You should save your fitted LabelEncoder (e.g. as another pickle) and provide it along with the model, together with instructions (a Python script or snippet?) for how to use them to reach a final prediction (le.transform, then model.predict).
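As a rough sketch of that workflow (not the asker's actual code): the toy data, the variable names, and the LogisticRegression model below are assumptions for illustration only. It uses pickle.dumps/pickle.loads in memory; in practice the bytes would be written to and read from .pkl files.

```python
import pickle

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import LabelEncoder

# Toy stand-ins for one categorical column of train_df and its target (assumed data).
train_colors = np.array(["red", "green", "blue", "green", "red", "blue"])
train_target = np.array([0, 1, 1, 1, 0, 1])

# Fit the encoder on the training column, then train the model on the encoded values.
le = LabelEncoder()
X_train = le.fit_transform(train_colors).reshape(-1, 1)
model = LogisticRegression().fit(X_train, train_target)

# Serialize both fitted objects; these bytes are what would go into the .pkl files.
le_bytes = pickle.dumps(le)
model_bytes = pickle.dumps(model)

# --- On the receiving team's side: load, transform, predict. Nothing is refit. ---
le_loaded = pickle.loads(le_bytes)
model_loaded = pickle.loads(model_bytes)

new_colors = np.array(["blue", "red"])
X_new = le_loaded.transform(new_colors).reshape(-1, 1)
predictions = model_loaded.predict(X_new)
```

With two categorical columns, this approach needs one fitted LabelEncoder per column, and each of them has to be shipped alongside the model.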

You might consider using a Pipeline (and potentially other composite estimators) from sklearn to package all of that into one object, so that the other team only needs to call predict on it.
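A minimal sketch of the Pipeline idea, under assumptions: the two toy categorical columns, the DecisionTreeClassifier, and the step names are made up for illustration, and OrdinalEncoder stands in as the encoding step because it accepts several feature columns at once. If the real data also has numeric columns, a ColumnTransformer could route only the categorical ones to the encoder.

```python
import pickle

import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OrdinalEncoder
from sklearn.tree import DecisionTreeClassifier

# Toy training data: two categorical columns (assumed values) and a binary target.
X_train = np.array([
    ["red", "small"],
    ["green", "large"],
    ["blue", "small"],
    ["green", "small"],
    ["red", "large"],
    ["blue", "large"],
])
y_train = np.array([0, 1, 0, 0, 1, 1])

# Encoder and model live in one estimator, so they are fitted and saved together.
pipe = Pipeline([
    ("encode", OrdinalEncoder()),
    ("model", DecisionTreeClassifier(random_state=0)),
])
pipe.fit(X_train, y_train)

# A single pickle round trip carries both the fitted encoder and the fitted model.
pipe_loaded = pickle.loads(pickle.dumps(pipe))

# The receiving team passes raw category values; encoding happens inside predict.
X_new = np.array([["blue", "large"], ["red", "small"]])
predictions = pipe_loaded.predict(X_new)
```

The other team then needs only the one pickled object and a compatible scikit-learn version; there is no separate transform step to remember.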

N.B.: LabelEncoder is meant to be used for target variables (and even then mostly just internally); you would probably be slightly better off with OrdinalEncoder. See e.g. this DS.SE question.
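One practical difference that matters for future data, sketched here with made-up values: OrdinalEncoder encodes several feature columns in one call and can be told to map categories it never saw during fit to a sentinel value, whereas LabelEncoder.transform raises a ValueError on unseen labels. The choice of -1 as the sentinel is an assumption for this example.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, OrdinalEncoder

# Toy data (assumed): "purple" appears only in the new data, never in training.
X_train = np.array([["red", "small"], ["green", "large"], ["blue", "small"]])
X_new = np.array([["blue", "large"], ["purple", "small"]])

# OrdinalEncoder handles both columns at once and maps unseen categories to -1.
enc = OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1)
enc.fit(X_train)
encoded_new = enc.transform(X_new)

# LabelEncoder works on a single 1-D column and rejects unseen labels.
le = LabelEncoder().fit(X_train[:, 0])
try:
    le.transform(X_new[:, 0])
    label_encoder_failed = False
except ValueError:
    label_encoder_failed = True
```

Whether the downstream model treats -1 sensibly depends on the model, so this setting is worth checking against a few rows of realistic new data.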
