Label encoding for the entire dataframe using sklearn LabelEncoder()


I want to predict sequences using the Sequential model of Keras. My dataframe contains string data, so I decided to use LabelEncoder from the sklearn library to encode it.

[Image: sample of the dataframe]

I tried this code snippet:

import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.read_csv("sample-03.csv")
# Fits a separate encoder on each column, so codes are not shared across columns
df.apply(LabelEncoder().fit_transform)

giving this result:

[Image: output of the code above]

This encodes each column independently, so the same string gets a different code in different columns. I need one mapping for the whole dataset, e.g. http://example.com/296 should be represented as "2" in every column. I would be grateful for any suggested solution.

I also tried converting the dataset to tuples and building a dictionary from it, but again the same value ended up with a different key in different columns.


There are 2 best solutions below


I came up with a solution and would like to share it here.

# Fit one encoder on every unique value found anywhere in the dataframe
le = LabelEncoder()
le.fit(df.stack().unique())
# Apply the same fitted mapping to each column
for col in ['x-2', 'x-1', 'x0', 'x1', 'x2']:
    df[col] = le.transform(df[col])
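
To check that a single fitted encoder really gives the same code to the same string in every column, here is a small self-contained sketch. The toy dataframe and its column names are made up for illustration and only stand in for sample-03.csv; it also assumes there are no missing values.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy data standing in for sample-03.csv
df = pd.DataFrame({
    "x0": ["http://example.com/296", "http://example.com/12"],
    "x1": ["http://example.com/12", "http://example.com/7"],
})

# Fit once on all values so every column shares the same mapping
le = LabelEncoder()
le.fit(df.stack().unique())

# Encode every column with the shared encoder
encoded = df.apply(le.transform)

# inverse_transform maps the integer codes back to the original strings
decoded = encoded.apply(le.inverse_transform)

print(encoded)
print(decoded)
```

LabelEncoder stores its classes in sorted order, so the codes follow the string sort order of the URLs rather than the order in which they appear.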

LabelEncoder will not handle your requirement. I suggest writing a small function that extracts all the unique URLs, assigns a numerical value to each of them, and then replaces the URLs in the dataframe with the corresponding numerical values.
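
A minimal sketch of such a function, using only pandas. The name encode_shared and the toy data are hypothetical, and the sketch assumes the dataframe has no missing values and that all cells are strings, so they can be sorted.

```python
import pandas as pd

def encode_shared(df):
    """Hypothetical helper: encode all columns with one shared value-to-code mapping."""
    # Collect the unique values across all columns; sorting keeps the codes deterministic
    uniques = sorted(pd.unique(df.to_numpy().ravel()))
    mapping = {value: code for code, value in enumerate(uniques)}
    # Replace every cell with its code, column by column
    encoded = df.apply(lambda col: col.map(mapping))
    return encoded, mapping

# Toy data for illustration only
sample = pd.DataFrame({
    "x0": ["http://example.com/296", "http://example.com/12"],
    "x1": ["http://example.com/12", "http://example.com/296"],
})
encoded, mapping = encode_shared(sample)
print(mapping)
print(encoded)
```

Returning the mapping alongside the encoded dataframe makes it possible to decode predictions later by inverting the dictionary.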