I want to predict sequences using the Sequential model of Keras. My dataframe contains string data, so I decided to use LabelEncoder from the sklearn library to encode it.
I tried this code snippet:
import pandas as pd
df = pd.read_csv("sample-03.csv")
from sklearn.preprocessing import LabelEncoder
df.apply(LabelEncoder().fit_transform)
giving this result:
The label encoding is applied to each column independently, so the same value gets a different code in different columns. I need a given URL, e.g. http://example.com/296, to be represented by the same code (say "2") across the whole dataset. I would be grateful for any suggested solution.
I also tried converting the dataset to tuples and using a dictionary, but again the key is not unique for the same value in different columns.


Applied column by column like this, LabelEncoder will not handle your requirement, because each column gets its own independent encoding. I suggest writing a small function that extracts all the unique URLs from the whole dataframe, assigns a numerical value to each one, and then replaces the URLs in the dataframe with the corresponding numerical values.
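
A minimal sketch of that idea, assuming every column holds URL strings and there are no missing values. The function name `encode_shared` and the tiny example dataframe are hypothetical, made up for illustration; with your data you would pass the dataframe read from sample-03.csv instead.

```python
import pandas as pd


def encode_shared(df):
    """Encode every column of df with one shared value-to-integer mapping.

    Hypothetical helper; assumes no missing values in df.
    """
    # Flatten all cells into one 1-D array and keep the unique values
    # in order of first appearance.
    uniques = pd.unique(df.to_numpy().ravel())
    # Give each unique value one integer code for the whole dataset.
    mapping = {value: code for code, value in enumerate(uniques)}
    # Replace the values column by column using the same mapping.
    encoded = df.apply(lambda col: col.map(mapping))
    return encoded, mapping


# Made-up example data: the same URLs appear in both columns.
example = pd.DataFrame({
    "a": ["http://example.com/296", "http://example.com/1"],
    "b": ["http://example.com/1", "http://example.com/296"],
})
encoded, mapping = encode_shared(example)
print(encoded)
print(mapping)
```

Because the mapping is built once from all columns, a URL gets the same integer wherever it occurs. Keeping the returned `mapping` also lets you translate the model's predicted codes back to URLs later.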