I have the following Dask dataframe:
column1 column2
a 1
a 2
b 3
c 4
c 5
I need to add a new column that assigns a unique consecutive number to each distinct value in column1. The expected output is:
column1 column2 column3
a 1 1
a 2 1
b 3 2
c 4 3
c 5 3
How do I achieve this? Thanks in advance for your help.
What you are describing is label encoding, which is implemented in scikit-learn's
LabelEncoder (https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.LabelEncoder.html). Here it is applied to your Dask DataFrame:
The + 1 is there because your labels start from 1. By default they start from 0.
Output: