Why are the indices produced by Label Encoding not in sequential order?

42 Views Asked by Dead At 27 July 2025 at 19:14

This is my label value:

df['Label'].value_counts()
------------------------------------
Benign                    4401366
DDoS attacks-LOIC-HTTP     576191
FTP-BruteForce             193360
SSH-Bruteforce             187589
DoS attacks-GoldenEye       41508
DoS attacks-Slowloris       10990
Name: Label, dtype: int64

I use label encoding to encode it:

from sklearn.preprocessing import LabelEncoder
label_encoder = LabelEncoder()
label_encoder.fit(df.Label)
df['Label']= label_encoder.transform(df.Label)

And this is the result:

df['Label'].value_counts()
------------------------------
0    4380628
1     576191
4     193354
5     187589
2      41508
3      10990
Name: Label, dtype: int64

I want the result like this:

df['Label'].value_counts()
------------------------------
0    4380628
1     576191
2     193354
3     187589
4      41508
5      10990
Name: Label, dtype: int64

Does anyone know what the problem is and how to solve it?


There is 1 best solution below

Panda Kim On 25 December 2022 at 08:22

Example

We need a minimal, reproducible example to answer this. Let's make one:

import pandas as pd
df = pd.DataFrame(list('BACCCCAAAA'), columns=['col1'])

df

Code

df['col1'].value_counts()

A    5
C    4
B    1
Name: col1, dtype: int64

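Your problem is that LabelEncoder assigns codes by the sorted (alphabetical) order of the labels, not by how often they occur. In your data, 'DoS attacks-GoldenEye' and 'DoS attacks-Slowloris' sort before 'FTP-BruteForce' and 'SSH-Bruteforce', so they get codes 2 and 3.

For this df, LabelEncoder gives A-0, B-1, C-2, which is sorted order rather than frequency order.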
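As a quick check of how the codes are assigned, here is a minimal sketch, assuming scikit-learn is installed. It inspects the fitted encoder's classes_ attribute, where a label's position is its code, and compares it with pd.factorize, which numbers labels by first appearance instead.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.DataFrame(list('BACCCCAAAA'), columns=['col1'])

encoder = LabelEncoder()
codes = encoder.fit_transform(df['col1'])

# classes_ holds the labels in sorted order; a label's position is its code.
print(list(encoder.classes_))   # ['A', 'B', 'C']
print(codes.tolist())           # [1, 0, 2, 2, 2, 2, 0, 0, 0, 0]

# For comparison, pd.factorize numbers labels by order of first appearance.
appearance_codes, appearance_labels = pd.factorize(df['col1'])
print(list(appearance_labels))  # ['B', 'A', 'C']
```

Neither ordering depends on how often a label occurs, which is why the value_counts index does not come out as 0, 1, 2, ...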

If you want A-0, C-1, B-2 (ordered by frequency), this can be done with pandas alone (no other library is needed), using the following code:

s = df['col1'].map(lambda x: df['col1'].value_counts().index.get_loc(x))

s

0    2
1    0
2    1
3    1
4    1
5    1
6    0
7    0
8    0
9    0
Name: col1, dtype: int64

Assign s to the col1 column:

out = df.assign(col1=s)

out

Check value_counts:

out['col1'].value_counts()

0    5
1    4
2    1
Name: col1, dtype: int64

Update

A more efficient way, which computes value_counts once instead of once per row:

m = pd.Series(range(df['col1'].nunique()), index=df['col1'].value_counts().index)
s = df['col1'].map(m)

s

0    2
1    0
2    1
3    1
4    1
5    1
6    0
7    0
8    0
9    0
Name: col1, dtype: int64
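The mapping approach can be wrapped in a small helper that also returns the mapping, so the codes can be decoded later. This is only a sketch: encode_by_frequency is a hypothetical name, not a pandas function, and labels with equal counts are ranked in whatever order value_counts returns them.

```python
import pandas as pd

def encode_by_frequency(labels: pd.Series):
    """Return (codes, mapping) where the most frequent label gets code 0."""
    # value_counts() sorts by count, descending, so its index is the ranking.
    ranked_labels = labels.value_counts().index
    mapping = pd.Series(range(len(ranked_labels)), index=ranked_labels)
    return labels.map(mapping), mapping

df = pd.DataFrame(list('BACCCCAAAA'), columns=['col1'])
codes, mapping = encode_by_frequency(df['col1'])

print(mapping.to_dict())  # {'A': 0, 'C': 1, 'B': 2}

# Decode: swap the mapping's index and values, then map the codes back.
inverse = pd.Series(mapping.index, index=mapping.values)
decoded = codes.map(inverse)
print(decoded.tolist() == df['col1'].tolist())  # True
```

Keeping the mapping around plays the role that classes_ and inverse_transform play for LabelEncoder.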
