These are my label values:
df['Label'].value_counts()
------------------------------------
Benign 4401366
DDoS attacks-LOIC-HTTP 576191
FTP-BruteForce 193360
SSH-Bruteforce 187589
DoS attacks-GoldenEye 41508
DoS attacks-Slowloris 10990
Name: Label, dtype: int64
I use label encoding to encode them:
from sklearn.preprocessing import LabelEncoder
label_encoder = LabelEncoder()
label_encoder.fit(df.Label)
df['Label']= label_encoder.transform(df.Label)
And this is the result:
df['Label'].value_counts()
------------------------------
0 4380628
1 576191
4 193354
5 187589
2 41508
3 10990
Name: Label, dtype: int64
I want the result to look like this:
df['Label'].value_counts()
------------------------------
0 4380628
1 576191
2 193354
3 187589
4 41508
5 10990
Name: Label, dtype: int64
Does anyone know what the problem is and how to solve it?
Example
We need a minimal, reproducible example to answer this. Let's make
df
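A minimal sketch of such an example, with made-up data: the column name col1 and the values are assumptions chosen so that B appears first while A is the most frequent label. pd.factorize numbers labels in order of first appearance, which gives the B-0, A-1, C-2 coding discussed here (LabelEncoder would sort the labels instead and give A-0, B-1, C-2).

```python
import pandas as pd

# Hypothetical data: B appears first, but A is the most frequent label.
df = pd.DataFrame({'col1': ['B', 'A', 'C', 'A', 'C', 'A']})

# pd.factorize assigns codes in order of first appearance.
codes, uniques = pd.factorize(df['col1'])

# Build a label -> code lookup to inspect the coding.
appearance_map = dict(zip(uniques, range(len(uniques))))
print(appearance_map)  # {'B': 0, 'A': 1, 'C': 2}
```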
Code
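Your problem is that the codes are not assigned by frequency. LabelEncoder numbers the classes in sorted label order, which is why FTP-BruteForce gets 4 rather than 2 in your output.
Numbered in order of appearance, the labels in df come out as B-0, A-1, C-2. If you want A-0, C-1, B-2 (by frequency), this can be solved with pandas alone (no other library needed), using the following code: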
s
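One way to build s by frequency, as a sketch rather than the only approach: take the label order from value_counts (most frequent first) and use it as the category order of a pandas Categorical, whose codes then follow that order. The column name col1 and the data are assumptions for illustration.

```python
import pandas as pd

# Hypothetical data: A occurs 3 times, C twice, B once.
df = pd.DataFrame({'col1': ['B', 'A', 'C', 'A', 'C', 'A']})

# value_counts() sorts labels from most to least frequent.
# Note: labels with equal counts have no guaranteed relative order.
order = df['col1'].value_counts().index

# Category codes follow the order given in `categories`.
s = pd.Series(
    pd.Categorical(df['col1'], categories=order).codes,
    index=df.index,
)
print(s.tolist())  # [2, 0, 1, 0, 1, 0]
```

For the data in the question, df['Label'] would take the place of df['col1'].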
Assign s to the col1 column.
Then check the result with value_counts.
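A short sketch of that check, under the same assumed col1 data: after writing the frequency-ordered codes back to the column, value_counts should list the codes 0, 1, 2 with descending counts.

```python
import pandas as pd

# Hypothetical data: A occurs 3 times, C twice, B once.
df = pd.DataFrame({'col1': ['B', 'A', 'C', 'A', 'C', 'A']})

# Encode by frequency and overwrite the original column.
order = df['col1'].value_counts().index
df['col1'] = pd.Categorical(df['col1'], categories=order).codes

# Code 0 should now be the most frequent value, code 2 the rarest.
counts = df['col1'].value_counts()
print(counts)
```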
Update
A more efficient way:
s
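A possible version of a single-lookup approach, offered as a sketch: compute value_counts once, turn its index into a label-to-rank table, and map the column through it. The names rank and col1 are illustrative assumptions.

```python
import pandas as pd

# Hypothetical data: A occurs 3 times, C twice, B once.
df = pd.DataFrame({'col1': ['B', 'A', 'C', 'A', 'C', 'A']})

# Count once; the index is ordered from most to least frequent.
counts = df['col1'].value_counts()

# Lookup table: label -> position in the frequency ranking.
rank = pd.Series(range(len(counts)), index=counts.index)

# Map every label to its rank in one vectorized pass.
s = df['col1'].map(rank)
print(s.tolist())  # [2, 0, 1, 0, 1, 0]
```

Keeping rank around is also useful for decoding later, since its index holds the original labels in code order.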