Reduce a high number of classes to a few using clustering, then perform classification

41 Views Asked by prateek s At 27 December 2023 at 08:18

I have an imbalanced text dataset with around 60 output classes. One of these classes is already a combination of 240 different classes that the business grouped together by requirement, not because they are similar in nature. The class distribution looks like this:

Class	Population
Class 1	56%
Class 2	16%
Class 3	12%
Class 4	8%
Class 5	6%
.......	.....
Class 59	0.06%

I tried multiple text preprocessing approaches followed by different classification algorithms, but the highest precision/recall I achieved is 0.65/0.63.
So I want to merge similar classes further, using ML clustering, down to at most 10 unique classes, and then perform classification. I used k-means to produce 10 clusters, as shown below:

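from sklearn.cluster import KMeans

# feature: the vectorised text matrix
k = KMeans(n_clusters=10, n_init=10, random_state=42)
k.fit(feature)  # fit in place; do not overwrite feature with the model
df['cluster'] = k.labels_
# share of each cluster within every original class
df.groupby('class')['cluster'].value_counts(normalize=True)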

The output looks like:
class1
cluster_0: 0.40
       _6: 0.20
       _2: 0.16
class2
cluster_0: 0.46
       _6: 0.25
       _2: 0.15
and so on

How do I map the 10 clusters to 10 unique class names, in other words, merge similar classes into one? Should I increase the number of clusters, or should I follow some other approach to merge the classes? The output classes I am expecting:

Original	Expected
Class 1	Class1
Class 2	Class2
Class 3	Class3
Class 4	Class2
Class 5	Class1
Class 6	Class1
Class 7	Class2
Class 8	Class3
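
One possible way to get a mapping like the table above from the existing cluster labels is to assign each original class to the cluster that holds most of its documents. The sketch below is only an illustration under assumptions: the toy `df` and its values are made up, and the names `share`, `class_to_group` and `merged_class` are hypothetical. Note that if the same cluster dominates every class (as cluster_0 does for class1 and class2 in the output above), this rule would collapse those classes into one group, which may or may not be desirable.

```python
import pandas as pd

# Toy stand-in for the real data: one row per document, with its original
# class and the k-means cluster it was assigned to (made-up values).
df = pd.DataFrame({
    "class":   ["Class 1"] * 4 + ["Class 2"] * 4 + ["Class 4"] * 3,
    "cluster": [0, 0, 0, 6,      2, 2, 6, 2,       2, 2, 0],
})

# Rows = original classes, columns = clusters, values = share of each class
# that fell into each cluster (every row sums to 1).
share = pd.crosstab(df["class"], df["cluster"], normalize="index")

# Assign every class to the cluster that contains most of its documents.
class_to_group = share.idxmax(axis=1)

# Replace the original label with the merged group label.
df["merged_class"] = df["class"].map(class_to_group)

print(share.round(2))
print(class_to_group.to_dict())
```

In this toy data Class 2 and Class 4 share the same dominant cluster, so they end up in the same merged class, while Class 1 stays on its own.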
