Create dataframe with cluster assignment using nomclust in R


I am using nomclust to run hierarchical cluster analysis on nominal data. To illustrate what I need, I am using the CA.methods dataset from the same package. I run nomclust() and then plot the dendrogram with dend.plot(), highlighting three clusters:

library(nomclust)
data("CA.methods")
hca <- nomclust(CA.methods)
dend.plot(hca, clusters = 3)

This is the result of the plot:

[Image: dendrogram of CA.methods with three colored clusters]

What I need is either:

  • to create a dataframe with the different elements and the group they belong to,
  • or to include a column in the original dataframe with the cluster attribution.

For example:

AGNES          Cluster1/Red
k-prototypes   Cluster1/Red
LCA            Cluster1/Red
TwoStep        Cluster1/Red
BIRCH          Cluster2/Green
CURE           Cluster2/Green
...
CACTUS         Cluster3/Blue
...

The cluster names can be anything, for example:

  • Colors
  • ClusterX
  • GroupX
  • etc.

Do you know how I can add a column to the original data frame with each element's cluster assignment?

Thanks

Test of the code in the answer

I just tested this code from @MrFlick's answer:

data.frame(label=hca$dend$order.lab, group=cutree(hca$dend, k=3))[hca$dend$order, ]

The output is shown below:

          label group
1         AGNES     1
16 k-histograms     1
17      k-modes     1
24           EM     1
2  k-prototypes     2
7         LIMBO     2
18       CACTUS     2
22     DENCLUE      2
4       TwoStep     2
21       DBSCAN     2
10      PROCLUS     2
12        FANNY     2
14          PAM     2
20       STING      2
6          CURE     2
13      k-means     2
15      COOLCAT     2
3           LCA     3
5        BIRCH      3
23       OPTICS     3
8          ROCK     3
9         CLARA     3
19       CLIQUE     3
11        DIANA     3

However, there are some inconsistencies between the dendrogram and the output. For instance, AGNES, k-prototypes, LCA and TwoStep are all in the red cluster of the dendrogram, but the code puts AGNES in cluster 1, k-prototypes and TwoStep in cluster 2, and LCA in cluster 3.

Any idea?
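A likely explanation for the mismatch, offered as a hedged sketch rather than a confirmed diagnosis: in the tested code, hca$dend$order.lab lists the labels in dendrogram (leaf) order, while cutree() returns memberships in the original row order, so the label and group columns are paired from two different orderings before [hca$dend$order, ] is applied. The example below reproduces the effect with base R's hclust() on the built-in USArrests data instead of nomclust. It assumes that hca$dend follows the same labels/order convention as an hclust object, which is an assumption here and not verified against nomclust's internals.

```r
# Base-R illustration with stats::hclust (not nomclust).
# Assumption: nomclust's hca$dend uses the same convention, i.e. its
# order.lab is in leaf order while cutree() returns original row order.
hc <- hclust(dist(USArrests), method = "average")
groups <- cutree(hc, k = 3)   # named by hc$labels, one entry per row, original order

# Misaligned: labels in dendrogram (leaf) order, groups in original row order
wrong <- data.frame(label = hc$labels[hc$order], group = unname(groups),
                    stringsAsFactors = FALSE)

# Aligned: labels and groups both in the original row order
right <- data.frame(label = hc$labels, group = unname(groups),
                    stringsAsFactors = FALSE)

# Sort the aligned table into dendrogram order only after pairing, for display
right_in_plot_order <- right[hc$order, ]

# Number of rows in `wrong` whose group disagrees with cutree()'s group for that label
sum(wrong$group != groups[wrong$label])
```

The fix is to pair each group with the label from the same (original) position first, and reorder afterwards if a plot-ordered listing is wanted.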

Answer

I contacted the developer (Zdenek Sulc) by email, and he provided the answer below. I am copying it here in case it helps anyone.

Obtaining the cluster elements directly from a dendrogram plot may be challenging. Instead, I recommend using a cluster membership variable and the row names to get this information. Below is simple code for obtaining the names of the elements in each cluster:

library(nomclust)
data("CA.methods")

hca <- nomclust(CA.methods, measure = "lin")
dend.plot(hca, clusters = "BIC")

# elements in each of the clusters (when there are 3 clusters)
row.names(CA.methods)[hca$mem$clu_3 == 1]
row.names(CA.methods)[hca$mem$clu_3 == 2]
row.names(CA.methods)[hca$mem$clu_3 == 3]

# If you have 4 clusters
row.names(CA.methods)[hca$mem$clu_4 == 1]
row.names(CA.methods)[hca$mem$clu_4 == 2]
row.names(CA.methods)[hca$mem$clu_4 == 3]
row.names(CA.methods)[hca$mem$clu_4 == 4]
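To get the two outputs asked for in the question (a lookup table, or an extra column in the original data frame), the same membership vector can be reused in one step. The sketch below is a hypothetical helper, cluster_table(), that is not part of nomclust; it is demonstrated on a made-up toy data frame and membership vector standing in for CA.methods and hca$mem$clu_3, so it runs without the package. base R's split() gives the per-cluster element lists in a single call instead of one line per cluster.

```r
# Hypothetical helper (not part of nomclust): turns a membership vector such
# as hca$mem$clu_3 into (a) a two-column lookup table and (b) a copy of the
# original data frame with an extra cluster column.
cluster_table <- function(data, membership, prefix = "Cluster") {
  stopifnot(length(membership) == nrow(data))
  # Readable labels such as "Cluster1", "Cluster2", ... in numeric order
  cluster <- factor(paste0(prefix, membership),
                    levels = paste0(prefix, sort(unique(membership))))
  list(
    lookup = data.frame(element = row.names(data), cluster = cluster,
                        stringsAsFactors = FALSE),
    with_column = cbind(data, cluster = cluster)
  )
}

# Toy stand-ins for CA.methods and hca$mem$clu_3 (made-up values, illustration only)
toy <- data.frame(x = c("a", "b", "a", "c"), y = c("u", "u", "v", "v"),
                  row.names = c("M1", "M2", "M3", "M4"))
toy_membership <- c(1, 2, 1, 3)

res <- cluster_table(toy, toy_membership)
res$lookup
res$with_column

# Elements per cluster as a named list, one entry per cluster number
split(row.names(toy), toy_membership)
```

With the objects from the answer, the equivalent calls would be cluster_table(CA.methods, hca$mem$clu_3) and split(row.names(CA.methods), hca$mem$clu_3), with clu_4 and so on for other cluster counts. The cluster numbers come from hca$mem and may not correspond to the colors used by dend.plot().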