I am using `nomclust` to run hierarchical cluster analysis on nominal data. To explain what I need, I am using the `CA.methods` dataset, which is included in the same package. I run the `nomclust()` function and then plot the result with `dend.plot()` using three clusters.
```r
library(nomclust)
data("CA.methods")
hca <- nomclust(CA.methods)
dend.plot(hca, clusters = 3)
```
The resulting dendrogram shows the three clusters in different colors (red, green and blue).
What I need is basically:
- to create a data frame with the different elements and the group each one belongs to,
- or to add a column to the original data frame with the cluster attribution.
For example:
AGNES Cluster1/Red
k-prototypes Cluster1/Red
LCA Cluster1/Red
TwoStep Cluster1/Red
BIRCH Cluster2/Green
CURE Cluster2/Green
...
CACTUS Cluster3/Blue
...
The cluster names can be anything, for example:
- Colors
- ClusterX
- GroupX
- etc.
Do you know how I can add a column to the original data frame with each element's cluster attribution?
Thanks
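A minimal sketch of both outputs, assuming that the object returned by `nomclust()` stores the cluster memberships in its `mem` component and that the three-cluster solution is the variable named `clu_3` (this naming is an assumption; `names(hca$mem)` shows the actual names). The object names `CA.clustered` and `membership` are purely illustrative.

```r
library(nomclust)

data("CA.methods")
hca <- nomclust(CA.methods)

# ASSUMPTION: hca$mem holds one membership variable per number of clusters,
# and the three-cluster solution is called "clu_3" (check names(hca$mem)).
clu3 <- hca$mem$clu_3

# Option 1: copy of the original data with an extra cluster column
CA.clustered <- CA.methods
CA.clustered$cluster <- paste0("Cluster", clu3)

# Option 2: a separate data frame with element names and their cluster
membership <- data.frame(element = rownames(CA.methods),
                         cluster = paste0("Cluster", clu3),
                         stringsAsFactors = FALSE)
head(membership)
```

Because the membership variable is in the original row order, it can be attached directly to `CA.methods` without any reordering.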
Testing the code from the answer
I just tested this code from @MrFlick's answer:
```r
data.frame(label=hca$dend$order.lab, group=cutree(hca$dend, k=3))[hca$dend$order, ]
```
The output is shown below:
```
label group
1 AGNES 1
16 k-histograms 1
17 k-modes 1
24 EM 1
2 k-prototypes 2
7 LIMBO 2
18 CACTUS 2
22 DENCLUE 2
4 TwoStep 2
21 DBSCAN 2
10 PROCLUS 2
12 FANNY 2
14 PAM 2
20 STING 2
6 CURE 2
13 k-means 2
15 COOLCAT 2
3 LCA 3
5 BIRCH 3
23 OPTICS 3
8 ROCK 3
9 CLARA 3
19 CLIQUE 3
11 DIANA 3
```
However, there are some inconsistencies between the dendrogram and the output. For instance, AGNES, k-prototypes, LCA and TwoStep are all in the red group in the dendrogram. With the code, however, AGNES is in cluster 1, k-prototypes and TwoStep are in cluster 2, and LCA is in cluster 3.
Any idea?
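The inconsistency comes from how the two columns are built: `cutree()` returns one group per observation in the original row order, whereas `order.lab` lists the labels in the order they appear along the dendrogram. Pairing them row by row attaches labels to the groups of other elements, and the final `[hca$dend$order, ]` only reorders rows, so it does not repair the pairing. The sketch below reproduces the effect with base R's `hclust()` on the built-in `USArrests` data, a stand-in chosen only so the example runs without nomclust; `hc$labels[hc$order]` plays the role of `order.lab`.

```r
# Base R only: hclust/cutree on the built-in USArrests data as a stand-in
d  <- dist(scale(USArrests))
hc <- hclust(d, method = "average")

# cutree() returns one group per observation, in the ORIGINAL row order
grp <- cutree(hc, k = 3)

# Labels in dendrogram (plotting) order, analogous to agnes' order.lab
labels_in_plot_order <- hc$labels[hc$order]

# Mismatched: label i is in plot order, but group i is in row order
wrong <- data.frame(label = labels_in_plot_order, group = unname(grp),
                    stringsAsFactors = FALSE)

# Correct: both columns in the original row order ...
right <- data.frame(label = rownames(USArrests), group = unname(grp),
                    stringsAsFactors = FALSE)

# ... then, optionally, reorder the rows to follow the dendrogram
right_sorted <- right[hc$order, ]
head(right_sorted)
```

Applied to the objects in the question, the corresponding fix would presumably be `data.frame(label = rownames(CA.methods), group = cutree(hca$dend, k = 3))[hca$dend$order, ]`, assuming the row names of `CA.methods` are the method names, as the labels in the output suggest.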
I contacted the developer (Zdenek Sulc) by email, and he provided me with the answer. I am copying it here in case it is helpful for anyone.
Obtaining the cluster elements directly from a dendrogram plot may be challenging. Instead, I recommend using a cluster membership variable and the row names to get this information. Below, you can find a simple code for obtaining the names of elements in the created clusters:
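Following that recommendation, here is an illustrative sketch (not the developer's code) that splits the row names by a cluster membership variable. It assumes, as above, that the three-cluster membership is stored as `hca$mem$clu_3`; `names(hca$mem)` shows the actual variable names.

```r
library(nomclust)

data("CA.methods")
hca <- nomclust(CA.methods)

# ASSUMPTION: three-cluster membership variable is hca$mem$clu_3
clu <- hca$mem$clu_3

# Named list: one character vector of element names per cluster
clusters <- split(rownames(CA.methods), clu)

# Elements of the first cluster
clusters[["1"]]
```

Since the membership variable and `rownames(CA.methods)` share the original row order, no reordering by the dendrogram is needed.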