I am clustering a set of mRNAs with Partitioning Around Medoids (PAM), using some previously calculated mRNA structures, and visualizing the clusters with t-SNE. The dimensions are 6 conditions (1 with gene values, 5 with mRNA characteristics) and 10,500 genes.
The code used is:
library(cluster)  # daisy(), pam()
library(Rtsne)    # Rtsne()
library(dplyr)    # %>%, mutate()

# Gower dissimilarity; column 2 is log-transformed (logratio)
gower_df <- daisy(PV.shock60,
                  metric = "gower",
                  type = list(logratio = 2))

# use silhouette width to define the right number of clusters
silhouette <- c(NA)  # placeholder for k = 1, so that index i matches k
for (i in 2:10) {
  pam_clusters <- pam(gower_df, diss = TRUE, k = i)
  silhouette <- c(silhouette, pam_clusters$silinfo$avg.width)
}

pam.df <- pam(gower_df, diss = TRUE, k = 8)

# to visualize the clustering
tsne_object <- Rtsne(gower_df, is_distance = TRUE, perplexity = 150)
tsne_df <- tsne_object$Y %>%
  data.frame() %>%
  setNames(c("X", "Y")) %>%
  mutate(cluster = factor(pam.df$clustering))
And here is the result:
I am wondering what causes these spaghetti-like structures. Increasing the perplexity fixed some of it (preserving local structure over global structure isn't necessarily a goal).
Would you have any advice on what could be the cause for these structures?
I tried increasing values of perplexity, which gave me better-defined clusters. 150 was the lowest value beyond which further increases yielded similar cluster definition.
I was expecting clusters with smoother edges, instead of the spaghetti-like structure.
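For anyone who wants to reproduce the silhouette step without my data, here is a minimal sketch of the same idea on a toy data frame. The `toy` data, the seed, the range of `k`, and the names `gower_toy`, `avg_sil` and `best_k` are all assumptions made up for this illustration; it only shows how the average silhouette width from `pam()` can be collected and compared across values of `k`, not how my real clustering behaves.

```r
library(cluster)  # daisy(), pam()

set.seed(42)

# Hypothetical toy data standing in for PV.shock60 (two numeric columns)
toy <- data.frame(
  expr = c(rnorm(50, mean = 0), rnorm(50, mean = 6)),
  len  = c(rnorm(50, mean = 0), rnorm(50, mean = 6))
)

# Gower dissimilarity, as in the question (no logratio column here)
gower_toy <- daisy(toy, metric = "gower")

# Average silhouette width for each candidate number of clusters
ks <- 2:6
avg_sil <- sapply(ks, function(k) {
  pam(gower_toy, k = k, diss = TRUE)$silinfo$avg.width
})

# Candidate k with the largest average silhouette width
best_k <- ks[which.max(avg_sil)]
print(data.frame(k = ks, avg_sil = avg_sil))
print(best_k)
```

In my actual run I settled on `k = 8`; the sketch just makes explicit how a value of `k` could be read off the silhouette widths instead of being hard-coded.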
