The values in a fviz_cluster figure do not correspond with the datapoints of the dataset

I have composed the following script for a clustered scatter plot with fviz_cluster. According to the plot, there are negative values for x and negative values for y.

dots <- dots %>%
  select(GRNHLin, RED2HLin)
dots <- dots %>%
  filter(RED2HLin >= 1.0)

dots.Mclust <- Mclust(dots, modelNames="VVV", G=8)
#BIC <- mclustBIC(dots)
#ICL <- mclustICL(dots)

visual <- fviz_cluster(dots.Mclust, 
             data=dots.Mclust["GRNHLin", "RED2HLin"],
             ellipse.alpha = 0.1,
             geom = c("point"),
             show.clust.cent = FALSE,
             main = FALSE,
             legend = c("right"),
             palette = "npg",
             legend.title = "Clusters") +
  labs(x="Green Fluorescence Intensity", y="Red Fluorescence Intensity")
  #scale_x_continuous(#breaks = trans_breaks("log10", function(x) 10^x),
                #labels = trans_format("log10", math_format(10^.x)),
                #limits = c(0,6)) +
  #scale_y_continuous(#breaks = trans_breaks("log10", function(x) 10^x),
                #labels = trans_format("log10", math_format(10^.x)), 
                #limits = c(-7,6))

Clustered scatter plot (note the values on the left-hand side of and below 0).

According to head(dots.Mclust) (and my thorough analysis) there are no negative values.

           GRNHLin    RED2HLin
   [1,]   81.50364  176.379654
   [2,]   57.94751  116.310577
   [3,]   42.89310  119.758621
   [4,]   41.82213  275.607971
   [5,]  437.14648  141.309647
   [6,]   15.20952  177.128616
   [7,]   18.88731  257.249207
   [8,]  768.64935  172.374069
   [9,]   24.66220  118.283150
  [10,]   17.12160   68.955154
  [11,]   73.00019   71.517052
  [12,] 1182.08911  180.694122

Where does this discrepancy come from? How is fviz_cluster() changing the values on the plot? Did some normalisation or scaling take place?


There is 1 best solution below


I had the same issue, and what fixed it was adding stand = FALSE to your fviz_cluster() call. The documentation describes this argument as:

logical value; if TRUE, data is standardized before principal component analysis
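
To see where the negative values come from, here is a minimal sketch using the first six rows from the question. It assumes that stand = TRUE amounts to applying base R's scale() to each column (subtract the column mean, divide by the standard deviation); that is my reading of the documentation rather than something confirmed above. The name dots_head is made up for the illustration, and the final fviz_cluster() call is left commented out because it needs the factoextra package and the question's dots.Mclust object.

```r
# First six rows from the question's output (all values are positive).
dots_head <- cbind(
  GRNHLin  = c(81.50364, 57.94751, 42.89310, 41.82213, 437.14648, 15.20952),
  RED2HLin = c(176.379654, 116.310577, 119.758621, 275.607971, 141.309647, 177.128616)
)

# Standardising centres each column on its mean, so every value below
# the column mean becomes negative (assumed equivalent of stand = TRUE).
dots_scaled <- scale(dots_head)

print(min(dots_head) > 0)               # are all raw values positive?
print(any(dots_scaled < 0))             # do negative values appear after scaling?
print(round(colMeans(dots_scaled), 10)) # column means after scaling

# With standardisation switched off, the axes should stay in the original units:
# fviz_cluster(dots.Mclust, stand = FALSE, geom = "point")
```

If the plot is wanted on a log scale afterwards, as the commented-out scale_x_continuous() lines in the question suggest, stand = FALSE matters even more, since a log axis cannot show the negative standardised values.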