Hi folks! I'm still quite new to ggplot and trying to understand it, but I really need some help here.
Edit: reproducible data from my dataset "Daten_ohne_Cluster_NA" (first 25 rows):
structure(list(ntaxa = c(2, 2, 2, 2, 2, 2, 2, 5, 5, 5, 5, 5,
6, 6, 6, 6, 6, 5, 8, 8, 7, 7, 6, 5, 5), mpd.obs.z = c(-1.779004391,
-1.721014957, -1.77727283, -1.774642404, -1.789386039, -1.983401439,
-0.875426386, -2.276052068, -2.340365105, -2.203126078, -2.394158227,
-2.278173635, -1.269075471, -1.176760985, -1.313045215, -1.164289676,
-1.247549961, -0.868174033, -2.057106804, -2.03154772, -1.691850922,
-1.224391713, -0.93993654, -0.39315089, -0.418380361), mntd.obs.z = c(-1.759874454,
-1.855202792, -1.866281778, -1.798439855, -1.739998395, -1.890847575,
-0.920672112, -1.381541177, -1.382847758, -1.394870597, -1.339878669,
-1.349541665, -0.516793786, -0.525476292, -0.557425575, -0.539534996,
-0.521299478, -0.638951825, -1.06467985, -1.033009266, -0.758380203,
-0.572401837, -0.166616844, 0.399510209, 0.314591018), pe = c(0.046370234,
0.046370234, 0.046370234, 0.046370234, 0.046370234, 0.046370234,
0.071665745, 0.118619482, 0.118619482, 0.118619482, 0.118619482,
0.118619482, 0.205838414, 0.205838414, 0.205838414, 0.205838414,
0.205838414, 0.179091659, 0.215719118, 0.215719118, 0.212092271,
0.315391478, 0.312205596, 0.305510773, 0.305510773), ECO_NUM = c(1,
6, 6, 1, 7, 6, 6, 6, 6, 6, 6, 7, 7, 6, 1, 6, 6, 6, 6, 6, 6, 7,
7, 7, 6)), row.names = c(NA, -25L), class = c("tbl_df", "tbl",
"data.frame"))
(1) I prepared my data frame like this:
Daten_Cluster <- Daten[, c("ntaxa", "mpd.obs.z", "mntd.obs.z", "pe", "ECO_NUM")]
(2) I removed all NAs with na.omit. The result has 6 variables with 3811 observations each. The column ECO_NUM encodes the different ecoregions as a categorical variable with numeric codes.
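As a minimal sketch of step (2), assuming a toy data frame with made-up values (`toy` and `toy_ohne_NA` are hypothetical names, not objects from my script): `na.omit` drops every row with at least one NA, and `factor()` stores the ecoregion code as a discrete grouping rather than a number.

```r
# Toy data frame standing in for Daten_Cluster; the values are made up
toy <- data.frame(
  ntaxa   = c(2, 5, NA, 6),
  pe      = c(0.05, 0.12, 0.21, NA),
  ECO_NUM = c(1, 6, 6, 7)
)

# Drop every row that contains at least one NA
toy_ohne_NA <- na.omit(toy)

# Store the ecoregion code as a factor so it is treated as a discrete group
toy_ohne_NA$ECO_NUM <- factor(toy_ohne_NA$ECO_NUM)

nrow(toy_ohne_NA)            # number of complete rows that remain
levels(toy_ohne_NA$ECO_NUM)  # ecoregion codes still present
```

Whether ECO_NUM is a factor or a plain number matters later, because plotting functions pick a discrete or a continuous colour scale depending on the column type.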
(3) Then I ran a cluster analysis with kmeans. I used 31 groups because there are 31 ecoregions in my dataset, and the aim is to colour the plot by ecoregion later on.
Biomes_Clus <- kmeans(Daten_Cluster_ohne_NA, 31, iter.max = 10, nstart = 25)
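A minimal sketch of how the clusters could be compared with the ecoregions numerically, using the first 12 rows of the sample data (rounded to three decimals). It assumes, unlike the call in step (3), that only the standardised measurement columns are clustered and that ECO_NUM is kept out of the distance calculation. The names `dat`, `num_cols`, `km` and `tab` are hypothetical, and 3 centres are used only because 12 rows cannot support 31.

```r
set.seed(42)  # k-means starts from random centres; fix the seed for repeatability

# First 12 rows of the sample data, rounded to three decimals
dat <- data.frame(
  mpd.obs.z  = c(-1.779, -1.721, -1.777, -1.775, -1.789, -1.983,
                 -0.875, -2.276, -2.340, -2.203, -2.394, -2.278),
  mntd.obs.z = c(-1.760, -1.855, -1.866, -1.798, -1.740, -1.891,
                 -0.921, -1.382, -1.383, -1.395, -1.340, -1.350),
  ECO_NUM    = c(1, 6, 6, 1, 7, 6, 6, 6, 6, 6, 6, 7)
)

# Assumption: cluster only the measured variables, standardised with scale(),
# so that the ecoregion code does not influence the clusters
num_cols <- c("mpd.obs.z", "mntd.obs.z")
km <- kmeans(scale(dat[, num_cols]), centers = 3, iter.max = 10, nstart = 25)

# Cross-tabulate cluster membership against ecoregion
tab <- table(cluster = km$cluster, ecoregion = dat$ECO_NUM)
print(tab)
```

If the clusters matched the ecoregions well, each row of the table would be concentrated in a single column.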
(4) Then I followed the online instructions from datanovia.com on how to visualise a k-means cluster analysis (I always just follow these how-tos, as I have no idea how to do it all by myself). I tried to change the arguments accordingly to colour by ecoregion.
fviz_cluster(Biomes_Clus, data = Daten_Cluster_ohne_NA,
  geom = "point",
  ellipse.type = "convex",
  ggtheme = theme_bw()
) +
  stat_mean(aes(color = Daten_Cluster_ohne_NA$ECO_NUM), size = 4)
I get more than 50 warnings here, I guess one for each object, saying: In grid.Call.graphics(C_points, x$x, x$y, x$pch, x$size) : unimplemented pch value '30'
I know that there are not enough pch symbols for 31 groups, but I don't need them anyway; I would just like the points to be coloured.
I also tried the other function, ggscatter, and created my own colour palette (called P36) with more than 31 colours so that there are enough colours for the ecoregions.
ggscatter(
  ind.coord, x = "Dim.1", y = "Dim.2",
  color = "Species", palette = "P36", ellipse = TRUE, ellipse.type = "convex",
  legend = "right", ggtheme = theme_bw(),
  xlab = paste0("Dim 1 (", variance.percent[1], "% )" ),
  ylab = paste0("Dim 2 (", variance.percent[2], "% )" )
) +
  stat_mean(aes(color = cluster), size = 4)
The error here is "Discrete value supplied to continuous scale".
My question is: how can I easily colour the output of my k-means analysis (which worked) not by the newly clustered groups but by ecoregion, to visualise whether the clusters and the ecoregion groups differ?
I appreciate your help; my group partner and I would be very thankful! :) Greetings, Evelyn
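A minimal sketch of one possible way to get such a plot with plain ggplot2, offered as an illustration under stated assumptions rather than as the datanovia method: the first 12 sample rows (rounded) are projected onto two principal components with `prcomp`, and the points are coloured by ECO_NUM converted to a factor. The names `dat`, `num_cols`, `pca`, `coords` and `p` are hypothetical. Mapping a factor gives a discrete colour scale, which is one common way to avoid mixing discrete and continuous values on the same scale.

```r
library(ggplot2)

# First 12 rows of the sample data, rounded to three decimals
dat <- data.frame(
  mpd.obs.z  = c(-1.779, -1.721, -1.777, -1.775, -1.789, -1.983,
                 -0.875, -2.276, -2.340, -2.203, -2.394, -2.278),
  mntd.obs.z = c(-1.760, -1.855, -1.866, -1.798, -1.740, -1.891,
                 -0.921, -1.382, -1.383, -1.395, -1.340, -1.350),
  pe         = c(0.046, 0.046, 0.046, 0.046, 0.046, 0.046,
                 0.072, 0.119, 0.119, 0.119, 0.119, 0.119),
  ECO_NUM    = c(1, 6, 6, 1, 7, 6, 6, 6, 6, 6, 6, 7)
)

# Project the standardised measurements onto principal components
num_cols <- c("mpd.obs.z", "mntd.obs.z", "pe")
pca <- prcomp(dat[, num_cols], scale. = TRUE)

# Keep the first two components and attach the ecoregion as a factor,
# so that ggplot2 chooses a discrete colour scale
coords <- data.frame(
  Dim.1   = pca$x[, 1],
  Dim.2   = pca$x[, 2],
  ECO_NUM = factor(dat$ECO_NUM)
)

# Colour by ecoregion only; no point shapes are mapped, so the limited
# number of pch symbols does not come into play
p <- ggplot(coords, aes(x = Dim.1, y = Dim.2, colour = ECO_NUM)) +
  geom_point(size = 3) +
  theme_bw() +
  labs(colour = "Ecoregion")
p
```

With all 31 ecoregions, a custom palette such as P36 could presumably be supplied as a vector of colours through `scale_colour_manual(values = P36)`.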