Combining visualizations in RStudio with a double Y-axis


I'm currently studying data analysis in R and RStudio, and I have a problem with a double Y-axis visualization. This is my code:

Test_Zscore <- traitement_CDE_LONG |> 
  group_by(Athletes) |> 
  mutate(moyenne = mean(CDE), SD = sd(CDE))

traitement_final <- left_join(CDE, Test_Zscore, by = "Athletes") |> 
  select(Athletes, Date.x, CDE_sum, moyenne, SD) |> 
  distinct(Athletes, .keep_all = TRUE) |> 
  group_by(Athletes) |> 
  mutate(Z_Score = (CDE_sum - moyenne)/SD)

plot_CDE_jour <- ggplot(data = traitement_final, aes(x = Athletes)) +
  geom_col(aes(y = CDE_sum), fill = "skyblue") +  # Columns for the CDE values
  geom_line(aes(y = Z_Score * 750 - 1500, group = 1), colour = "red") +  # Line for the Z-scores; adjust the scaling factor and offset to align with the CDE Y-axis
  scale_y_continuous(
    "CDE",
    sec.axis = sec_axis(~ (. + 1500) / 750 - 4, name = "Z-Scores")  # Create a secondary Y-axis for the Z-scores; adjust as needed
  ) +
  scale_x_discrete("Athletes") +
  coord_cartesian(ylim = c(0, 1500)) +  # Set the limits of the primary Y-axis
  labs(title = "CDE et Zscore par Athlète") +
  theme_minimal() +
  geom_hline(aes(yintercept = (-1.96 * 750 - 1500)), linetype = "dashed", color = "blue") +  # Threshold at -1.96
  geom_hline(aes(yintercept = (1.96 * 750 - 1500)), linetype = "dashed", color = "blue")    # Threshold at 1.96


print(plot_CDE_jour)

Picture 1: What I get. Picture 2: What I want.

My data:

structure(list(Athletes = c("Abadie", "Abescat", "Antonescu", 
"Auradou", "Balfet", "Barbaste", "Betham", "Boundjema", "Castinel", 
"Chauvet"), Date.x = structure(c(19747, 19747, 19747, 19747, 
19747, 19747, 19747, 19747, 19747, 19747), class = "Date"), CDE_sum = c(824, 
690, 750, 481, 756, 764, 654, 516, 695, 746), moyenne = c(710.558181818182, 
738.504504504505, 596.219117647059, 637.671287128713, 714.474698795181, 
748.532978723404, 634.503260869565, 524.178947368421, 620.496330275229, 
642.718348623853), SD = c(417.313941045778, 363.098405992192, 
302.630508794043, 293.807319628573, 326.882040275206, 360.934928871125, 
335.865907456306, 311.70092564648, 360.183333957143, 346.311225677975
), Z_Score = c(0.271838074466278, -0.133585010851157, 0.508147321186304, 
-0.53324501012015, 0.127034514254312, 0.0428526585802353, 0.0580491758693049, 
-0.0262397275576183, 0.206849297845734, 0.298233622586019)), class = c("grouped_df", 
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -10L), groups = structure(list(
    Athletes = c("Abadie", "Abescat", "Antonescu", "Auradou", 
    "Balfet", "Barbaste", "Betham", "Boundjema", "Castinel", 
    "Chauvet"), .rows = structure(list(1L, 2L, 3L, 4L, 5L, 6L, 
        7L, 8L, 9L, 10L), ptype = integer(0), class = c("vctrs_list_of", 
    "vctrs_vctr", "list"))), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -10L), .drop = TRUE))

With this code, I get the visualization shown in "What I get".

As you can see, the geom_line() does not appear. I think it is a scaling issue, but I don't know how to fix it. I want a geom_col() like the one in the second picture ("What I want"), with a geom_line() superimposed on its own scale. I also want two thresholds drawn with geom_hline(). The first Y-axis should run from 0 to 1500 and the second from -2 to 2. Could someone help me set up the second Y-axis correctly, or suggest another way to achieve this? I'm also open to any suggestions that improve my code. Please forgive my English.
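A quick base R check may help narrow down why the line is missing. The sketch below applies the transformation from the posted geom_line() call to the posted Z-scores and compares the result with the window set by coord_cartesian(). The variable names (`z`, `line_y`, `visible`, `back`) are introduced here for illustration only.

```r
# Z-scores copied from the posted data
z <- c(0.271838074466278, -0.133585010851157, 0.508147321186304,
       -0.53324501012015, 0.127034514254312, 0.0428526585802353,
       0.0580491758693049, -0.0262397275576183, 0.206849297845734,
       0.298233622586019)

# Transformation used inside geom_line() in the question
line_y <- z * 750 - 1500
print(range(line_y))  # both ends of the range are negative

# Window shown by coord_cartesian(ylim = c(0, 1500))
visible <- line_y >= 0 & line_y <= 1500
print(any(visible))   # is any point of the line inside the window?

# Formula used in sec_axis() in the question, applied to the plotted values
back <- (line_y + 1500) / 750 - 4
print(isTRUE(all.equal(back, z)))      # is it the inverse of the transformation?
print(isTRUE(all.equal(back, z - 4)))  # or is it shifted by 4?
```

If this reading is right, two things are going on: every transformed value falls below 0, so the line is drawn outside the visible window, and the sec_axis() formula is not the inverse of the transformation, so the secondary axis would be mislabeled even if the line were visible.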

Thank you!

I tried several ways to scale my Y-axis, but none of them worked: there is always a problem with geom_line() or with the scaling.
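One possible approach, sketched here under stated assumptions rather than as a definitive fix, is to define a linear map from the Z-score range to the CDE range together with its exact inverse, and to use the first for the line and thresholds and the second for sec_axis(). The ranges (0 to 1500 and -2 to 2) come from the question; the helper names `z_to_cde` and `cde_to_z` and the constants `cde_max`, `z_min`, `z_max` are hypothetical, and the data frame is a reduced copy of the posted data.

```r
library(ggplot2)

# Reduced copy of the posted data (only the columns the plot needs)
traitement_final <- data.frame(
  Athletes = c("Abadie", "Abescat", "Antonescu", "Auradou", "Balfet",
               "Barbaste", "Betham", "Boundjema", "Castinel", "Chauvet"),
  CDE_sum = c(824, 690, 750, 481, 756, 764, 654, 516, 695, 746),
  Z_Score = c(0.271838074466278, -0.133585010851157, 0.508147321186304,
              -0.53324501012015, 0.127034514254312, 0.0428526585802353,
              0.0580491758693049, -0.0262397275576183, 0.206849297845734,
              0.298233622586019)
)

# Assumed axis ranges, taken from the question text
cde_max <- 1500  # primary axis runs from 0 to cde_max
z_min <- -2      # secondary axis lower end
z_max <- 2       # secondary axis upper end

# Linear map from Z-score units to primary-axis units, and its inverse
z_to_cde <- function(z) (z - z_min) / (z_max - z_min) * cde_max
cde_to_z <- function(y) y / cde_max * (z_max - z_min) + z_min

plot_CDE_jour <- ggplot(traitement_final, aes(x = Athletes)) +
  geom_col(aes(y = CDE_sum), fill = "skyblue") +
  # Z-scores rescaled onto the primary axis
  geom_line(aes(y = z_to_cde(Z_Score), group = 1), colour = "red") +
  # Thresholds at -1.96 and 1.96, rescaled the same way
  geom_hline(yintercept = z_to_cde(c(-1.96, 1.96)),
             linetype = "dashed", colour = "blue") +
  scale_y_continuous(
    name = "CDE",
    # The secondary axis applies the inverse map to the primary axis values
    sec.axis = sec_axis(~ cde_to_z(.), name = "Z-Scores")
  ) +
  coord_cartesian(ylim = c(0, cde_max)) +
  labs(title = "CDE and Z-score per athlete", x = "Athletes") +
  theme_minimal()

print(plot_CDE_jour)
```

With this mapping a Z-score of 0 lands at 750 on the primary axis, and the two thresholds land at 15 and 1485, so everything stays inside the 0 to 1500 window. Keeping the two helpers as an exact inverse pair is the design choice that matters: if the ranges change, only the three constants need editing.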
