plotting proportions with geom_ribbon

46 Views Asked by At

I'm trying to show how changes in the proportions of different age classes have changed through time. Presently, for all the data, I have the proportion of individuals at a given age (pAge), and cumulative sums of the proportions that provide the min/max values for each age (cpAgeMin and cpAgeMax) that I'm using in geom_ribbon.

Unfortunately, even though all the proportions sum to 1, my plot doesn't seem to properly reflect the data as there are open spots without data in several years. These don't seem to be issues with the data, but rather something ggplot is doing. How do I fix this so that the entire area from 0-1 in every year is assigned to one age class.

# subset of data
df <- structure(list(site = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
                                    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
                                    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), 
                                  levels = "G", class = "factor"), 
                 year = c(1988, 1988, 1989, 1989, 1989, 1990, 1990, 1991, 1991, 1991, 
                          1992, 1992, 1992, 1993, 1993, 1993, 1998, 1998, 1998, 1999, 
                          1999, 1999, 2000, 2000, 2000, 2001, 2001, 2001, 2002, 2002, 
                          2003, 2003, 2003), 
                 age = c(`9104` = 1, `9341` = 2, `9292` = 1, `9632` = 2, `9960` = 4, 
                         `9543` = 1, `9857` = 3, `9685` = 1, `9968` = 2, `10169` = 3, 
                         `9858` = 1, `10127` = 2, `10212` = 3, `10009` = 1, `10284` = 2, 
                         `10522` = 6, `10464` = 1, `10605` = 2, `10830` = 3, `10556` = 1, 
                         `10744` = 2, `11056` = 3, `10687` = 1, `11061` = 2, `11279` = 5, 
                         `10912` = 1, `11109` = 2, `11197` = 3, `11065` = 1, `11194` = 2, 
                         `11154` = 1, `11255` = 2, `11324` = 3), 
                 pAGE = c(0.972, 0.028, 0.964, 0.005, 0.031, 0.823, 0.177, 0.921, 0.074, 
                          0.004, 0.846, 0.045, 0.109, 0.833, 0.155, 0.012, 0.927, 0.054, 
                          0.019, 0.784, 0.21, 0.005, 0.958, 0.017, 0.025, 0.852, 0.124, 
                          0.024, 0.913, 0.087, 0.909, 0.073, 0.019), 
                 cpAgeMin = c(0, 0.972, 0, 0.964, 0.969, 0, 0.823, 0, 0.921, 0.996, 0, 
                              0.846, 0.891, 0, 0.833, 0.988, 0, 0.927, 0.981, 0, 0.784, 
                              0.995, 0, 0.958, 0.975, 0, 0.852, 0.976, 0, 0.913, 0, 0.909, 
                              0.981), 
                 cpAgeMax = c(0.972, 1, 0.964, 0.969, 1, 0.823, 1, 0.921, 0.996, 1, 0.846, 
                              0.891, 1, 0.833, 0.988, 1, 0.927, 0.981, 1, 0.784, 0.995, 1, 
                              0.958, 0.975, 1, 0.852, 0.976, 1, 0.913, 1, 0.909, 0.981, 1),
                 seg = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 
                         2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2)), 
            row.names = c(NA, -33L), class = c("tbl_df", "tbl", "data.frame"))

# plot
ggplot(data = df) + 
  geom_ribbon(aes(x = year, ymin = cpAgeMin, ymax = cpAgeMax, group = interaction(seg, factor(age)), fill = factor(age), col = factor(age))) +
  theme_bw()

enter image description here

UPDATE:

After implementing @Andy Baxter's solution to a second site, there are extra "bumps" outside of the range of the data that appear in the plot. How do I get rid of these?

enter image description here

2

There are 2 best solutions below

3
Andy Baxter On BEST ANSWER

Stacking it with geom_area and position = "fill" might be smoothest:

library(tidyverse)


df |> 
  ggplot() +
  geom_area(
    aes(
    x = year,
    y = pAGE,
    group = interaction(seg, factor(age)),
    fill = factor(age)
    ),
    position = position_fill(reverse = TRUE)  # to get right order
  )

5
Dubukay On

Eeesh, okay I don't love this solution but here's something that works to fill in the missing values. Basically we need to use complete to turn the implicit NAs into explicit NAs, but then we need to fill things in in a "zig-zag" fashion instead of straight down a column. There's probably a better way of doing this but here's what I came up with:

df %>%
  complete(year, age) %>%
  mutate(cpAgeMin=ifelse(is.na(cpAgeMin), lag(cpAgeMax), cpAgeMin)) %>%
  mutate(cpAgeMax=ifelse(is.na(cpAgeMax), cpAgeMin, cpAgeMax)) %>%
  mutate(cpAgeMin=ifelse(is.na(cpAgeMin), lag(cpAgeMax), cpAgeMin)) %>%
  mutate(cpAgeMax=ifelse(is.na(cpAgeMax), cpAgeMin, cpAgeMax)) %>%
  mutate(cpAgeMin=ifelse(is.na(cpAgeMin), lag(cpAgeMax), cpAgeMin)) %>%
  mutate(cpAgeMax=ifelse(is.na(cpAgeMax), cpAgeMin, cpAgeMax)) %>%
  mutate(cpAgeMin=ifelse(is.na(cpAgeMin), lag(cpAgeMax), cpAgeMin)) %>%
  mutate(cpAgeMax=ifelse(is.na(cpAgeMax), cpAgeMin, cpAgeMax)) %>%
  ggplot() + 
  geom_ribbon(aes(x = year, ymin = cpAgeMin, ymax = cpAgeMax, group = factor(age), fill = factor(age), col = factor(age))) +
  theme_bw()

which produces

enter image description here