Here’s a dataset with three variables: age, year, and happy:
set.seed(42)
n <- 2000
# n/60 is not a whole number; rep() truncates it, giving 33 repeats (1,980 rows)
dat <- data.frame(age = rep(20:79, n/60))
dat$year <- sample(2000:2020, size = NROW(dat), replace = TRUE)
dat$happy <- sample(0:10, size = NROW(dat), replace = TRUE, prob = c(1,2,5,8,11,16,22,18,8,6,3))
If I want to visualise the relationship between age and happiness, I can use geom_smooth as follows:
library(dplyr)
library(ggplot2)
dat %>% group_by(age) %>%
  summarise(mhappy = mean(happy)) %>%
  ggplot(aes(x = age, y = mhappy)) +
  geom_smooth(method = "loess", se = FALSE) +
  labs(x = "Age", y = "Mean happiness at each age")
But thinking about the “age-period-cohort” dilemma, I don’t know whether the visualization tells me about change rooted in ageing or change rooted in “time”. To address that angle, I could use a statistical model that uses “year” as a control variable. (Let’s ignore cohort for now.)
My question: is it possible to take a short-cut here via the visualization? Can I get geom_smooth to give me a line/curve that is already adjusted for year?
What I have in mind is the way the console output tells me that “geom_smooth [is] using formula = 'y ~ x'”. What I really want is something like the formula 'happy ~ age + year'. I know that’s not the point of the formula argument for geom_smooth (instead, the point is to specify different functional forms, e.g. y ~ log(x)).
Still, I’m hoping my lack of knowledge/experience on this angle isn’t shared…
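For comparison, the model-based route mentioned above (using year as a control variable) might look roughly like the sketch below. It is only an illustration, not a geom_smooth shortcut: the natural spline for age (ns with df = 4), the linear year term, and the reference year 2010 are all assumptions made here, and the names fit, grid and adj_happy are made up for the example.

```r
library(ggplot2)
library(splines)  # ships with R; ns() builds a natural cubic spline basis

# Same simulated data as above (33 repeats of ages 20-79 = 1,980 rows)
set.seed(42)
dat <- data.frame(age = rep(20:79, 33))
dat$year <- sample(2000:2020, size = nrow(dat), replace = TRUE)
dat$happy <- sample(0:10, size = nrow(dat), replace = TRUE,
                    prob = c(1, 2, 5, 8, 11, 16, 22, 18, 8, 6, 3))

# Assumed model: a flexible age curve plus a linear control for year
fit <- lm(happy ~ ns(age, df = 4) + year, data = dat)

# Predict across ages while holding year fixed at one reference value
grid <- data.frame(age = 20:79, year = 2010)
grid$adj_happy <- predict(fit, newdata = grid)

# Plot the year-adjusted age curve
p <- ggplot(grid, aes(x = age, y = adj_happy)) +
  geom_line() +
  labs(x = "Age", y = "Predicted happiness (year held at 2010)")
p
```

Because year enters the model additively, choosing a different reference year would shift the curve up or down without changing its shape.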
What if you replace summarise with mutate and plot a faceted grid like this: