Dropping data outside valid range when using geom_ma in scatterplot

I have four categories that I am plotting here using ggplot. I would like to add a moving average using geom_ma, but I have too few of the green dots to get a good moving average (I would prefer a period of at least 20). How can I keep the scatterplot as is and add a moving average only for the purple and blue dots, which have enough points for a 20-period moving average?

Example: ggplot(data, aes(x, y, color=Str)) + geom_point(stat="identity") + geom_ma(ma_fun = SMA, n = 20, linetype=1, size=1, na.rm=TRUE)

I get this warning: "Warning message: Computation failed in stat_sma(): n = 20 is outside valid range: [1, 10]"

There are 1 best solutions below

This is a great example of why it helps to provide a minimal reproducible example. You have provided the code that produced the error, but there is nothing wrong with the code on its own: it will only cause this error with certain inputs. Given suitable data, your code is fine.

Let's make a dummy data frame with the same name and column names as your data frame. We will make data for the first 330 days of 2020, and we will have 4 groups in Str, so a total of 1320 rows:

library(tidyquant)
library(ggplot2)

set.seed(1)

data <- data.frame(x = rep(seq(as.Date("2020-01-01"), 
                           by = "day", length.out = 330), 4),
                   y = as.vector(replicate(4, 1000 * cumsum(rnorm(330)))),
                   Str = rep(c("A", "B", "C", "D"), each = 330))

Now if we use your exact plotting code, we can see that the plot is fine:

ggplot(data, aes(x, y, color = Str)) + 
  geom_point(stat="identity") + 
  geom_ma(ma_fun = SMA, n = 20, linetype = 1, size = 1, na.rm = TRUE)

But if one or more of our Str groups has fewer than 20 measurements, then we get your error. Let's remove most of the Str == "A" and Str == "B" cases, and repeat the plot:

data <- data[c(1:20 * 33, 661:1320),]

ggplot(data, aes(x, y, color = Str)) + 
  geom_point(stat="identity") + 
  geom_ma(ma_fun = SMA, n = 20, linetype = 1, size = 1, na.rm = TRUE)
#> Warning: Computation failed in `stat_sma()`:
#> n = 20 is outside valid range: [1, 10]

We get your exact warning, and the MA lines disappear from all the groups. Clearly we cannot get a 20-measurement moving average if we only have 10 data points, so geom_ma just gives up.
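
To see which groups are too small before plotting, it can help to count the observations per group. The following is a minimal base R sketch that rebuilds only the Str column from the example above; the names group_counts, n_period and too_small are illustrative and not part of tidyquant or ggplot2.

```r
# Rebuild only the grouping column of the reduced example data
# (same row selection as above; base R only, no packages needed).
Str <- rep(c("A", "B", "C", "D"), each = 330)[c(1:20 * 33, 661:1320)]

# Count observations per group.
group_counts <- table(Str)
print(group_counts)

# Groups with fewer points than the moving-average period.
n_period <- 20  # illustrative name for the n passed to geom_ma
too_small <- names(group_counts)[group_counts < n_period]
print(too_small)
```

Groups A and B each have 10 rows here, which matches the upper bound of the valid range reported in the warning.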

The fix here is to use the data = argument in geom_ma to filter out any groups with fewer than 20 data points:

ggplot(data, aes(x, y, color = Str)) + 
  geom_point(stat="identity") + 
  geom_ma(ma_fun = SMA, n = 20, linetype = 1, size = 1, na.rm = TRUE,
          data = data[data$Str %in% names(table(data$Str)[table(data$Str) >= 20]),])
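
The inline subset can be hard to read, so one possible variation is to build the filtered data frame in a separate step. This is only a sketch using base R's ave(); group_size, n_period and ma_data are hypothetical names chosen for illustration.

```r
# Recreate the reduced example data so that this block runs on its own.
set.seed(1)
data <- data.frame(x = rep(seq(as.Date("2020-01-01"),
                           by = "day", length.out = 330), 4),
                   y = as.vector(replicate(4, 1000 * cumsum(rnorm(330)))),
                   Str = rep(c("A", "B", "C", "D"), each = 330))
data <- data[c(1:20 * 33, 661:1320), ]

n_period <- 20  # moving-average period, the same value passed as n to geom_ma

# For every row, the number of rows that share its Str group.
group_size <- ave(seq_along(data$Str), data$Str, FUN = length)

# Keep only rows whose group has at least n_period observations.
ma_data <- data[group_size >= n_period, ]
print(table(ma_data$Str))
```

ma_data can then be passed as data = ma_data to geom_ma, while geom_point keeps using the full data frame so that every point is still drawn.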
