Data transformation: Select certain observation plus observations before and after the date it was taken R

67 Views Asked by At

I have a dataset with 450 observations and variables "date", "year", "site", "number". I want to select the observations with the highest number per site and year, and then select the numbers before and after the date on which that observation was taken.

df <- data.frame(
  year = c(rep(2029, 10), rep(2020, 10), rep(2021, 10)),
  date = c(seq(as.Date("2029-01-01"), as.Date("2029-01-10"), by = "day"),
            seq(as.Date("2020-01-01"), as.Date("2020-01-10"), by = "day"),
            seq(as.Date("2021-01-01"), as.Date("2021-01-10"), by = "day")),
  site = rep(c("Site A", "Site B", "Site C"), each = 10, times = 3),
  number = sample(1:100, 30, replace = TRUE))

I've gotten as far as select the highest number per site and year using dplyr::group_by(site,year) %>% slice_max(n=1, number) but I'm stuck after this.

Thanks!

2

There are 2 best solutions below

0
Chris Ruehlemann On

You can write a function to get the index of the max value:

row_sequence <- function(value) {
  inds <- which.max(value)
  sort(unique(c(inds - 1, inds, inds + 1)))
}


library(dplyr)
df %>% 
  group_by(site,year) %>%
  slice(row_sequence(number))
# A tibble: 9 × 4
# Groups:   site, year [3]
   year date       site   number
  <dbl> <date>     <chr>   <int>
1  2029 2029-01-01 Site A     97
2  2029 2029-01-01 Site A     97
3  2029 2029-01-01 Site A     97
4  2020 2020-01-09 Site B     92
5  2020 2020-01-09 Site B     92
6  2020 2020-01-09 Site B     92
7  2021 2021-01-01 Site C     97
8  2021 2021-01-01 Site C     97
9  2021 2021-01-01 Site C     97
0
Andre Wildberg On

This prints around all max values and max, including draws.

Note that variables e.g. year "2021" and site "Site C" are one group, even if they are not ordered to be close to each other, so bordering values include those blocks too.

df %>% 
  group_by(site, year) %>% 
  mutate(N = max(number) == number, N = N | lead(N) != N | lag(N) != N) %>% 
  filter(N) %>% 
  select(-N) %>% 
  ungroup() %>% 
  print(n=Inf)
# A tibble: 26 × 4
    year date       site   number
   <dbl> <date>     <chr>   <int>
 1  2029 2029-01-03 Site A     71
 2  2029 2029-01-04 Site A    100
 3  2029 2029-01-05 Site A     89
 4  2020 2020-01-03 Site B     36
 5  2020 2020-01-04 Site B     95
 6  2020 2020-01-05 Site B      5
 7  2021 2021-01-01 Site C     97
 8  2021 2021-01-02 Site C     42
 9  2021 2021-01-10 Site C     36
10  2029 2029-01-03 Site A     71
11  2029 2029-01-04 Site A    100
12  2029 2029-01-05 Site A     89
13  2020 2020-01-03 Site B     36
14  2020 2020-01-04 Site B     95
15  2020 2020-01-05 Site B      5
16  2021 2021-01-01 Site C     97
17  2021 2021-01-02 Site C     42
18  2021 2021-01-10 Site C     36
19  2029 2029-01-03 Site A     71
20  2029 2029-01-04 Site A    100
21  2029 2029-01-05 Site A     89
22  2020 2020-01-03 Site B     36
23  2020 2020-01-04 Site B     95
24  2020 2020-01-05 Site B      5
25  2021 2021-01-01 Site C     97
26  2021 2021-01-02 Site C     42