Data transformation: Select certain observation plus observations before and after the date it was taken R

Question

Data transformation: Select certain observation plus observations before and after the date it was taken R

67 Views Asked by CMai At 23 April 2023 at 15:33

I have a dataset with 450 observations and variables "date", "year", "site", "number". I want to select the observations with the highest number per site and year, and then select the numbers before and after the date on which that observation was taken.

df <- data.frame(
  year = c(rep(2029, 10), rep(2020, 10), rep(2021, 10)),
  date = c(seq(as.Date("2029-01-01"), as.Date("2029-01-10"), by = "day"),
            seq(as.Date("2020-01-01"), as.Date("2020-01-10"), by = "day"),
            seq(as.Date("2021-01-01"), as.Date("2021-01-10"), by = "day")),
  site = rep(c("Site A", "Site B", "Site C"), each = 10, times = 3),
  number = sample(1:100, 30, replace = TRUE))

I've gotten as far as select the highest number per site and year using dplyr::group_by(site,year) %>% slice_max(n=1, number) but I'm stuck after this.

Thanks!

Original Q&A

There are 2 best solutions below

**Chris Ruehlemann** · Answer 1 · 2023-04-23T15:45:28.587000

You can write a function to get the index of the max value:

row_sequence <- function(value) {
  inds <- which.max(value)
  sort(unique(c(inds - 1, inds, inds + 1)))
}


library(dplyr)
df %>% 
  group_by(site,year) %>%
  slice(row_sequence(number))
# A tibble: 9 × 4
# Groups:   site, year [3]
   year date       site   number
  <dbl> <date>     <chr>   <int>
1  2029 2029-01-01 Site A     97
2  2029 2029-01-01 Site A     97
3  2029 2029-01-01 Site A     97
4  2020 2020-01-09 Site B     92
5  2020 2020-01-09 Site B     92
6  2020 2020-01-09 Site B     92
7  2021 2021-01-01 Site C     97
8  2021 2021-01-01 Site C     97
9  2021 2021-01-01 Site C     97

**Andre Wildberg** · Answer 2 · 2023-04-23T16:53:04.487000

This prints around all max values and max, including draws.

Note that variables e.g. year "2021" and site "Site C" are one group, even if they are not ordered to be close to each other, so bordering values include those blocks too.

df %>% 
  group_by(site, year) %>% 
  mutate(N = max(number) == number, N = N | lead(N) != N | lag(N) != N) %>% 
  filter(N) %>% 
  select(-N) %>% 
  ungroup() %>% 
  print(n=Inf)
# A tibble: 26 × 4
    year date       site   number
   <dbl> <date>     <chr>   <int>
 1  2029 2029-01-03 Site A     71
 2  2029 2029-01-04 Site A    100
 3  2029 2029-01-05 Site A     89
 4  2020 2020-01-03 Site B     36
 5  2020 2020-01-04 Site B     95
 6  2020 2020-01-05 Site B      5
 7  2021 2021-01-01 Site C     97
 8  2021 2021-01-02 Site C     42
 9  2021 2021-01-10 Site C     36
10  2029 2029-01-03 Site A     71
11  2029 2029-01-04 Site A    100
12  2029 2029-01-05 Site A     89
13  2020 2020-01-03 Site B     36
14  2020 2020-01-04 Site B     95
15  2020 2020-01-05 Site B      5
16  2021 2021-01-01 Site C     97
17  2021 2021-01-02 Site C     42
18  2021 2021-01-10 Site C     36
19  2029 2029-01-03 Site A     71
20  2029 2029-01-04 Site A    100
21  2029 2029-01-05 Site A     89
22  2020 2020-01-03 Site B     36
23  2020 2020-01-04 Site B     95
24  2020 2020-01-05 Site B      5
25  2021 2021-01-01 Site C     97
26  2021 2021-01-02 Site C     42

Data transformation: Select certain observation plus observations before and after the date it was taken R

There are 2 best solutions below

Related Questions in R

Related Questions in DPLYR

Related Questions in TIDYR

Related Questions in PLYR

Trending Questions

Popular # Hahtags

Popular Questions