Calculating percentile of values from separate grouped dataframes

52 Views Asked by At

I have two dataframes, one (df1) contains the minimum 'flow' value for specific events observed at different sites. The second dataframe (df2) contains the complete flow rime series.

df1 <- data.frame(Event = as.character(seq(1,9,1)), 
              Site_ID = rep(c("a","b","c"),each=3), 
              Flow = rnorm(9,2,1))

 df2 <- data.frame(Site_ID = rep(c("a","b","c"),each=20), 
              Flow = rnorm(60,2,1))

I'd like to use dplyr to go through each 'Flow' value in df1, and calculate its percentile value relative to the complete time series of the corresponding site in df2 using ecdf. Any help would be much appreciated. Thanks.

2

There are 2 best solutions below

0
Allan Cameron On BEST ANSWER

Here's a method using group_split and map2:

library(tidyverse)

df1 %>%
  mutate(quantile = map2(group_split(df1, Site_ID), 
                         group_split(df2, Site_ID), 
                         ~ecdf(.y$Flow)(.x$Flow)) |> unlist())
#>   Event Site_ID     Flow quantile
#> 1     1       a 2.004936     0.55
#> 2     2       a 2.072371     0.55
#> 3     3       a 1.767891     0.45
#> 4     4       b 2.686329     0.70
#> 5     5       b 1.517909     0.30
#> 6     6       b 1.574355     0.35
#> 7     7       c 1.606521     0.30
#> 8     8       c 1.680232     0.30
#> 9     9       c 1.604164     0.30
0
ThomasIsCoding On

You can try the code below

df1 %>%
  mutate(
    Quantile =
      ecdf(df2$Flow[df2$Site_ID == first(Site_ID)])(Flow),
    .by = Site_ID
  )