Splitting the group which contains the bottom 25th percentile of volume into 2 rows

Question

Splitting the group which contains the bottom 25th percentile of volume into 2 rows

55 Views Asked by AColoredReptile At 24 March 2024 at 00:43

library(tidytable)

my_dt <- tidytable(
     price = c(4.5, 4.6, 4.7, 4.8, 5.1),
     volume = c(0.4,1.2,0.3,8.7,6.5)
)

total = my_dt %>% summarize(total = sum(volume)) %>% pull(total)

trim = total*0.25

What I would like to do is find the price value where 25% of the total volume occurs. And then splitting that row into a 'trimmed' portion and a non trimmed portion. So in this example I would end up with something that looks like this:

post_trim <- tidytable(
         price = c(4.5, 4.6, 4.7, 4.8, 4.8, 5.1),
         volume = c(0.4, 1.2, 0.3, 2.375, 6.325, 6.5),
         trim = c(True, True, True, True, False, False)
    )

The way I had initially thought of how to do this would be to iterate through the rows of the dataframe, where the rows would be returned as lists and tracking the remainder after subtracting the quantity until I find a row where the remainder is less then the volume:

remainder = trim
pivot = 0

for (row in my_dt){
    if (remainder <= row[2]){
        pivot = row[1]
        break
    } else { 
        remainder = remainder - row[2]
    }

And then from here I just delete the row entry for where price == pivot and re-enter two rows one where volume is the remainder and the other where the volume is original volume - remainder.

I know loops over rows are generally not a good idea though so I'm wondering how I would accomplish this without doing that?

Original Q&A

There are 2 best solutions below

Adriano Mello On 24 March 2024 at 15:22

tidyverse solution is really easy (or am I missing something)?

trim <- 0.25 * sum(my_dt$volume, na.rm = TRUE)
post_trim <- mutate(my_dt, trim = if_else(cumsum(volume) < trim, TRUE, FALSE))

> post_trim
# A tibble: 5 × 3
  price volume trim 
  <dbl>  <dbl> <lgl>
1   4.5    0.4 TRUE 
2   4.6    1.2 TRUE 
3   4.7    0.3 TRUE 
4   4.8    8.7 FALSE
5   5.1    6.5 FALSE

PS. OP's my_dt and my_trim have different price and volume data at the end.

# my_dt -------------------
my_dt <- tibble::tribble(
  ~price, ~volume,
  4.5,     0.4,
  4.6,     1.2,
  4.7,     0.3,
  4.8,     8.7,
  5.1,     6.5)

**zephryl** · Accepted Answer · 2024-03-24T03:11:13.180000

This kind of problem is a bit easier to work with in base R:

trim <- sum(my_dt$volume) * 0.25
csum <- cumsum(my_dt$volume)
# identify 25th %ile row
trow <- min(which(csum >= trim))

# duplicate 25th %ile row
my_dt <- my_dt[c(1:trow, trow:nrow(my_dt)), ]
# trim value, with safeguard if first row
my_dt$volume[[trow]] <- if (trow == 1) trim else trim - csum[[trow - 1]]
# remainder in next row
my_dt$volume[[trow + 1]] <- csum[[trow]] - trim
# add logical flag
my_dt$trim <- seq(nrow(my_dt)) <= trow

Result:

# A tibble: 6 × 3
  price volume trim 
  <dbl>  <dbl> <lgl>
1   4.5   0.4  TRUE 
2   4.6   1.2  TRUE 
3   4.7   0.3  TRUE 
4   4.8   2.37 TRUE 
5   4.8   6.32 FALSE
6   5.1   6.5  FALSE

Splitting the group which contains the bottom 25th percentile of volume into 2 rows

There are 2 best solutions below

Related Questions in R

Related Questions in TIDYTABLE

Trending Questions

Popular # Hahtags

Popular Questions