Counting consecutive columns that satisfy a condition in R

Question

Counting consecutive columns that satisfy a condition in R

76 Views Asked by RoBeDo At 14 March 2024 at 14:48

I'm working in RStudio with my dataset that contains the daily milk production of cows reared in different locations. Also, I have 5 columns with the temperature in the corresponding 5 previous days to the milking control day.

I would like to count the number of consecutive days ion which the temperature overcome a certain threshold (ex. 30ºC) independently of the position of these days (3 consecutive days could occur during the 3,4,5 days previous to milking control for example). Also, in case that during this 5 days an event with for example, 1 days, and another event with 3 consecutive days occur, I need to take into account the higher number.

Here is a toy dataset that schematically reflects my dataset. How can I count in R the consecutive days that satisfy my conditions?

data <- data.frame(cow=1:5, milk=c(35,36,36,35,34), 
       day1ago=c(27,28,20,24,33), 
       day2ago=c(25,25,32,31,28),
       day3ago=c(22,31,25,31,29),
       day4ago=c(28,33,32,33,28),
       day5ago=c(29,28,33,34,31))

To these toy dataset, I would expect to obtain a vector like this:

data$consecutive_days = c(0,2,2,4,1)

Original Q&A

There are 3 best solutions below

Carl On 14 March 2024 at 15:18

A possible approach:

library(tidyverse)
library(runner)

data <- data.frame(
  cow = 1:5,
  milk = c(35, 36, 36, 35, 34),
  day1ago = c(27, 28, 20, 24, 33),
  day2ago = c(25, 25, 32, 31, 28),
  day3ago = c(22, 31, 25, 31, 29),
  day4ago = c(28, 33, 32, 33, 28),
  day5ago = c(29, 28, 33, 34, 31)
)

data |>
  pivot_longer(starts_with("day")) |>
  mutate(
    above_thresh = if_else(value >= 30, 1, 0),
    consecutive_days = streak_run(above_thresh),
    consecutive_days = if_else(above_thresh == 1, consecutive_days, 0),
    .by = cow
  ) |>
  arrange(cow, above_thresh, consecutive_days) |>
  slice_tail(n = 1, by = cow) |>
  select(cow, consecutive_days)
#> # A tibble: 5 × 2
#>     cow consecutive_days
#>   <int>            <dbl>
#> 1     1                0
#> 2     2                2
#> 3     3                2
#> 4     4                4
#> 5     5                1

^{Created on 2024-03-14 with reprex v2.1.0}

Friede On 14 March 2024 at 16:30

Base R

We can apply rle row-wisely (MARGIN = 1L), which is a bit ugly/hacky to code due to integer(0) returning when there are no consecutive days meeting the condition, therefore we "filter" with ifelse(length(X, ..).

data <- data.frame(cow=1:5, milk=c(35,36,36,35,34), 
                   day1ago=c(27,28,20,24,33), 
                   day2ago=c(25,25,32,31,28),
                   day3ago=c(22,31,25,31,29),
                   day4ago=c(28,33,32,33,28),
                   day5ago=c(29,28,33,34,31))

data$cdo30 = apply(data[grep("day", colnames(data))], 1L, 
                   \(x) { r = rle(x > 30L); v = r$lengths[r$values]
                   ifelse(length(v), max(v), 0L) })

data
  cow milk day1ago day2ago day3ago day4ago day5ago cdo30
1   1   35      27      25      22      28      29     0
2   2   36      28      25      31      33      28     2
3   3   36      20      32      25      32      33     2
4   4   35      24      31      31      33      34     4
5   5   34      33      28      29      28      31     1

Notice that r$lengths[r$values] can be read as r$lengths[r$values == TRUE] and ifelse(length(v), .. as ifelse(length(v) > 1L, ...

**LMc** · Accepted Answer · 2024-03-14T15:42:35.207000

You could use rle on a row-wise data frame:

library(dplyr)

data |>
  rowwise() |>
  mutate(rle = list(rle(c_across(starts_with("day")) > 30)),
         consecutive_days = with(rle, ifelse(any(values), max(lengths[values]), 0))) |>
  ungroup() |>
  select(-rle)

rle computes runs of equal values. In this case runs of TRUE/FALSE if the value is above or below the threshold. We use this to find the longest (max) length (lengths) of TRUE values.

Output

    cow  milk day1ago day2ago day3ago day4ago day5ago consecutive_days
  <int> <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>            <dbl>
1     1    35      27      25      22      28      29                0
2     2    36      28      25      31      33      28                2
3     3    36      20      32      25      32      33                2
4     4    35      24      31      31      33      34                4
5     5    34      33      28      29      28      31                1

Counting consecutive columns that satisfy a condition in R

There are 3 best solutions below

Base R

Related Questions in R

Related Questions in DPLYR

Related Questions in TIDYVERSE

Related Questions in DATA-MANIPULATION

Related Questions in COUNTING

Trending Questions

Popular # Hahtags

Popular Questions