Counting consecutive columns that satisfy a condition in R

76 Views Asked by At

I'm working in RStudio with my dataset that contains the daily milk production of cows reared in different locations. Also, I have 5 columns with the temperature in the corresponding 5 previous days to the milking control day.

I would like to count the number of consecutive days ion which the temperature overcome a certain threshold (ex. 30ºC) independently of the position of these days (3 consecutive days could occur during the 3,4,5 days previous to milking control for example). Also, in case that during this 5 days an event with for example, 1 days, and another event with 3 consecutive days occur, I need to take into account the higher number.

Here is a toy dataset that schematically reflects my dataset. How can I count in R the consecutive days that satisfy my conditions?

data <- data.frame(cow=1:5, milk=c(35,36,36,35,34), 
       day1ago=c(27,28,20,24,33), 
       day2ago=c(25,25,32,31,28),
       day3ago=c(22,31,25,31,29),
       day4ago=c(28,33,32,33,28),
       day5ago=c(29,28,33,34,31))

To these toy dataset, I would expect to obtain a vector like this:

data$consecutive_days = c(0,2,2,4,1)
3

There are 3 best solutions below

1
LMc On BEST ANSWER

You could use rle on a row-wise data frame:

library(dplyr)

data |>
  rowwise() |>
  mutate(rle = list(rle(c_across(starts_with("day")) > 30)),
         consecutive_days = with(rle, ifelse(any(values), max(lengths[values]), 0))) |>
  ungroup() |>
  select(-rle)

rle computes runs of equal values. In this case runs of TRUE/FALSE if the value is above or below the threshold. We use this to find the longest (max) length (lengths) of TRUE values.

Output

    cow  milk day1ago day2ago day3ago day4ago day5ago consecutive_days
  <int> <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>            <dbl>
1     1    35      27      25      22      28      29                0
2     2    36      28      25      31      33      28                2
3     3    36      20      32      25      32      33                2
4     4    35      24      31      31      33      34                4
5     5    34      33      28      29      28      31                1
0
Carl On

A possible approach:

library(tidyverse)
library(runner)

data <- data.frame(
  cow = 1:5,
  milk = c(35, 36, 36, 35, 34),
  day1ago = c(27, 28, 20, 24, 33),
  day2ago = c(25, 25, 32, 31, 28),
  day3ago = c(22, 31, 25, 31, 29),
  day4ago = c(28, 33, 32, 33, 28),
  day5ago = c(29, 28, 33, 34, 31)
)

data |>
  pivot_longer(starts_with("day")) |>
  mutate(
    above_thresh = if_else(value >= 30, 1, 0),
    consecutive_days = streak_run(above_thresh),
    consecutive_days = if_else(above_thresh == 1, consecutive_days, 0),
    .by = cow
  ) |>
  arrange(cow, above_thresh, consecutive_days) |>
  slice_tail(n = 1, by = cow) |>
  select(cow, consecutive_days)
#> # A tibble: 5 × 2
#>     cow consecutive_days
#>   <int>            <dbl>
#> 1     1                0
#> 2     2                2
#> 3     3                2
#> 4     4                4
#> 5     5                1

Created on 2024-03-14 with reprex v2.1.0

0
Friede On

Base R

We can apply rle row-wisely (MARGIN = 1L), which is a bit ugly/hacky to code due to integer(0) returning when there are no consecutive days meeting the condition, therefore we "filter" with ifelse(length(X, ..).

data <- data.frame(cow=1:5, milk=c(35,36,36,35,34), 
                   day1ago=c(27,28,20,24,33), 
                   day2ago=c(25,25,32,31,28),
                   day3ago=c(22,31,25,31,29),
                   day4ago=c(28,33,32,33,28),
                   day5ago=c(29,28,33,34,31))

data$cdo30 = apply(data[grep("day", colnames(data))], 1L, 
                   \(x) { r = rle(x > 30L); v = r$lengths[r$values]
                   ifelse(length(v), max(v), 0L) })

data
  cow milk day1ago day2ago day3ago day4ago day5ago cdo30
1   1   35      27      25      22      28      29     0
2   2   36      28      25      31      33      28     2
3   3   36      20      32      25      32      33     2
4   4   35      24      31      31      33      34     4
5   5   34      33      28      29      28      31     1

Notice that r$lengths[r$values] can be read as r$lengths[r$values == TRUE] and ifelse(length(v), .. as ifelse(length(v) > 1L, ...