Issue with NAs when using cumsum() on multiple variables - R


In an effort to run a linear model, I need to produce a cumulative sum of the counts of different species. Each species is recorded as an individual count over time in the spreadsheet 'fish1'.

EXAMPLE OF DATA FISH1

Time Whiting Cod
12:00 1 1
12:01 NA NA
12:02 2 NA
12:03 NA NA

I have managed to subset the data and produce a cumulative sum:

whiting<-subset(fish1,fish1$Whiting>0)
whiting$run.whit<-cumsum(whiting$Whiting)

cod<-subset(fish1,fish1$Cod>0)
cod$run.cod<-cumsum(cod$Cod)

I realise that the above code subsets each species separately, and the outputs overwrite each other on fish1.

Additionally, each species has multiple NA values, so each subsetted variable has a different length and cannot be read by the model.

lm(time ~ species1 + species2 + etc)

I have struggled to subset the species together in a new dataframe due to the same problem (different numbers of rows).
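
The row-count mismatch described here can be reproduced with a minimal base R sketch. It assumes the four example rows above stand in for the full fish1 data; the object names follow the code in the question.

```r
# Rebuild the example data (values copied from the table in the question)
fish1 <- data.frame(
  Time    = c("12:00", "12:01", "12:02", "12:03"),
  Whiting = c(1L, NA, 2L, NA),
  Cod     = c(1L, NA, NA, NA),
  stringsAsFactors = FALSE
)

# subset() keeps only rows where the condition is TRUE;
# rows where the condition evaluates to NA are dropped
whiting <- subset(fish1, Whiting > 0)
cod     <- subset(fish1, Cod > 0)

nrow(fish1)    # 4
nrow(whiting)  # 2
nrow(cod)      # 1
```

Because each subset drops a different set of rows, the running totals computed on them no longer line up with each other or with the Time column.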

Additionally, as the species count is recorded against time, I am unsure whether the NAs within each species need to be accounted for or omitted when using cumsum(). Would this also affect my linear model output?

When the NAs are kept in, the output is a variable of only NAs:

fish1$run.cod <- cumsum(ifelse(fish1$Cod > 0, fish1$Cod, NA))
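
A likely explanation is that cumsum() propagates missing values: once it meets an NA, every later element of the result is NA as well, so a column whose first row is NA comes back entirely NA. The sketch below uses the Whiting values from the example table as a plain vector; the names whiting and run_whiting are only illustrative.

```r
# Whiting counts from the example table
whiting <- c(1, NA, 2, NA)

# Everything after the first NA becomes NA
cumsum(whiting)  # 1 NA NA NA

# Treating NA as "no fish counted" (an assumption about the data)
# lets the running total carry forward instead
run_whiting <- cumsum(ifelse(is.na(whiting), 0, whiting))
run_whiting      # 1 1 3 3
```

Whether an NA really means a count of zero depends on how the data were recorded, so this replacement is a modelling choice rather than a neutral fix.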

Any help on this would be greatly appreciated!

Answer by Adriano Mello

Here's a tidyverse take:

library(tidyverse)

# Convert the existing data frame to a tibble
fish1 <- as_tibble(fish1)

> fish1
# A tibble: 4 × 3
Time  Whiting   Cod
<chr>   <int> <int>
12:00       1     1
12:01      NA    NA
12:02       2    NA
12:03      NA    NA

# Replace each NA with 0, then take the running total of each species column
fish1 <- mutate(fish1, across(c(Whiting, Cod), \(x) if_else(is.na(x), 0, x)))
fish1 <- mutate(fish1, across(c(Whiting, Cod), \(x) cumsum(x)))

> fish1
# A tibble: 4 × 3
Time  Whiting   Cod
<chr>   <dbl> <dbl>
12:00       1     1
12:01       1     1
12:02       3     1
12:03       3     1
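
With every column now the same length, the model from the question can be fitted on the whole table. The following is only a sketch in base R: it assumes Time should be converted to minutes since the first observation (the minutes column is a name introduced here, not part of the original data) and it keeps the question's layout of time as the response with the species as predictors.

```r
# Cumulative counts copied from the output above
fish1 <- data.frame(
  Time    = c("12:00", "12:01", "12:02", "12:03"),
  Whiting = c(1, 1, 3, 3),
  Cod     = c(1, 1, 1, 1),
  stringsAsFactors = FALSE
)

# Assumption: turn "HH:MM" strings into minutes since the first observation
parts   <- strsplit(fish1$Time, ":")
minutes <- sapply(parts, function(p) as.numeric(p[1]) * 60 + as.numeric(p[2]))
fish1$minutes <- minutes - minutes[1]

# Same shape as the model in the question: time ~ species1 + species2
fit <- lm(minutes ~ Whiting + Cod, data = fish1)

# All four rows are used, because no NAs remain
nrow(model.frame(fit))  # 4
```

In these four example rows Cod never changes, so lm() cannot estimate a separate coefficient for it and reports NA; with the full data set that should not be an issue as long as each species varies.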