To run a linear model, I need to produce a cumulative sum of the counts of different species. Each species is recorded as an individual count over time in the spreadsheet 'fish1'.
Example of the data in fish1:
| Time | Whiting | Cod |
|---|---|---|
| 12:00 | 1 | 1 |
| 12:01 | NA | NA |
| 12:02 | 2 | NA |
| 12:03 | NA | NA |
I have managed to subset the data and produce a cumulative sum:
whiting<-subset(fish1,fish1$Whiting>0)
whiting$run.whit<-cumsum(whiting$Whiting)
cod<-subset(fish1,fish1$Cod>0)
cod$run.cod<-cumsum(cod$Cod)
I realise that the above code subsets each species separately, so the outputs overwrite each other on fish1.
Additionally, each species has multiple NA values, so each variable has a different length and cannot be used in the model:
lm(time ~ species1 + species2)  # and so on for the other species
I have struggled to combine the species in a new data frame because of the same problem (different numbers of rows).
Additionally, as the species count is recorded against time, I am unsure whether the NAs within each species need to be accounted for or omitted when using cumsum(). Would this also affect my linear model output?
When the NAs are kept, the result is a variable containing only NAs:
fish1$run.cod <- cumsum(ifelse(fish1$Cod > 0, fish1$Cod, NA))
Any help on this would be greatly appreciated!
Here's a tidyverse take:
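A minimal sketch of what such an approach might look like, assuming the dplyr package (version 1.0.0 or later, for across()) and assuming that an NA means no fish were counted at that time, so it can be treated as 0. The run_ column names are made up for illustration, and the data frame is just the example from the question:

```r
library(dplyr)

# Example data copied from the question
fish1 <- data.frame(
  Time    = c("12:00", "12:01", "12:02", "12:03"),
  Whiting = c(1, NA, 2, NA),
  Cod     = c(1, NA, NA, NA)
)

# Assumption: NA means "nothing counted", so replace it with 0
# before taking the running total. Every row of fish1 is kept,
# so all new columns have the same length.
fish_cum <- fish1 %>%
  mutate(across(
    c(Whiting, Cod),
    ~ cumsum(replace(.x, is.na(.x), 0)),
    .names = "run_{.col}"   # hypothetical naming: run_Whiting, run_Cod
  ))

fish_cum
```

Because nothing is subset away, the running totals line up row by row with Time and can go into a single model formula. Whether treating NA as 0 is appropriate depends on what NA means in your data; if it means "not observed" rather than "none seen", the totals and the model fit would be interpreted differently.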