I have the following code:
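```r
for (i in seq_along(hh_temp)) {
  hh_temp_save = hh_temp[[i]]
  # start at row 5 so that the window (j-4):j always covers 5 rows
  for (j in 5:nrow(hh_temp_save)) {
    hh_temp_save$max_min_sum_5days[j] = ifelse(sum(hh_temp_save$max_min_sum[(j-4):j]) > 2, 1, 0)
  }
  # write the modified data frame back once, after the inner loop
  hh_temp[[i]] = hh_temp_save
}
```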
where `hh_temp` is a list of length 12 and each element of `hh_temp` is a data frame.
I tried to convert the for-loop into a nested `lapply()`:

```r
lapply(hh_temp, \(x) {
  x = lapply(32:nrow(x), \(y) {
    x$max_min_sum_5days[y] = ifelse(sum(x$max_min_sum[(y-4):y]) > 2, 1, 0)
    x
  })
  return(x)
})
```

but I found that I can only return the manipulated vector instead of the whole dataset. Is there any way to return the whole dataset? Does this mean a nested `lapply()` is not suitable for modifying a single element of a vector?
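A minimal sketch of one way to keep the whole data frame, assuming the column names `max_min_sum` and `max_min_sum_5days` from the `str()` output in this question (the helper `flag_5days` and the toy list are hypothetical). The idea is that the inner apply should return one value per row, which is then assigned to the column, while the function passed to the outer `lapply()` returns the modified data frame `x`. In the attempt above, each call of the inner function modifies its own copy of `x`, and the inner `lapply()` simply collects those copies into a list.

```r
# Hypothetical helper: flag rows where the sum of the current value and the
# previous (k - 1) values of max_min_sum is greater than `threshold`.
flag_5days <- function(x, k = 5, threshold = 2) {
  n <- nrow(x)
  x$max_min_sum_5days <- NA_real_          # first k - 1 rows stay NA
  if (n >= k) {
    idx <- k:n
    # inner apply returns one number per row, not a modified data frame
    x$max_min_sum_5days[idx] <- vapply(idx, function(y) {
      as.numeric(sum(x$max_min_sum[(y - k + 1):y]) > threshold)
    }, numeric(1))
  }
  x                                        # return the whole data frame
}

# Toy stand-in for hh_temp (assumption: same column names as the real data)
toy <- data.frame(max_min_sum = c(1, 1, 1, 1, 0, 0, 0))
hh_list <- list(toy, toy)

hh_list <- lapply(hh_list, flag_5days)
hh_list[[1]]$max_min_sum_5days
```

With the real data this would be `hh_temp <- lapply(hh_temp, flag_5days)`.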
Unfortunately, I cannot share the dataset itself, but here are some descriptive statistics:
```
> str(hh_temp)
List of 12
 $ : tibble [3,684 × 36] (S3: tbl_df/tbl/data.frame)
  ..$ max_min_sum       : num [1:3684] 0 0 0 0 0 0 0 0 0 0 ...
  ..$ max_min_sum_5days : num [1:3684] NA NA NA NA NA NA NA NA NA NA ...
 $ : tibble [3,684 × 36] (S3: tbl_df/tbl/data.frame)
  ..$ max_min_sum       : num [1:3684] 0 0 0 0 0 0 0 0 0 0 ...
  ..$ max_min_sum_5days : num [1:3684] NA NA NA NA NA NA NA NA NA NA ...
# repeated 12 times
# max_min_sum is a binary variable
```
Sample data:
```r
df = data.frame(a = as.factor(c(1,1,1,1,0,0,0,0,1,1,1,1,1,0,0,1,0,1)),
                b = rep(NA, 18))
sample_list = list(df, df, df, df, df, df)
```
My expected outcome is to calculate the rolling sum of 5 consecutive elements of `a` (the current element and the 4 before it); if that sum is greater than 2, the corresponding element of `b` is recoded as 1, otherwise 0.
| a | b |
|---|---|
| 1 | NA |
| 1 | NA |
| 1 | NA |
| 1 | NA |
| 0 | 1 |
| 0 | 1 |
| 0 | 0 |
For the 5th element of `a`, the window contains four 1s and one 0, so the rolling sum is 4. Since 4 is greater than 2, the 5th element of `b` is recoded as 1.
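Stated generally, writing $a_j$ and $b_j$ for the $j$-th elements of `a` and `b` (and leaving the first four rows as `NA`, as in the table), the rule is:

$$
b_j =
\begin{cases}
\text{NA}, & j < 5,\\
1, & j \ge 5 \text{ and } \sum_{i=j-4}^{j} a_i > 2,\\
0, & j \ge 5 \text{ and } \sum_{i=j-4}^{j} a_i \le 2.
\end{cases}
$$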
If `a` is a factor variable, we need to run `as.numeric(as.character(a))` beforehand to coerce `a` to numeric. We can use `rollsum()` from {zoo} for the rolling sum calculation. A solution using `lapply()`, applied to slightly modified sample data:
Code
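A minimal sketch of this approach, not necessarily identical to the answer's own code, assuming the {zoo} package is installed and using the unmodified `sample_list` from the question. `align = "right"` makes each window end at the current row, and `fill = NA` keeps the output the same length as the input by padding the first four positions with `NA`:

```r
library(zoo)

df <- data.frame(a = as.factor(c(1,1,1,1,0,0,0,0,1,1,1,1,1,0,0,1,0,1)),
                 b = rep(NA, 18))
sample_list <- list(df, df, df, df, df, df)

result <- lapply(sample_list, function(x) {
  a_num <- as.numeric(as.character(x$a))   # factor levels "0"/"1" -> 0/1
  # right-aligned window: sum of the current row and the 4 rows before it
  s <- rollsum(a_num, k = 5, fill = NA, align = "right")
  x$b <- ifelse(s > 2, 1, 0)               # NA stays NA for the first 4 rows
  x                                        # return the whole data frame
})

result[[1]]$b
# NA NA NA NA 1 1 0 0 0 0 1 1 1 1 1 1 0 0
```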
Or, more compactly, as suggested by @G. Grothendieck:
Result
Modified Data
Created on 2023-12-08 with reprex v2.0.2
Edit
If your data is fairly small and you do not want to depend on an external package such as {zoo}, you might consider writing your own rolling sum function. A very basic example:
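One possible base-R sketch (the function name `roll_sum_right` is hypothetical, not from the answer): it uses cumulative sums, so each window sum is the difference of two prefix sums, and it pads the first `k - 1` positions with `NA` to mirror `rollsum(..., fill = NA, align = "right")`:

```r
# Hypothetical helper: right-aligned rolling sum of width k, NA-padded
roll_sum_right <- function(v, k) {
  n <- length(v)
  out <- rep(NA_real_, n)
  if (n >= k) {
    cs <- cumsum(c(0, v))                  # cs[i + 1] == sum(v[1:i])
    idx <- k:n
    out[idx] <- cs[idx + 1] - cs[idx - k + 1]  # sum(v[(j - k + 1):j])
  }
  out
}

df <- data.frame(a = as.factor(c(1,1,1,1,0,0,0,0,1,1,1,1,1,0,0,1,0,1)),
                 b = rep(NA, 18))
sample_list <- list(df, df, df, df, df, df)

result <- lapply(sample_list, function(x) {
  s <- roll_sum_right(as.numeric(as.character(x$a)), k = 5)
  x$b <- ifelse(s > 2, 1, 0)
  x
})

result[[1]]$b
# NA NA NA NA 1 1 0 0 0 0 1 1 1 1 1 1 0 0
```

The prefix-sum trick avoids re-summing every window; for 0/1 data like `a` the subtraction is exact.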