I have a function that calculates the cumulative sum along each row, restarting after every Y3 column.
The first cell should contain Y1, the second Y1+Y2, and the third Y1+Y2+Y3. Then the calculation restarts from the next Y1.
All column names follow a pattern: they start with Y1, Y2, or Y3.
I would like to skip cells that contain zeros and continue calculating. That is, if a cell equals zero, it should stay as it is instead of being overwritten by the cumulative sum. The code should skip a 0 cell and add the value of the current cell to the running total from the previous non-zero cell.
For instance, if Y2=0, the Y3 cell should still be Y3=Y3(+Y2)+Y1 as usual, while the Y2 cell remains 0 instead of being overwritten by the cumulative Y2=Y2+Y1.
I wrote a function (see below), but there are two issues with the Rating 3C row.
Here is the reproducible code chunk:
df <- structure(list(
  Ratingstufe = c("10", "1A", "1B", "2A", "2B", "3A", "3C", "9C"),
  Y1.Base = c(0, 0, 0, 0, 0, 0, 0, 1),
  Y2.Base = c(0, 0, 0, 0, 0, 0, 0, -1),
  Y3.Base = c(0, 0, 0, 0, 0, 0, 0, 0),
  Y1.Scenario_1 = c(0, 0, 2, 2, 2, 2, 2, 1),
  Y2.Scenario_1 = c(0, 0, 0, 0, 0, 0, 0, -1),
  Y3.Scenario_1 = c(0, 0, 0, 0, 0, 0, -1, 0),
  # Y1.Scenario_2 = c(0, 0, 1, 1, 1, 1, 1, 1),
  # Y2.Scenario_2 = c(0, 1, 1, 0, 1, 0, 1, 0),
  # Y3.Scenario_2 = c(0, 0, 0, 0, 0, 0, -1, 0),
  Y1.Scenario_3 = c(0, 0, 1, 1, 0, 1, 1, 1),
  Y2.Scenario_3 = c(0, 0, 1, 1, 1, 2, 2, -1),
  Y3.Scenario_3 = c(0, 0, 0, 0, 0, 0, 1, 1)
), row.names = c(NA, 8L), class = "data.frame")
This is what I tried.
# Function to calculate cumulative sum by row, ignoring zeros and restarting at "Y1"
cumsum_ignore_zeros <- function(row) {
  # Initialize variables
  cumulative <- 0
  result <- numeric(length(row))
  # Iterate through the row elements
  for (i in seq_along(row)) {
    if (row[i] != 0) {
      cumulative <- cumulative + row[i]
      result[i] <- cumulative
    } else {
      result[i] <- row[i]
    }
    # Check if the column name contains 'Y1' and reset cumulative sum if it does
    if (grepl("Y1", names(df)[i])) {
      cumulative <- 0
    }
  }
  return(result)
}
# Apply the function to each row
df[-1] <- t(apply(df[-1], 1, cumsum_ignore_zeros))
Problem 1: column Y3.Scenario_1. There is an incorrectly calculated -1 at the intersection with Rating 3C. I suspect this happens because the preceding cell (Y2.Scenario_1) holds a 0, so the function does not add Y3 to Y1 but simply writes the Y3 value down. Instead of skipping Y2 and computing Y3=Y3+Y1, it seems to restart and compute Y3=Y3.
Did the function, after adding 1 to -1 (Y2=Y2+Y1=1+(-1)=0) and writing 0 into the cell at the intersection of Rating 3C and Y2.Scenario_1, ignore the 0 in Y2 and restart? It then shows Y3=-1, apparently the result of Y3=Y3 (the cell by itself) rather than Y3=Y3+Y1.
Problem 2: the same issue seems to show up again when Y1.Scenario_3 adds the value of Y3.Scenario_1 to itself, which it is not supposed to do. The result is Y1.Scenario_3=Y1.Scenario_3+Y3.Scenario_1, when it should have been just Y1.Scenario_3=Y1.Scenario_3.
Instead, -1 is added to 1 at the intersection of Rating 3C and Y1.Scenario_3, which becomes 0.
Y1.Scenario_3 was not supposed to look back at Y3.Scenario_1. I do not see any other explanation for this result.
Here is the outcome I have currently.
tail(df)
  Ratingstufe Y1.Base Y2.Base Y3.Base Y1.Scenario_1 Y2.Scenario_1 Y3.Scenario_1 Y1.Scenario_3 Y2.Scenario_3 Y3.Scenario_3
3          1B       0       0       0             2             0             0             1             2             0
4          2A       0       0       0             2             0             0             1             2             0
5          2B       0       0       0             2             0             0             0             1             0
6          3A       0       0       0             2             0             0             1             3             0
7          3C       0       0       0             2             0            -1             0             2             1
8          9C       1       0       0             1             0             0             1             0             1
And this is what I would like to achieve (I marked the changed cells with stars):
tail(df)
  Ratingstufe Y1.Base Y2.Base Y3.Base Y1.Scenario_1 Y2.Scenario_1 Y3.Scenario_1 Y1.Scenario_3 Y2.Scenario_3 Y3.Scenario_3
3          1B       0       0       0             2             0             0             1             2             0
4          2A       0       0       0             2             0             0             1             2             0
5          2B       0       0       0             2             0             0             0             1             0
6          3A       0       0       0             2             0             0             1             3             0
7          3C       0       0       0             2             0            1*            1*            3*            4*
8          9C       1       0       0             1             0             0             1             0             1
My original dataset for your convenience (dput can be found above):
tail(df)
  Ratingstufe Y1.Base Y2.Base Y3.Base Y1.Scenario_1 Y2.Scenario_1 Y3.Scenario_1 Y1.Scenario_3 Y2.Scenario_3 Y3.Scenario_3
3          1B       0       0       0             2             0             0             1             1             0
4          2A       0       0       0             2             0             0             1             1             0
5          2B       0       0       0             2             0             0             0             1             0
6          3A       0       0       0             2             0             0             1             2             0
7          3C       0       0       0             2             0            -1             1             2             1
8          9C       1      -1       0             1            -1             0             1            -1             1
Any other solution besides correcting the loop is also welcome.
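On the loop itself: the output shown above is consistent with two problems in the reset check. First, it looks up `names(df)[i]`, but `df` still contains `Ratingstufe`, so the names are shifted by one position relative to `row` and the reset fires at the Y2 columns. Second, the reset runs after the current cell has been accumulated, so even with the right names the Y1 value would be dropped from the running sum. Below is a minimal sketch of a corrected loop, assuming the reset should happen before accumulating whenever the name starts with "Y1". The function name `cumsum_skip_zeros` and the two-row subset of the data are only for illustration.

```r
# Two rows (3C and 9C) taken from the question's data, for illustration only
df <- data.frame(
  Ratingstufe   = c("3C", "9C"),
  Y1.Base       = c(0, 1),
  Y2.Base       = c(0, -1),
  Y3.Base       = c(0, 0),
  Y1.Scenario_1 = c(2, 1),
  Y2.Scenario_1 = c(0, -1),
  Y3.Scenario_1 = c(-1, 0),
  Y1.Scenario_3 = c(1, 1),
  Y2.Scenario_3 = c(2, -1),
  Y3.Scenario_3 = c(1, 1)
)

# Hypothetical corrected version of the loop from the question
cumsum_skip_zeros <- function(row) {
  cumulative <- 0
  result <- numeric(length(row))  # initialised to 0, so zero cells stay 0
  for (i in seq_along(row)) {
    # Restart BEFORE accumulating, and use the names of `row` itself
    if (startsWith(names(row)[i], "Y1")) {
      cumulative <- 0
    }
    if (row[i] != 0) {
      cumulative <- cumulative + row[i]
      result[i] <- cumulative
    }
  }
  result
}

# apply() passes each row as a named vector, so names(row) are the column names
df[-1] <- t(apply(df[-1], 1, cumsum_skip_zeros))
df
```

For row 3C this gives 0 0 0 2 0 1 1 3 4, which matches the starred values in the desired output, and row 9C stays at 1 0 0 1 0 0 1 0 1.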
We can convert the data to long format, compute the cumsum (with an additional condition) for each "scenario" and "rating" (assuming that ratings are unique), and then convert back to wide format.
Created on 2024-02-29 with reprex v2.0.2
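A minimal sketch of this long-format approach, assuming the tidyr and dplyr packages are available and using a two-row subset of the question's data. The intermediate column names `year` and `scenario` are illustrative choices, and handling the zeros with `ifelse()` is one possible reading of the "additional condition", not necessarily the exact code that was intended.

```r
library(dplyr)
library(tidyr)

# Two rows (3C and 9C) taken from the question's data, for illustration only
df <- data.frame(
  Ratingstufe   = c("3C", "9C"),
  Y1.Base       = c(0, 1),
  Y2.Base       = c(0, -1),
  Y3.Base       = c(0, 0),
  Y1.Scenario_1 = c(2, 1),
  Y2.Scenario_1 = c(0, -1),
  Y3.Scenario_1 = c(-1, 0),
  Y1.Scenario_3 = c(1, 1),
  Y2.Scenario_3 = c(2, -1),
  Y3.Scenario_3 = c(1, 1)
)

out <- df %>%
  # Split "Y1.Base" into year = "Y1" and scenario = "Base"
  pivot_longer(-Ratingstufe,
               names_to = c("year", "scenario"),
               names_sep = "\\.") %>%
  # One group per rating and scenario, so the sum restarts at every Y1
  group_by(Ratingstufe, scenario) %>%
  # Cells that were 0 stay 0; all others get the running total
  mutate(value = ifelse(value == 0, 0, cumsum(value))) %>%
  ungroup() %>%
  # Rebuild the original column names, e.g. "Y1.Base"
  pivot_wider(names_from = c(year, scenario),
              values_from = value,
              names_sep = ".")

out
```

Grouping by scenario replaces the explicit reset in the loop: each group only ever contains Y1, Y2, and Y3 of one scenario, so nothing can carry over from a previous Y3 into the next Y1.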