r - lapply, ifelse and trying to compare row vs previous row in one column

Forgive me in advance for trying to use my Excel logic in R, but I can't seem to figure this out. In a function, for each row X, I am trying to find out whether the row before it has a greater value, using simple logic. If it does, a new column should show "yes"; if not, "no".

Here is my code:

temp <- data
GetFUNC<- function(x){
         temp <- cbind(temp, NewCol = ifelse(temp[2:nrow(temp),8] > temp[1:(nrow(temp)-1),8], "yes","no"))
         write.csv(temp, file = paste0(x,".csv"))
}
lapply(example,GetFUNC)

For reference, column 8 looks like this:

testdata$numbers
 [1] 32216510 10755328  8083097  6878500  8377025  6469979 10675856  8189887  5337239
[10]  5156737

The error:

Error in data.frame(..., check.names = FALSE) : 
  arguments imply differing number of rows: 11, 10

Thanks for any insight you can provide!

There are 2 best solutions below

Here's a dplyr solution using lag to look at the previous row and mutate to add the new column.

library(dplyr)
df1 <- data.frame(numbers = c(32216510, 10755328, 8083097, 6878500, 8377025,
                               6469979, 10675856, 8189887, 5337239, 5156737))

df1 %>% 
  mutate(NewCol = ifelse(lag(numbers) > numbers, "yes", "no"))

    numbers NewCol
1  32216510   <NA>
2  10755328    yes
3   8083097    yes
4   6878500    yes
5   8377025     no
6   6469979    yes
7  10675856     no
8   8189887    yes
9   5337239    yes
10  5156737    yes
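
For anyone unsure what lag is doing here, a minimal base R sketch of the same shift may help. It assumes a plain numeric vector; the helper name prev_value is made up for illustration and is not part of dplyr.

```r
# Hypothetical helper (not part of dplyr): shift a vector down by one
# position so that element i holds the value from element i - 1.
prev_value <- function(x) c(NA, head(x, -1))

numbers <- c(32216510, 10755328, 8083097, 6878500, 8377025,
             6469979, 10675856, 8189887, 5337239, 5156737)

# "yes" when the previous row is greater than the current one.
# The first row has no predecessor, so its comparison yields NA.
NewCol <- ifelse(prev_value(numbers) > numbers, "yes", "no")
```

The NA in the first position is what keeps the result the same length as the input.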

There are several problems:

  • You don't need lapply since all the operations you are using are already vectorized.
  • : binds more tightly than - (see ?Syntax), so 1:nrow(temp)-1 means (1:nrow(temp))-1, not 1:(nrow(temp)-1), which is what you want. For example, compare these:

    3:5-1
    ## [1] 2 3 4
    
    (3:5) - 1   # same
    ## [1] 2 3 4
    
    3:(5-1)    # different
    ## [1] 3 4
    
  • Even if the last point is corrected, your ifelse expression returns a vector that is one element shorter than the number of rows in testdata. Add an NA at the beginning.
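
The length mismatch behind the error can be reproduced on a small vector. This is a minimal sketch using a made-up vector x rather than the original data:

```r
x <- c(5, 3, 4, 1)   # illustrative values only
n <- length(x)

# Comparing rows 1..(n-1) against rows 2..n yields n - 1 results,
# one fewer than the number of rows, so cbind() cannot line them up.
cmp <- ifelse(x[1:(n - 1)] > x[2:n], "yes", "no")
length(cmp)          # n - 1

# Prepending NA for the first row (which has no previous row)
# restores the full length.
flag <- c(NA, cmp)
length(flag) == n
```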

1) Even better would be the following, assuming the input data frame is testdata, defined as in the Note at the end:

transform(testdata, NewCol = c(NA, ifelse(diff(numbers) < 0, "yes", "no")))

giving:

    numbers NewCol
1  32216510   <NA>
2  10755328    yes
3   8083097    yes
4   6878500    yes
5   8377025     no
6   6469979    yes
7  10675856     no
8   8189887    yes
9   5337239    yes
10  5156737    yes
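
If the goal is still to produce one CSV per data set, as in the question, the transform call can be wrapped in a function and applied over a list. The following is only a sketch: the names add_prev_flag and datasets, the sample values, and the use of tempdir() for output are assumptions made for illustration. Here lapply iterates over data sets, not over rows.

```r
# Hypothetical helper: flag rows whose previous value in `numbers` is greater.
add_prev_flag <- function(df) {
  transform(df, NewCol = c(NA, ifelse(diff(numbers) < 0, "yes", "no")))
}

# Assumed input: a named list of data frames, each with a `numbers` column.
datasets <- list(
  a = data.frame(numbers = c(10, 7, 9)),
  b = data.frame(numbers = c(1, 2, 3, 2))
)

# Write one CSV per list element into a temporary directory
# and keep the flagged data frames for inspection.
out_dir <- tempdir()
flagged <- lapply(names(datasets), function(nm) {
  res <- add_prev_flag(datasets[[nm]])
  write.csv(res, file = file.path(out_dir, paste0(nm, ".csv")), row.names = FALSE)
  res
})
names(flagged) <- names(datasets)
```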

2) The above is likely what you want, but here is a second solution using rollapplyr in the zoo package. It takes a rolling window of length 2 and applies diff to each window, filling the first value with NA.

library(zoo)

transform(testdata, New = ifelse(rollapplyr(numbers, 2, diff, fill = NA) < 0, "yes", "no"))

Note: The input testdata in reproducible form is:

testdata <- data.frame(numbers = c(32216510, 10755328, 8083097, 6878500, 
    8377025 , 6469979, 10675856, 8189887, 5337239, 5156737))