R plyr application to fill in missing values


I have a dataframe with person-year observations on a number of variables. It looks like this:

   year     serial moved urban.rural.code   
15 1982     1000_1     0                0
16 1983     1000_1     0                0
17 1984     1000_1     0                0
18 1985     1000_1     1                0
19 1986     1000_1     1                1
20 1981     1000_2     0                1
21 1982     1000_2     0                1
22 1983     1000_2     0                1
23 1984     1000_2     0                0
24 1985     1000_2     0                9   
25 1996     1000_2     0                1
26 1993     1000_3     0                1
27 1994     1000_3     0                1
28 1984     1000_4     0                0
29 1985     1000_4     0                7  
30 1987     1000_5     0                1
31 1984     1000_6     0                0
32 1999     1000_6     0                8
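To make the example reproducible, the printed table can be rebuilt as a data frame. The row names 15:32 mirror the printed row labels. The column types are an assumption, because the table doesn't say whether year and the codes are numeric or character; this sketch treats them as numeric.

```r
# The example data from the table above.
# Assumption: year, moved and urban.rural.code are numeric; serial is character.
data <- data.frame(
  year = c(1982, 1983, 1984, 1985, 1986,
           1981, 1982, 1983, 1984, 1985, 1996,
           1993, 1994,
           1984, 1985,
           1987,
           1984, 1999),
  serial = rep(c("1000_1", "1000_2", "1000_3", "1000_4", "1000_5", "1000_6"),
               times = c(5, 6, 2, 2, 1, 2)),
  moved = c(0, 0, 0, 1, 1,
            0, 0, 0, 0, 0, 0,
            0, 0,
            0, 0,
            0,
            0, 0),
  urban.rural.code = c(0, 0, 0, 0, 1,
                       1, 1, 1, 0, 9, 1,
                       1, 1,
                       0, 7,
                       1,
                       0, 8),
  row.names = 15:32,
  stringsAsFactors = FALSE
)
```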

For every observation WITHIN a serial number: if there is an observation recorded in 1985 with moved = 0, then I want to set the urban.rural.code of the 1984 observation to its 1985 value. In the above example, ONLY the urban.rural.code values in rows 23 and 28 should change, to 9 and 7 respectively.
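The rule doesn't strictly need per-group processing. As an alternative (a swapped-in base R technique, not the asker's plyr approach), each 1984 row can be matched to a qualifying 1985 row of the same serial with `match()`. This sketch assumes at most one row per serial and year, and uses a trimmed subset of the example rows.

```r
# Trimmed subset of the example rows (assumption: one row per serial and year)
data <- data.frame(
  year             = c(1984, 1985, 1984, 1985, 1993, 1984, 1985),
  serial           = c("1000_1", "1000_1", "1000_2", "1000_2", "1000_3", "1000_4", "1000_4"),
  moved            = c(0, 1, 0, 0, 0, 0, 0),
  urban.rural.code = c(0, 0, 0, 9, 1, 0, 7),
  stringsAsFactors = FALSE
)

# 1985 rows where the person did not move: these supply the replacement codes
src <- data[data$year == 1985 & data$moved == 0, c("serial", "urban.rural.code")]

# 1984 rows, and the matching 1985 source row for each (NA when there is none)
idx <- which(data$year == 1984)
m   <- match(data$serial[idx], src$serial)
hit <- !is.na(m)

# Overwrite only the 1984 rows that have a qualifying 1985 row
data$urban.rural.code[idx[hit]] <- src$urban.rural.code[m[hit]]
data
```

Serial 1000_1 is left alone because its 1985 row has moved = 1, and 1000_3 is untouched because it has no 1984 or 1985 row at all.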

I've used a combination of ddply and a helper function, which looks like this:

fill1984 <- function(group) {
    if ((1984 %in% group$year) & (group[group$year == 1985, 'moved'] == 0)) {
        group[group$year == 1984, 'urban.rural.code'] <- group[group$year == 1985, 'urban.rural.code']
    }
    return(group)
}

data <- ddply(data, 'serial', fill1984, .parallel=TRUE)     

And I get the following error:

Error in do.ply(i) : task 2 failed - "argument is of length zero"
In addition: Warning message:
In setup_parallel() : No parallel backend registered
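The error comes from serials that have no 1985 row (1000_3, 1000_5 and 1000_6 in the example). For those, `group[group$year == 1985, 'moved']` is a zero-length vector, so the whole condition becomes `logical(0)`, and `if` rejects a condition that is not exactly one TRUE/FALSE. The sketch below reproduces this with the rows of serial 1000_3. The "No parallel backend registered" warning is a separate issue: `.parallel = TRUE` expects a foreach backend to be registered first, and without one the work runs sequentially.

```r
# Serial 1000_3 from the example: it has neither a 1984 nor a 1985 row
group <- data.frame(year = c(1993, 1994), serial = "1000_3",
                    moved = 0, urban.rural.code = 1,
                    stringsAsFactors = FALSE)

# Selecting the 1985 'moved' value returns a zero-length vector here...
moved85 <- group[group$year == 1985, "moved"]
length(moved85)   # 0

# ...and `&` with a zero-length operand is also zero-length
cond <- (1984 %in% group$year) & (moved85 == 0)
length(cond)      # 0

# `if` needs exactly one TRUE/FALSE, so this reproduces the ddply error
msg <- tryCatch(if (cond) "filled" else "skipped",
                error = function(e) conditionMessage(e))
msg               # "argument is of length zero"
```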

I don't know where I'm going wrong. How do I make the edits to urban.rural.code within each serial number group?

Best answer

Here is a dplyr approach. It could probably be tidied up further, but it appears to work:

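library(dplyr)

newdf <- data %>%
  group_by(serial) %>%
  mutate(
    # TRUE on the 1985 row if the person did not move that year
    cidx = year == 1985 & moved == 0,
    # On the 1984 row, copy the 1985 code when that 1985 row has moved == 0;
    # isTRUE() gives FALSE for serials with no 1985 row (logical(0))
    urban.rural.code = ifelse(year == 1984 & isTRUE(cidx[year == 1985]),
                              urban.rural.code[year == 1985],
                              urban.rural.code)
  ) %>%
  select(-cidx) %>%   # drop the helper column
  ungroup()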
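For completeness, the original plyr helper can also be repaired by making sure the `if` condition always has length one. This sketch uses `any()` for that and applies the helper with base R `split()`/`lapply()` so it runs without plyr; passing the same function to `ddply(data, 'serial', fill1984)` should behave the same way. It assumes at most one 1985 row per serial (only the first is used), and the data is a trimmed subset of the example rows.

```r
# Trimmed subset of the example rows, including serials with no 1985 row
data <- data.frame(
  year             = c(1984, 1985, 1984, 1985, 1993, 1994, 1984, 1985, 1984, 1999),
  serial           = c("1000_1", "1000_1", "1000_2", "1000_2", "1000_3", "1000_3",
                       "1000_4", "1000_4", "1000_6", "1000_6"),
  moved            = c(0, 1, 0, 0, 0, 0, 0, 0, 0, 0),
  urban.rural.code = c(0, 0, 0, 9, 1, 1, 0, 7, 0, 8),
  stringsAsFactors = FALSE
)

fill1984 <- function(group) {
  is84 <- group$year == 1984
  # Qualifying 1985 rows; all FALSE when the serial has no 1985 record
  ok85 <- group$year == 1985 & group$moved == 0
  # any() collapses each vector to a single TRUE/FALSE (FALSE for empty input),
  # so `if` always receives a length-one condition
  if (any(is84) && any(ok85)) {
    group$urban.rural.code[is84] <- group$urban.rural.code[ok85][1]
  }
  group
}

# Apply the helper per serial, then stack the groups back together
result <- do.call(rbind, lapply(split(data, data$serial), fill1984))
rownames(result) <- NULL
result
```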