I have a relatively large dataset and I want to replace NA value for the price in a specific year and for a specific ID number with an available value in next year within a group for the same ID number. Here is a reproducible example:
ID <- c(1,2,3,2,2,3,1,4,5,5,1,2,2)
year <- c(2000,2001,2002,2002,2003,2007,2001,2000,2005,2006,2002,2004,2005)
value <- c(1000,20000,30000,NA,40000,NA,6000,4000,NA,20000,7000,50000,60000)
data <- data.frame(ID, year, value)
ID year value
1 1 2000 1000
2 2 2001 20000
3 3 2002 30000
4 2 2002 NA
5 2 2003 40000
6 3 2007 NA
7 1 2001 6000
8 4 2000 4000
9 5 2005 NA
10 5 2006 20000
11 1 2002 7000
12 2 2004 50000
13 2 2005 60000
So, for example for ID=2 we have following value and years:
ID year value
2 2001 20000
2 2002 NA
2 2003 40000
2 2004 50000
2 2005 60000
So in the above case, NA should be replaced with 40000 (Values in next year). And the same story for other IDs. the final result should be in this form:
ID year value
1 2000 1000
1 2001 6000
1 2002 7000
2 2001 20000
2 2002 40000
2 2003 40000
2 2004 50000
2 2005 60000
3 2007 NA
4 2000 4000
5 2005 20000
5 2006 20000
Please note that for ID=3 since there is no next year available, we want to leave it as is. That's why it's in the form of NA
I appreciate if you can suggest a solution
Thanks
dplyr
solution