Changing NA Values based on cell values in same column in R

V1 <- c("Name", "Paul", "Name", "Sarah", NA, NA, NA, NA, "Name", "Carl", NA, NA, "Name", "Alice", "Name", "Rita")
V2 <- c("Name", "Paul", "Name", "Sarah", "Name", "Sarah", "Name", "Sarah", "Name", "Carl", "Name", "Carl", "Name", "Alice", "Name", "Rita")
df <- data.frame(V1, V2)
df

I would like V1 to look like V2. EDIT: In the original dataset, V2 doesn't exist; I created it here only to provide example data.

      V1    V2
1   Name  Name
2   Paul  Paul
3   Name  Name
4  Sarah Sarah
5   <NA>  Name
6   <NA> Sarah
7   <NA>  Name
8   <NA> Sarah
9   Name  Name
10  Carl  Carl
11  <NA>  Name
12  <NA>  Carl
13  Name  Name
14 Alice Alice
15  Name  Name
16  Rita  Rita 

I tried the following:

# find the positions of the missing values in V1
m <- which(is.na(df$V1) == TRUE)
m
[1]  5  6  7  8 11 12

# go to every missing position and copy the value from the cell two rows above it
for (i in m) {
  df$V1[m[i]] <- df$V1[m[i]-2]
}

The code runs, but the result is wrong (rows 5 to 8 are still NA):

      V1    V2
1   Name  Name
2   Paul  Paul
3   Name  Name
4  Sarah Sarah
5   <NA>  Name
6   <NA> Sarah
7   <NA>  Name
8   <NA> Sarah
9   Name  Name
10  Carl  Carl
11  Name  Name
12  Carl  Carl
13  Name  Name
14 Alice Alice
15  Name  Name
16  Rita  Rita

Why does it work for the later cells (rows 11 and 12) but not for the first block of missing values? Also, I'm trying to avoid for loops, so if there is a more elegant way to do this, I would love to see it!

There are 3 answers below.

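One option, using dplyr and tidyr, is to fill V1 downwards and then, within each run of identical values, reset every second row to "Name":

library(dplyr)
library(tidyr)

df %>%
 fill(V1) %>%
 group_by(rleid = with(rle(V1), rep(seq_along(lengths), lengths))) %>%
 mutate(V1 = ifelse(row_number() %% 2 == 0, "Name", V1)) %>%
 ungroup() %>%
 select(-rleid)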

   V1    V2   
   <chr> <chr>
 1 Name  Name 
 2 Paul  Paul 
 3 Name  Name 
 4 Sarah Sarah
 5 Name  Name 
 6 Sarah Sarah
 7 Name  Name 
 8 Sarah Sarah
 9 Name  Name 
10 Carl  Carl 
11 Name  Name 
12 Carl  Carl 
13 Name  Name 
14 Alice Alice
15 Name  Name 
16 Rita  Rita 
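To see what the grouping key above does, here is a minimal base R sketch on the example data. The vector filled is typed out by hand to match what fill(V1) produces, and ave() is used as a stand-in for row_number() within each group; neither is part of the original answer. rle() collapses consecutive repeats, and rep(seq_along(lengths), lengths) expands the run lengths back into one run number per row.

```r
# V1 after tidyr::fill(): each NA takes the last non-missing value above it
# (typed out by hand here for illustration)
filled <- c("Name", "Paul", "Name", "Sarah", "Sarah", "Sarah", "Sarah", "Sarah",
            "Name", "Carl", "Carl", "Carl", "Name", "Alice", "Name", "Rita")

# One integer per row, incremented whenever the value changes
run_id <- with(rle(filled), rep(seq_along(lengths), lengths))

# Position of each row within its run (what row_number() gives per group)
pos_in_run <- ave(seq_along(filled), run_id, FUN = seq_along)

# Every second row inside a run is reset to "Name"
fixed <- ifelse(pos_in_run %% 2 == 0, "Name", filled)
print(data.frame(filled, run_id, pos_in_run, fixed))
```

In the five-row "Sarah" run (rows 4 to 8) the even positions are rows 5 and 7, and in the three-row "Carl" run it is row 11, which are exactly the rows that should read "Name".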

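Since your for loop already iterates over the values of m (the row numbers themselves), m[i] indexes m a second time and points at the wrong rows. Use i directly: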

m <- which(is.na(df$V1))
for (i in m) df$V1[i] <- df$V1[i-2]
df

#      V1    V2
#1   Name  Name
#2   Paul  Paul
#3   Name  Name
#4  Sarah Sarah
#5   Name  Name
#6  Sarah Sarah
#7   Name  Name
#8  Sarah Sarah
#9   Name  Name
#10  Carl  Carl
#11  Name  Name
#12  Carl  Carl
#13  Name  Name
#14 Alice Alice
#15  Name  Name
#16  Rita  Rita
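A small diagnostic sketch on the question's data makes the original bug visible: a for loop over m binds i to 5, 6, 7, 8, 11, 12, so m[i] looks up those positions inside m. Only m[5] = 11 and m[6] = 12 exist; the rest are NA, and R silently ignores an assignment whose index is NA when the replacement value has length one.

```r
V1 <- c("Name", "Paul", "Name", "Sarah", NA, NA, NA, NA, "Name", "Carl", NA, NA,
        "Name", "Alice", "Name", "Rita")
m <- which(is.na(V1))   # 5 6 7 8 11 12

# The loop variable takes the values of m, not the positions 1..6
for (i in m) cat("i =", i, " m[i] =", m[i], "\n")

# Rows the original loop actually wrote to: m indexed by its own values,
# with the NA lookups (which assign nothing) dropped
rows_written <- m[m]
rows_written <- rows_written[!is.na(rows_written)]
print(rows_written)     # 11 12
```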

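Here is a base R solution that reshapes V1 into a two-row matrix (odd rows hold "Name", even rows hold the names), fills each matrix row forward separately, and writes the result to V2 so it can be compared with V1: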

df$V2 <- as.vector(t(apply(matrix(df$V1,nrow = 2), 1, function(x) x[!is.na(x)][cumsum(!is.na(x))])))

which gives:

> df
      V1    V2
1   Name  Name
2   Paul  Paul
3   Name  Name
4  Sarah Sarah
5   <NA>  Name
6   <NA> Sarah
7   <NA>  Name
8   <NA> Sarah
9   Name  Name
10  Carl  Carl
11  <NA>  Name
12  <NA>  Carl
13  Name  Name
14 Alice Alice
15  Name  Name
16  Rita  Rita
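The same idea also works without a loop, apply() or the matrix round trip. The sketch below is not the answerer's code; it assumes, as in the example, that rows strictly alternate between "Name" and a name and that the first value of each alternating sequence is not NA. fill_down is a hypothetical helper name. It uses the same forward-fill expression as the answer: x[!is.na(x)][cumsum(!is.na(x))] indexes each element by how many non-missing values have been seen so far, so every NA takes the last value before it.

```r
# Last-observation-carried-forward (assumes x[1] is not NA)
fill_down <- function(x) {
  ok <- !is.na(x)
  x[ok][cumsum(ok)]
}

V1 <- c("Name", "Paul", "Name", "Sarah", NA, NA, NA, NA, "Name", "Carl", NA, NA,
        "Name", "Alice", "Name", "Rita")

odd <- seq(1, length(V1), by = 2)     # rows holding "Name"
V1[odd]  <- fill_down(V1[odd])
V1[-odd] <- fill_down(V1[-odd])       # rows holding the names
print(V1)
```

Splitting by parity is what lets a plain forward fill work here: a single fill over the whole column would copy "Sarah" into rows 5 to 8 instead of alternating "Name" and "Sarah".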