How to replace NAs in a data frame in R with the values from 4 rows earlier

I have a data frame in R with the structure shown below. I want to fill each missing value by looking back 4 rows and carrying that value forward, rather than using the immediately preceding non-missing value. The question may not be entirely clear, so please see the expected result in the example below.

   maker  num
1  local_1  1
2  local_2  2
3  local_3  5
4  local_4  6
5  local_1  NA
6  local_2  NA
7  local_3  NA
8  local_4  NA
9  local_1  4
10 local_2  7
11 local_3  8
12 local_4  9
13 local_1  NA
14 local_2  NA
15 local_3  NA
16 local_4  NA
17 local_1  NA
18 local_2  NA
19 local_3  NA
20 local_4  NA

to be converted into:

   maker  num
1  local_1  1
2  local_2  2
3  local_3  5
4  local_4  6
5  local_1  1
6  local_2  2
7  local_3  5
8  local_4  6
9  local_1  4
10 local_2  7
11 local_3  8
12 local_4  9
13 local_1  4
14 local_2  7
15 local_3  8
16 local_4  9
17 local_1  4
18 local_2  7
19 local_3  8
20 local_4  9

The number of NAs is always divisible by 4, but the lengths of the consecutive NA runs follow no fixed pattern.

Answer by jared_mamrot (best answer)

You could use a for-loop, e.g.

df <- read.table(text = "maker  num
local_1  1
local_2  2
local_3  5
local_4  6
local_1  NA
local_2  NA
local_3  NA
local_4  NA
local_1  4
local_2  7
local_3  8
local_4  9
local_1  NA
local_2  NA
local_3  NA
local_4  NA
local_1  NA
local_2  NA
local_3  NA
local_4  NA", header = TRUE)

for (i in seq_len(nrow(df))) {
  # Rows 1-4 have no row 4 positions earlier, so leave them untouched
  if (i > 4 && is.na(df$num[i])) {
    df$num[i] <- df$num[i - 4]
  }
}
df
#>      maker num
#> 1  local_1   1
#> 2  local_2   2
#> 3  local_3   5
#> 4  local_4   6
#> 5  local_1   1
#> 6  local_2   2
#> 7  local_3   5
#> 8  local_4   6
#> 9  local_1   4
#> 10 local_2   7
#> 11 local_3   8
#> 12 local_4   9
#> 13 local_1   4
#> 14 local_2   7
#> 15 local_3   8
#> 16 local_4   9
#> 17 local_1   4
#> 18 local_2   7
#> 19 local_3   8
#> 20 local_4   9

Created on 2024-03-12 with reprex v2.1.0
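
If the loop is needed in more than one place, it could be wrapped in a small helper. The following is only a sketch: the name fill_from_lag and its lag argument are made up for illustration and are not part of the original answer. It works on a plain vector and leaves an NA in place when there is nothing 4 positions earlier (or when that earlier value is itself missing).

```r
# Hypothetical helper: fill each NA with the value `lag` positions earlier.
# Values are filled in order, so a long NA run keeps repeating the last
# complete block of `lag` values.
fill_from_lag <- function(x, lag = 4L) {
  for (i in seq_along(x)) {
    # Positions 1..lag have no earlier value to copy from
    if (i > lag && is.na(x[i])) {
      x[i] <- x[i - lag]
    }
  }
  x
}

# Example with a leading NA, which stays NA (and propagates 4 rows down)
example_filled <- fill_from_lag(c(NA, 2, 5, 6, NA, NA, NA, NA, 4, 7))
example_filled
```

With the data frame from the answer this would be used as df$num <- fill_from_lag(df$num).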

Answer by Onyambu

library(dplyr)  # consecutive_id() requires dplyr >= 1.1.0

df %>%
   group_by(new_num = (consecutive_id(is.na(num)) - 1) %/%2) %>%
   mutate(new_num = replace(num, is.na(num), na.omit(num)))

# A tibble: 20 × 3
# Groups:   new_num [8]
   maker     num new_num
   <chr>   <int>   <int>
 1 local_1     1       1
 2 local_2     2       2
 3 local_3     5       5
 4 local_4     6       6
 5 local_1    NA       1
 6 local_2    NA       2
 7 local_3    NA       5
 8 local_4    NA       6
 9 local_1     4       4
10 local_2     7       7
11 local_3     8       8
12 local_4     9       9
13 local_1    NA       4
14 local_2    NA       7
15 local_3    NA       8
16 local_4    NA       9
17 local_1    NA       4
18 local_2    NA       7
19 local_3    NA       8
20 local_4    NA       9
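
This answer comes without an explanation, so here is a rough base R sketch of what the grouping appears to do. It assumes the data starts with a non-missing block and that each NA run is a whole multiple of the block before it, as in the question. The names run_id, grp and filled are illustrative only, and rle() stands in for consecutive_id().

```r
num <- c(1, 2, 5, 6, rep(NA, 4), 4, 7, 8, 9, rep(NA, 8))

# Number each run of consecutive NA / non-NA values: 1, 2, 3, 4
runs <- rle(is.na(num))
run_id <- rep(seq_along(runs$lengths), runs$lengths)

# Pair each non-NA run with the NA run that follows it: 0, 0, 1, 1
grp <- (run_id - 1) %/% 2

# Within each pair, overwrite the NAs with the non-NA values,
# recycling them when the NA run is longer than the value block
filled <- ave(num, grp, FUN = function(x) replace(x, is.na(x), x[!is.na(x)]))
filled
```

The pairing step is what lets a run of 8 NAs reuse the 4 values before it twice. If the data began with NAs, the pairs would be shifted and this sketch would not give the intended result.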

Answer by chan1142

We can also use zoo::na.locf: convert df$num to an (n/4) × 4 matrix filled by row, fill each NA with the last non-missing value above it in the same column (na.locf), and then convert the matrix back to a vector.

# na.rm = FALSE keeps leading NAs, so every column keeps the same length
df$num <- apply(matrix(df$num, ncol = 4, byrow = TRUE), 2, zoo::na.locf, na.rm = FALSE) |>
          t() |> as.vector()

This requires that 4 divides nrow(df).
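
The same column-wise idea can be sketched in base R without the matrix reshape, which also drops the requirement that 4 divides the number of rows. This is only an illustration: locf is a hand-written stand-in for zoo::na.locf(na.rm = FALSE), and the names pos and filled are made up here.

```r
# Hand-written last-observation-carried-forward; leading NAs stay NA
locf <- function(x) {
  # Index of the most recent non-missing value at each position (0 if none yet)
  idx <- cummax(seq_along(x) * !is.na(x))
  idx[idx == 0] <- NA
  x[idx]
}

num <- c(1, 2, 5, 6, rep(NA, 4), 4, 7, 8, 9, rep(NA, 8))

# Position within each block of 4 (0, 1, 2, 3, 0, 1, ...), playing the
# role of the matrix column in the answer above
pos <- (seq_along(num) - 1) %% 4

# Carry values forward separately within each position
filled <- ave(num, pos, FUN = locf)
filled
```

Grouping by position rather than reshaping means a trailing partial block is handled like any other rows.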