R: How do I add NA to a data frame where column values are missing?


In the data frame below, I would like the shorter columns to be padded with NA when the vectors don't all have the same number of values.

Category = c("New" , "Expired", "Expired", "Expired" , "Old", "New")
Months = c ('Mar-2014','Jun-2014', 'Sep-2014','Dec-2014')
Vals = c(4,7,6,7)
df <- data.frame(Category, Months ,Vals) 

Error arguments imply differing number of rows: 6, 4

Desired outcome:

  Category   Months Vals
1      New Mar-2014    4
2  Expired Jun-2014    7
3  Expired Sep-2014    6
4  Expired Dec-2014    7
5      Old     <NA>   NA
6      New     <NA>   NA

There are 3 best solutions below

Friede On

I'm pretty sure this is a duplicate, but I could not find the original. A verbose base R approach:

> lx = list(Category = c("New", "Expired", "Expired", "Expired", "Old", "New"),
+           Months = c("Mar-2014", "Jun-2014", "Sep-2014", "Dec-2014"), Vals = c(4, 7, 6, 7))
> f = \(col, fill = NA) {
+   l = lengths(col, use.names = FALSE); ml = max(l)
+   # pad every shorter vector with `fill` up to the longest length
+   if (length(unique(l)) != 1L) col = lapply(col, \(x) c(x, rep(fill, ml - length(x))))
+   # build from the list directly; cbind() would coerce every column to character
+   as.data.frame(col)
+ }
> 
> f(lx)
  Category   Months Vals
1      New Mar-2014    4
2  Expired Jun-2014    7
3  Expired Sep-2014    6
4  Expired Dec-2014    7
5      Old     <NA>   NA
6      New     <NA>   NA

You can set the fill argument to something other than NA if needed.
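
As a rough illustration of a non-NA fill, here is a minimal sketch. The helper name pad_to_df is hypothetical (a simplified stand-in for f above, not part of the answer), and the two-column input is made up for the example. Note that the fill value is appended with c(), so a numeric fill added to a character column would end up as a string.

```r
# Hypothetical helper: pad every vector in a list to the longest length
# with `fill`, then build a data frame from the padded list.
pad_to_df <- function(cols, fill = NA) {
  n <- max(lengths(cols))
  padded <- lapply(cols, function(x) c(x, rep(fill, n - length(x))))
  as.data.frame(padded)
}

cols <- list(
  Category = c("New", "Expired", "Expired", "Expired", "Old", "New"),
  Vals     = c(4, 7, 6, 7)
)

# Only Vals is short here, so the numeric fill keeps that column numeric.
res <- pad_to_df(cols, fill = 0)
res$Vals
# [1] 4 7 6 7 0 0
```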

jpsmith On

Explicitly setting a vector's length to more than the number of values it holds will "pad" it with NA:

a <- 1
length(a) <- 4
a
# [1]  1 NA NA NA

So a base R approach would just be to make all the vectors the same length:

maxlen <- max(length(Category), length(Months), length(Vals))
length(Category) <- length(Months) <- length(Vals) <- maxlen                      

df <- data.frame(Category, Months ,Vals) 

#   Category   Months Vals
# 1      New Mar-2014    4
# 2  Expired Jun-2014    7
# 3  Expired Sep-2014    6
# 4  Expired Dec-2014    7
# 5      Old     <NA>   NA
# 6      New     <NA>   NA
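
Two details of `length<-` may be worth keeping in mind with this approach, sketched below with made-up vectors: the padding NA takes the type of the vector, and assigning a shorter length silently drops the extra values. Taking the maximum length, as above, avoids the second case.

```r
# Lengthening: a character vector is padded with NA and stays character.
padded <- c("a", "b", "c")
length(padded) <- 5
padded
# [1] "a" "b" "c" NA  NA

# Shortening: values past the new length are discarded without a warning.
shortened <- c(4, 7, 6, 7)
length(shortened) <- 2
shortened
# [1] 4 7
```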
thelatemail On

Similar to @jpsmith's answer, but I think using an intermediate list makes this more easily extendable:

L <- mget(c("Category", "Months", "Vals"))
data.frame(lapply(L, \(x) {length(x) <- max(lengths(L)); x}))
#### or if you want to be tricky but less readable:
## data.frame(lapply(L, `length<-`, max(lengths(L))))

#  Category   Months Vals
#1      New Mar-2014    4
#2  Expired Jun-2014    7
#3  Expired Sep-2014    6
#4  Expired Dec-2014    7
#5      Old     <NA>   NA
#6      New     <NA>   NA
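
To illustrate the "extendable" point, here is a small sketch in which the list is written out directly and gains one more vector. The Region column is a hypothetical addition for the example, not part of the question. Nothing else has to change, because the target length is computed from the list itself.

```r
L <- list(
  Category = c("New", "Expired", "Expired", "Expired", "Old", "New"),
  Months   = c("Mar-2014", "Jun-2014", "Sep-2014", "Dec-2014"),
  Vals     = c(4, 7, 6, 7),
  Region   = c("North", "South")  # hypothetical extra column
)

# Pad every element to the longest length, then combine into a data frame.
n  <- max(lengths(L))
df <- data.frame(lapply(L, `length<-`, n))

dim(df)
# [1] 6 4
```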