Count NA Values by Group (Year and month) in several columns in R

Question

Count NA Values by Group (Year and month) in several columns in R

53 Views Asked by Marcel At 08 March 2024 at 13:07

I would like to count the number of missing values (NA) in each month and year of the time series (columns) and plot a bar chart. How to extract this information from a data frame? I need as result a table with the number of missing values (NA) per month each year, for each columns.

b <- read.table(text = '        Date  AAA  BBB  CCC  DDD  EEE
49 1999-12-15 24.8 21.4 25.6 35.0 17.4
50 1999-12-16   NA  0.6  1.5  6.3  2.5
51 1999-12-17   NA 16.3 20.3  NA 19.2
52 1999-12-18   13  1.6 NA  6.3  0.0
53 1999-12-19   10 36.4 12.5 26.8 24.9
54 1999-12-20   NA  0.0  0.0  0.2  0.0
55 1999-12-21  0.2  0.0  0.0  0.0  0.0
56 1999-12-22  0.0  0.0  0.0  0.0  0.0')

head(b)

Thank you

Original Q&A

There are 3 best solutions below

**jpsmith** · Answer 1 · 2024-03-08T13:26:28.633000

You ask two questions, but to address the one in your title, you can count the NA values across columns AAA through EEE by year and month in dplyr (note the data are slightly changed to include multiple year-month groups):

b %>%
  mutate(ym = format(as.Date(Date), "%Y-%m")) %>%
  summarise(NA_count = sum(across(AAA:EEE, ~sum(is.na(.x)))),
            .by = ym)

#        ym NA_count
# 1 1999-01        3
# 2 1999-02        2
# 3 1999-03        0

To create a barplot of this, there are several ways. One way in base R is (note, I saved the above data as plotdat):

barplot(plotdat$NA_count, 
        names.arg = plotdat$ym)

Note I changed your sample data to have multiple year-month groups:

b <- read.table(text = '        Date  AAA  BBB  CCC  DDD  EEE
49 1999-1-15 24.8 21.4 25.6 35.0 17.4
50 1999-1-16   NA  0.6  1.5  6.3  2.5
51 1999-1-17   NA 16.3 20.3  NA 19.2
52 1999-2-18   13  1.6 NA  6.3  0.0
53 1999-2-19   10 36.4 12.5 26.8 24.9
54 1999-2-20   NA  0.0  0.0  0.2  0.0
55 1999-3-21  0.2  0.0  0.0  0.0  0.0
56 1999-3-22  0.0  0.0  0.0  0.0  0.0')

**Friede** · Answer 2 · 2024-03-08T13:52:48.213000

Based on the kind data provision given in @jpsmith's answer, I would like to add two approaches:

1. Base R

Use by; very concise:

b$Date = format(as.Date(b$Date), "%Y-%m")
by(b, b$Date, \(x) sum(is.na(x))) |> barplot()

or,

2. `{collapse}`

for really large data, collapse::BY + rowSums might be faster

library(collapse)
BY(b, b$Date, \(x) sum(is.na(x))) |> rowSums() |> barplot()

**jay.sf** · Answer 3 · 2024-03-08T16:11:23.593000

Suggesting to use tapply and mean,

> tapply(b[-1], strftime(b$Date, '%Y-%m'), is.na) |> sapply(mean) |> 
+   barplot(ylab='p. NA', las=1)

or colMeans.

> tapply(b[-1], strftime(b$Date, '%Y-%m'), is.na) |> sapply(colMeans) |> 
+   barplot(beside=TRUE, ylab='p. NA', leg=names(b[-1]), col=hcl.colors(ncol(b) - 1), las=1)

Edit

I think it was already answered, but using tapply such a table you can get like so:

> tapply(b[-1], strftime(b$Date, '%Y-%m'), is.na) |> sapply(colSums)
    1999-01 1999-02 1999-03
AAA       2       1       0
BBB       0       0       0
CCC       0       1       0
DDD       1       0       0
EEE       0       0       0

Data:

Thanks to @jpsmith

> dput(b)
structure(list(Date = c("1999-1-15", "1999-1-16", "1999-1-17", 
"1999-2-18", "1999-2-19", "1999-2-20", "1999-3-21", "1999-3-22"
), AAA = c(24.8, NA, NA, 13, 10, NA, 0.2, 0), BBB = c(21.4, 0.6, 
16.3, 1.6, 36.4, 0, 0, 0), CCC = c(25.6, 1.5, 20.3, NA, 12.5, 
0, 0, 0), DDD = c(35, 6.3, NA, 6.3, 26.8, 0.2, 0, 0), EEE = c(17.4, 
2.5, 19.2, 0, 24.9, 0, 0, 0)), class = "data.frame", row.names = c("49", 
"50", "51", "52", "53", "54", "55", "56"))

Count NA Values by Group (Year and month) in several columns in R

There are 3 best solutions below

1. Base R

2. `{collapse}`

Edit

Related Questions in R

Related Questions in DATETIME

Related Questions in BAR-CHART

Related Questions in MISSING-DATA

Trending Questions

Popular # Hahtags

Popular Questions

Count NA Values by Group (Year and month) in several columns in R

There are 3 best solutions below

1. Base R

2. {collapse}

Edit

Related Questions in R

Related Questions in DATETIME

Related Questions in BAR-CHART

Related Questions in MISSING-DATA

Trending Questions

Popular # Hahtags

Popular Questions

2. `{collapse}`