R Tidyverse Group by and count of nulls for all columns

Question

309 Views Asked by Gingie At 06 June 2025 at 16:53

I found similar questions, but none that quite answers mine. I am trying to find, by location, how often people enter data for about 100 parameters. The question "Tidyverse solution for Counting nulls from multiple fields" almost answers it, but how do I add a group_by line so that the result is more granular?

I am trying to get to a table with one row per feature and group_id, giving the share of nulls for that combination.

The column I am trying to group by is group_id. This is my current, inelegant solution:

library(tidyverse)

df1 <- structure(list(
  SCR9 = c(50.5, NA, NA, 25.75, 100, NA, NA, NA, 100, NA, NA, NA, 75.25, NA, NA),
  SCR10 = c(25.75, NA, NA, NA, NA, 1, NA, NA, NA, NA, NA, NA, NA, NA, NA),
  SCR12 = c(75.25, 75.25, 50.5, NA, 75.25, 75.25, 100, 100, 75.25, NA, 75.25, 75.25, 50.5, 100, 50.5),
  ID = 1:15,
  group_id = c("a", "b", "b", "c", "a", "b", "c", "c", "a", "b", "a", "a", "c", "b", "b")
), row.names = c(NA, 15L), class = "data.frame")

# Split into one data frame per group (referencing the column directly, so attach() is not needed)
df2 <- df1 |>
  split(df1$group_id)

d_a <- df2$a |> 
  map_df(function(x) sum(is.na(x))) %>%
  gather(feature, num_nulls) %>%
  dplyr::arrange(desc(num_nulls)) %>%
  mutate(
    percent_null = num_nulls/nrow(df2$a),
    group_id = 'a') |> 
  select(-num_nulls)


d_b <- df2$b |> 
  map_df(function(x) sum(is.na(x))) %>%
  gather(feature, num_nulls) %>%
  dplyr::arrange(desc(num_nulls)) %>%
  mutate(
    percent_null = num_nulls/nrow(df2$b),
    group_id = 'b') |> 
  select(-num_nulls)

d_c <- df2$c |> 
  map_df(function(x) sum(is.na(x))) %>%
  gather(feature, num_nulls) %>%
  dplyr::arrange(desc(num_nulls)) %>%
  mutate(
    percent_null = num_nulls/nrow(df2$c),
    group_id = 'c') |> 
  select(-num_nulls)

d_all <- bind_rows(
  d_a,
  d_b,
  d_c
)

d_all |> 
  dplyr::arrange(feature, group_id) |> 
  slice(-c(1:3)) |> 
  select(feature, group_id, percent_null)

Thanks for the help!

There is 1 best solution below

**Carl** · Accepted Answer

See if this works for you ...

library(tidyverse)

df1 <- structure(list(SCR9 = c(
  50.5, NA, NA, 25.75, 100, NA, NA, NA,
  100, NA, NA, NA, 75.25, NA, NA
), SCR10 = c(
  25.75, NA, NA, NA,
  NA, 1, NA, NA, NA, NA, NA, NA, NA, NA, NA
), SCR12 = c(
  75.25,
  75.25, 50.5, NA, 75.25, 75.25, 100, 100, 75.25, NA, 75.25, 75.25,
  50.5, 100, 50.5
), ID = 1:15, group_id = c(
  "a", "b", "b", "c",
  "a", "b", "c", "c", "a", "b", "a", "a", "c", "b", "b"
)), row.names = c(
  NA,
  15L
), class = "data.frame")
              
df1 |>
  group_by(group_id) |>
  summarise(pct_na = across(everything(), ~ sum(is.na(.x)) / n()))
#> # A tibble: 3 × 2
#>   group_id pct_na$SCR9 $SCR10 $SCR12   $ID
#>   <chr>          <dbl>  <dbl>  <dbl> <dbl>
#> 1 a                0.4  0.8    0         0
#> 2 b                1    0.833  0.167     0
#> 3 c                0.5  1      0.25      0

Created on 2022-04-21 by the reprex package (v2.0.1)
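In the output above, the four proportions sit inside a single data-frame column called pct_na, because the result of across() is assigned to one name. If ordinary columns, or the long feature / group_id / percent_null layout from the question, are preferred, a minimal sketch along these lines may help. It assumes dplyr and tidyr are installed, that ID is an identifier that should be left out, and it uses the illustrative names pct_na_wide and pct_na_long, which are not from the original post.

```r
library(dplyr)
library(tidyr)

# Same example data as in the question.
df1 <- data.frame(
  SCR9 = c(50.5, NA, NA, 25.75, 100, NA, NA, NA, 100, NA, NA, NA, 75.25, NA, NA),
  SCR10 = c(25.75, NA, NA, NA, NA, 1, NA, NA, NA, NA, NA, NA, NA, NA, NA),
  SCR12 = c(75.25, 75.25, 50.5, NA, 75.25, 75.25, 100, 100, 75.25, NA, 75.25, 75.25, 50.5, 100, 50.5),
  ID = 1:15,
  group_id = c("a", "b", "b", "c", "a", "b", "c", "c", "a", "b", "a", "a", "c", "b", "b")
)

# Wide layout: one row per group, one ordinary column per parameter.
# mean(is.na(x)) is the share of missing values, i.e. sum(is.na(x)) / n().
# Calling across() without assigning it to a name keeps the columns unpacked.
pct_na_wide <- df1 |>
  group_by(group_id) |>
  summarise(across(-ID, ~ mean(is.na(.x))), .groups = "drop")

# Long layout: one row per (feature, group_id) pair, as in the question.
pct_na_long <- pct_na_wide |>
  pivot_longer(-group_id, names_to = "feature", values_to = "percent_null") |>
  select(feature, group_id, percent_null) |>
  arrange(feature, group_id)

print(pct_na_long)
```

With around 100 parameters, the -ID selection could be swapped for a tidyselect helper such as starts_with("SCR") or where(is.numeric), depending on how the real columns are named.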
