I found similar questions, but none that quite answers mine. I am trying to find, by location, how often people are entering data for roughly 100 parameters. The question "Tidyverse solution for Counting nulls from multiple fields" almost answers it, but how do I add a group_by step so that the result is more granular?
Trying to get to something similar to:
The column I am trying to group_by is group_id. This is my current, inelegant solution:
df1 <- structure(list(SCR9 = c(50.5, NA, NA, 25.75, 100, NA, NA, NA,
100, NA, NA, NA, 75.25, NA, NA), SCR10 = c(25.75, NA, NA, NA,
NA, 1, NA, NA, NA, NA, NA, NA, NA, NA, NA), SCR12 = c(75.25,
75.25, 50.5, NA, 75.25, 75.25, 100, 100, 75.25, NA, 75.25, 75.25,
50.5, 100, 50.5), ID = 1:15, group_id = c("a", "b", "b", "c",
"a", "b", "c", "c", "a", "b", "a", "a", "c", "b", "b")), row.names = c(NA,
15L), class = "data.frame")
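library(dplyr)
library(purrr)
library(tidyr)

# Split into one data frame per group_id; referencing the column explicitly avoids attach()
df2 <- split(df1, df1$group_id)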
d_a <- df2$a |>
  map_df(function(x) sum(is.na(x))) %>%
  gather(feature, num_nulls) %>%
  dplyr::arrange(desc(num_nulls)) %>%
  mutate(
    percent_null = num_nulls / nrow(df2$a),
    group_id = 'a') |>
  select(-num_nulls)

d_b <- df2$b |>
  map_df(function(x) sum(is.na(x))) %>%
  gather(feature, num_nulls) %>%
  dplyr::arrange(desc(num_nulls)) %>%
  mutate(
    percent_null = num_nulls / nrow(df2$b),
    group_id = 'b') |>
  select(-num_nulls)

d_c <- df2$c |>
  map_df(function(x) sum(is.na(x))) %>%
  gather(feature, num_nulls) %>%
  dplyr::arrange(desc(num_nulls)) %>%
  mutate(
    percent_null = num_nulls / nrow(df2$c),
    group_id = 'c') |>
  select(-num_nulls)

d_all <- bind_rows(
  d_a,
  d_b,
  d_c
)

d_all |>
  dplyr::arrange(feature, group_id) |>
  slice(-c(1:3)) |>
  select(feature, group_id, percent_null)
Thanks for the help!
See if this works for you ...
Created on 2022-04-21 by the reprex package (v2.0.1)
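For reference, a grouped version of the idea might look like the minimal sketch below. It is not necessarily the code the answer used. It assumes dplyr (1.0 or later, for across()) and tidyr are installed, that ID is an identifier to exclude rather than a parameter, and that "percent_null" means the proportion of NA values per group. The name `result` is introduced here only for illustration.

```r
library(dplyr)
library(tidyr)

# Same sample data as in the question
df1 <- structure(list(
  SCR9 = c(50.5, NA, NA, 25.75, 100, NA, NA, NA, 100, NA, NA, NA, 75.25, NA, NA),
  SCR10 = c(25.75, NA, NA, NA, NA, 1, NA, NA, NA, NA, NA, NA, NA, NA, NA),
  SCR12 = c(75.25, 75.25, 50.5, NA, 75.25, 75.25, 100, 100, 75.25, NA,
            75.25, 75.25, 50.5, 100, 50.5),
  ID = 1:15,
  group_id = c("a", "b", "b", "c", "a", "b", "c", "c", "a", "b",
               "a", "a", "c", "b", "b")),
  row.names = c(NA, 15L), class = "data.frame")

result <- df1 |>
  group_by(group_id) |>
  # mean(is.na(x)) is the share of missing values within each group;
  # across() skips the grouping column automatically, and -ID drops the identifier
  summarise(across(-ID, ~ mean(is.na(.x))), .groups = "drop") |>
  # one row per feature/group combination
  pivot_longer(-group_id, names_to = "feature", values_to = "percent_null") |>
  arrange(feature, group_id) |>
  select(feature, group_id, percent_null)

print(result)
```

Because the grouping is done by group_by() rather than by splitting the data by hand, the same pipeline should work unchanged for any number of groups and for the full set of parameter columns, as long as non-parameter columns such as ID are excluded in across().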