I have a dataframe that has a scattering of NA's
toy_df
# Y X1 X2 Label
# 5 3 3 A
# 3 NA 2 B
# 3 NA NA C
# 2 NA 6 B
I want to group this by the label field, and count how many non NA values are in each variable for each label.
desired output:
# Label Y X1 X2
# A 1 1 1
# B 2 0 2
# C 1 0 0
I've done this using loops at the moment, but it's slow and untidy and I'm sure there's a better way.
Aggregate seems to get half way there, but it includes NA's in the count.
aggregate(toy_df, list(toy_df$label), FUN=length)
Any ideas appreciated...
We can use
data.table
. Convert the 'data.frame' to 'data.table' (setDT(toy_df)
), grouped by 'Label', loop through the Subset of Data.table (.SD
) and get thesum
of non-NA values (!is.na(x)
)Or with
dplyr
using the same methodologyOr a
base R
option withby
andcolSums
grouped by the 4th column on logical matrix (!is.na(toy_df[-4])
)Or with
rowsum
with similar approach as inby
except using therowsum
function.