Adding a frequency column in R without using dplyr


I have a "wide" dataset in which each observation records a value for each of several categorical variables. It looks like this:

V1 V2 V3
a z f
a z f
b y g
b y g
a y g
b y f

This means that V1 has two categories, "a" and "b", V2 has two categories, "z" and "y", and so on. But suppose that I have 30 variables (a much bigger dataset).

I want to obtain a dataset in this form:

V1 V2 V3 Freq
a z f 2
b y g 2
a y g 1
b y f 1

How can I get this in R? With smaller datasets I use transform(table(data.frame(data))), but that doesn't work with bigger datasets because it has to build giant tables. Can somebody help, please?

I would like "general" code that does not depend on the variable names, since I will be using it in a function. Moreover, since the datasets will be big, I would prefer to do it without the table function.

Thanks


There are 2 best solutions below


I assume your dataset dt contains only categorical variables and Freq represents the number of observations for each unique combination of the categorical variables.

Since you want code "without using dplyr," here is an alternative using data.table.

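library(data.table)
setDT(dt)  # converts dt to a data.table in place if it is a plain data.frame
# One row per unique combination of all columns, with its count in Freq
dt[, .(Freq = .N), by = names(dt)]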
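
For use inside a function, as the question requests, one possible wrapper is sketched below. The name count_rows_dt is hypothetical, and as.data.table() is used on the assumption that the caller's data frame should not be modified by reference. It returns one row per distinct combination of all columns, whatever they are called.

```r
library(data.table)

# Hypothetical helper (illustrative name): counts distinct rows of `data`
# across all of its columns, without referring to any column by name.
count_rows_dt <- function(data) {
  d <- as.data.table(data)          # works on a copy, leaving `data` untouched
  d[, .(Freq = .N), by = names(d)]  # group by every column, count rows per group
}

df <- data.frame(
  V1 = c("a", "a", "b", "b", "a", "b"),
  V2 = c("z", "z", "y", "y", "y", "y"),
  V3 = c("f", "f", "g", "g", "g", "f")
)
res_dt <- count_rows_dt(df)
print(res_dt)
```

Groups come back in order of first appearance, so the result follows the row order of the input rather than being sorted.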

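In base R, with interaction (this returns one column of pasted labels such as "azf" plus Freq, rather than the original columns):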

as.data.frame(table(interaction(df, sep = "", drop = TRUE)))

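Or, with table (this builds the full contingency table over all level combinations, so it is only practical when there are few variables):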

subset(data.frame(table(df)), Freq > 0)

#  V1 V2 V3 Freq
#2  b  y  f    1
#3  a  z  f    2
#5  a  y  g    1
#6  b  y  g    2
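
If building the full contingency table is too expensive, a base R sketch that never calls table() could look like the following. The function name count_rows_base is hypothetical, and the approach assumes that no value contains the carriage-return character used here as the key separator.

```r
# Hypothetical helper (illustrative name): counts distinct rows of a data
# frame using only base R, touching only combinations that actually occur.
count_rows_base <- function(df) {
  # Build one string key per row; "\r" is assumed not to occur in the data.
  key <- do.call(paste, c(df, sep = "\r"))
  idx <- match(key, key)        # position of each key's first occurrence
  first <- !duplicated(key)     # TRUE for the first row of each combination
  out <- df[first, , drop = FALSE]
  out$Freq <- tabulate(idx)[idx[first]]  # how many rows share each first occurrence
  rownames(out) <- NULL
  out
}

df <- data.frame(
  V1 = c("a", "a", "b", "b", "a", "b"),
  V2 = c("z", "z", "y", "y", "y", "y"),
  V3 = c("f", "f", "g", "g", "g", "f")
)
res_base <- count_rows_base(df)
print(res_base)
```

Memory use here grows with the number of rows, not with the product of the numbers of levels, which is what makes table(df) impractical with 30 variables.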

With dplyr:

library(dplyr)
df %>% 
  count(V1, V2, V3, name = "Freq")

#  V1    V2    V3     Freq
#1 a     y     g         1
#2 a     z     f         2
#3 b     y     f         1
#4 b     y     g         2