Identify all combinations of six variables in R


I have a data frame with 6 variables and 250 observations that looks as follows:

   id    Var1    Var2    Var3    Var4    Var5    Var6

   1     yes     yes     yes     no      yes     yes
   2     no      no      yes     yes     no      yes
   ...
   250   no      yes     yes     yes     yes     yes

I want to identify all combinations of variables present in the data. For example, I know there are 20 observations with "yes" for each variable.

I am doing a peer grouping analysis and want to group the observations based on these yes/no variables. The 20 observations with "yes" for every variable will be group #1, another 20 observations with Var1 = yes and Var2 through Var6 = no will be group #2, and so on.

I attempted to use count in plyr as follows:

> count(dataframe[,-1])

This did not work. Any suggestions will be great!

There are 3 best solutions below

BEST ANSWER

You can use either interaction() or paste(..., sep="_") to build the combinations, but then you need to do something with them: split the ids by combination (which preserves the identity of each observation), tabulate the combinations with table(), or both.

 int_grps <- split( dataframe[,1], interaction( dataframe[,-1], drop=TRUE) )

 int_counts <- table( interaction( dataframe[,-1], drop=TRUE ) )

If you only wanted to enumerate the combinations that exist, the code could be:

names(table(interaction( dataframe[,-1], drop=TRUE)) )    
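
As a minimal sketch of the calls above, here is the same idea on a small made-up data frame. The values and the three-variable layout are assumptions for illustration only, not the asker's data:

```r
# Hypothetical toy data standing in for the real 250-row data frame
dataframe <- data.frame(
  id   = 1:5,
  Var1 = c("yes", "no", "no", "yes", "no"),
  Var2 = c("yes", "no", "no", "yes", "yes"),
  Var3 = c("yes", "yes", "yes", "yes", "yes")
)

# One factor level per combination; drop = TRUE discards combinations
# that never occur in the data
combo <- interaction(dataframe[, -1], drop = TRUE)

int_counts <- table(combo)                # number of rows per combination
int_grps   <- split(dataframe$id, combo)  # ids belonging to each combination

print(int_counts)
print(int_grps)
```

The level names are the values joined with "." (for example "no.no.yes"), so names(int_grps) doubles as a readable label for each group.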

I would use the group_by() function in dplyr to group the data by Var1, Var2, ..., Var6. You can then use summarise() to find the number of times each combination occurs.

library(dplyr)

df <- read.table(text = 
"id    Var1    Var2    Var3    Var4    Var5    Var6
   1     yes     yes     yes     no      yes     yes
   2     no      no      yes     yes     no      yes
   3     no      no      yes     yes     no      yes
   250   no      yes     yes     yes     yes     yes
", header = TRUE, stringsAsFactors = FALSE)

df %>%
  group_by(Var1, Var2, Var3, Var4, Var5, Var6) %>%
  summarise(n_occur = n())
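
If a group number per row is also wanted, one possible extension is sketched below. It assumes dplyr 1.0.0 or later (for cur_group_id()) and uses a made-up two-variable data frame; the column name `group` is hypothetical:

```r
library(dplyr)

# Hypothetical toy data; the real data would have Var1 through Var6
df <- data.frame(
  id   = 1:4,
  Var1 = c("yes", "no", "no", "no"),
  Var2 = c("yes", "no", "no", "yes")
)

# count() is shorthand for group_by() followed by summarise(n = n())
combo_counts <- df %>%
  count(Var1, Var2, name = "n_occur")

# cur_group_id() returns an integer identifier for the current group,
# so every row with the same combination gets the same number
df_grouped <- df %>%
  group_by(Var1, Var2) %>%
  mutate(group = cur_group_id()) %>%
  ungroup()

print(combo_counts)
print(df_grouped)
```

The numbering follows the sort order of the grouping columns, not the group sizes, so it may need relabelling if a particular combination has to be group 1.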

You are looking for interaction here.

with(yourdata, interaction(Var1, Var2, Var3, Var4, Var5, Var6))

Or, as suggested by @thelatemail:

do.call(interaction, c(yourdata[-1], drop = TRUE))
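
Since the goal is to label each observation with a group, the resulting factor can be converted to integer codes. This is only a sketch on made-up data, and the `group` column name is an assumption:

```r
# Hypothetical toy data; the real data would have Var1 through Var6
yourdata <- data.frame(
  id   = 1:4,
  Var1 = c("yes", "no", "no", "no"),
  Var2 = c("yes", "no", "no", "yes")
)

# c() appends drop = TRUE to the list of columns, so do.call() passes
# each column plus the drop argument to interaction()
combo <- do.call(interaction, c(yourdata[-1], drop = TRUE))

# Integer codes of the factor serve as group numbers
yourdata$group <- as.integer(combo)

print(yourdata)
```

Rows with identical yes/no patterns share a group number; which pattern gets which number depends on the factor's level order.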