Identify all combinations of six variables in R

I have a data frame with 6 variables and 250 observations that looks like this:

   id    Var1    Var2    Var3    Var4    Var5    Var6
   1     yes     yes     yes     no      yes     yes
   2     no      no      yes     yes     no      yes
   ...
   250   no      yes     yes     yes     yes     yes

I want to identify all combinations of variables present in the data. For example, I know that there are 20 cases with yes for each variable.

I am doing a grouping analysis and want to group observations based on these yes/no variables. The 20 cases with "yes" for every variable will be group 1, 20 other cases with Var1 = yes and Var2 through Var6 = no will be group 2, and so on.

I tried to use count in plyr like this:

> count(dataframe[,-1])

      

It didn't work. Any suggestions would be great!



3 answers


You can use interaction() or paste(..., sep="_") to create the combinations, but then you need to do something with them. Either split() them into separate categories (which keeps the IDs), or tabulate them with table() (or both).

 int_grps <- split( dataframe[,1], interaction( dataframe[,-1], drop=TRUE) )

 int_counts <- table( interaction( dataframe[,-1], drop=TRUE ) )
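For the paste() route mentioned above, a minimal sketch could look like the following. The toy data frame and the names combo, paste_grps and paste_counts are made up for illustration; only three Var columns are used to keep it short.

```r
# Toy data (hypothetical values, only to make the example runnable)
dataframe <- data.frame(
  id   = 1:5,
  Var1 = c("yes", "no", "no", "yes", "no"),
  Var2 = c("yes", "no", "no", "yes", "yes"),
  Var3 = c("yes", "yes", "yes", "yes", "yes"),
  stringsAsFactors = FALSE
)

# Build one key per row by pasting the columns together, e.g. "yes_yes_yes"
combo <- do.call(paste, c(dataframe[, -1], sep = "_"))

# IDs that fall into each combination
paste_grps <- split(dataframe$id, combo)

# Number of rows per combination
paste_counts <- table(combo)
```

Because paste() only produces keys for rows that exist, there is no need for drop=TRUE here.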

      



If you only want to list the existing combinations, the code could be:

names(table(interaction( dataframe[,-1], drop=TRUE)) )    
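Since the question asks for a group number per observation, one possible approach is to convert the interaction factor to its integer codes. This is only a sketch on made-up toy data; the column name group is an assumption, and the numbering follows the order of the factor levels, not the size of each group.

```r
# Toy data (hypothetical values, only to make the example runnable)
dataframe <- data.frame(
  id   = 1:5,
  Var1 = c("yes", "no", "no", "yes", "no"),
  Var2 = c("yes", "no", "no", "yes", "yes"),
  Var3 = c("yes", "yes", "yes", "yes", "yes"),
  stringsAsFactors = FALSE
)

# Factor with one level per combination that actually occurs
grp <- interaction(dataframe[, -1], drop = TRUE)

# Integer code of each row's level serves as the group number
dataframe$group <- as.integer(grp)

# Lookup table: which combination each group number stands for
group_key <- data.frame(group = seq_along(levels(grp)),
                        combination = levels(grp))
```

Rows with the same yes/no pattern get the same number, and group_key lets you translate the numbers back into patterns.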

      



I would use the function group_by() in dplyr to group the data by Var1, Var2, ..., Var6. Then you can use summarise() to find the number of times each combination occurs.



library(dplyr)

df <- read.table(text = 
"id    Var1    Var2    Var3    Var4    Var5    Var6
   1     yes     yes     yes     no      yes     yes
   2     no      no      yes     yes     no      yes
   3     no      no      yes     yes     no      yes
   250   no      yes     yes     yes     yes     yes
", header = TRUE, stringsAsFactors = FALSE)

df %>%
  group_by(Var1, Var2, Var3, Var4, Var5, Var6) %>%
  summarise(n_occur = n())
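As a shorter variant, dplyr's count() does the grouping and counting in one step. The sketch below assumes a reasonably recent dplyr version that supports the name argument; combo_counts is a made-up name.

```r
library(dplyr)

# Same four example rows as above
df <- read.table(text =
"id    Var1    Var2    Var3    Var4    Var5    Var6
   1     yes     yes     yes     no      yes     yes
   2     no      no      yes     yes     no      yes
   3     no      no      yes     yes     no      yes
   250   no      yes     yes     yes     yes     yes
", header = TRUE, stringsAsFactors = FALSE)

# One row per observed combination, with its frequency in n_occur
combo_counts <- df %>%
  count(Var1, Var2, Var3, Var4, Var5, Var6, name = "n_occur")
```

Unlike group_by() followed by summarise(), the result of count() on an ungrouped data frame is itself ungrouped, which can save an ungroup() call later.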

      



You are looking for interaction():

with(yourdata, interaction(Var1, Var2, Var3, Var4, Var5, Var6))

      

Or, as suggested by @thelatemail:

do.call(interaction,c(yourdata[-1],drop=TRUE))
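The two calls label each row identically but differ in which levels they keep. A small sketch on hypothetical example rows (the names all_levels and used_levels are made up):

```r
# Hypothetical example rows, only to make the comparison runnable
yourdata <- data.frame(
  id   = c(1, 2, 3, 250),
  Var1 = c("yes", "no", "no", "no"),
  Var2 = c("yes", "no", "no", "yes"),
  Var3 = c("yes", "yes", "yes", "yes"),
  Var4 = c("no", "yes", "yes", "yes"),
  Var5 = c("yes", "no", "no", "yes"),
  Var6 = c("yes", "yes", "yes", "yes"),
  stringsAsFactors = FALSE
)

# Without drop = TRUE: levels for every cross-combination of the
# values present in each column, including ones that never occur
all_levels <- with(yourdata, interaction(Var1, Var2, Var3, Var4, Var5, Var6))

# With drop = TRUE: only the combinations that actually occur
used_levels <- do.call(interaction, c(yourdata[-1], drop = TRUE))

nlevels(all_levels)
nlevels(used_levels)
```

If you pass the result to table(), the drop=TRUE version avoids a long run of zero counts.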

      
