Define all combinations of six variables in R
I have a data frame with 6 variables and 250 observations that looks like this:
id Var1 Var2 Var3 Var4 Var5 Var6
1 yes yes yes no yes yes
2 no no yes yes no yes
...
250 no yes yes yes yes yes
I want to identify all combinations of the variables that are present in the data. For example, I know that there are 20 cases with "yes" for every variable.
I am doing a grouping analysis and want to group observations based on these yes/no variables. The 20 cases with "yes" for every variable will be group #1, 20 other cases with Var1 = yes and Var2 through Var6 = no will be group #2, and so on.
I tried to use count in plyr like this:
> count(dataframe[,-1])
It didn't work. Any suggestions would be great!
You can use interaction or paste( ..., sep="_") to create the combinations, but then you need to do something with them. Either split the IDs into separate groups with split (which keeps the IDs), or tabulate the combinations with table (or both).
int_grps <- split( dataframe[,1], interaction( dataframe[,-1], drop=TRUE) )
int_counts <- table( interaction( dataframe[,-1], drop=TRUE ) )
If you only want to list the existing combinations, the code could be:
names(table(interaction( dataframe[,-1], drop=TRUE)) )
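If the goal is a group number on every row (group #1, group #2, and so on), one possible sketch is to convert the interaction factor to integer codes. The toy data below is made up for illustration and uses only three variables; the column name group is also an assumption, not something from the question.

```r
# Toy data standing in for the real 250-row data frame (values are made up).
dataframe <- data.frame(
  id   = 1:5,
  Var1 = c("yes", "no", "no", "yes", "no"),
  Var2 = c("yes", "no", "no", "yes", "yes"),
  Var3 = c("yes", "yes", "yes", "yes", "yes"),
  stringsAsFactors = TRUE
)

# One factor level per combination that actually occurs in the data.
combo <- interaction(dataframe[, -1], drop = TRUE)

# Rows per observed combination.
combo_counts <- table(combo)

# Integer code of the combination, attached to every row.
# "group" is a hypothetical column name chosen for this sketch.
dataframe$group <- as.integer(combo)
```

The group numbers follow the order of the factor levels, not the size of each group, so the most frequent combination is not necessarily group 1.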
I would use the function group_by() in dplyr to group the data by Var1, Var2, ..., Var6. Then you can use summarise() to find the number of times each combination occurs.
library(dplyr)
df <- read.table(text =
"id Var1 Var2 Var3 Var4 Var5 Var6
1 yes yes yes no yes yes
2 no no yes yes no yes
3 no no yes yes no yes
250 no yes yes yes yes yes
", header = TRUE, stringsAsFactors = FALSE)
df %>%
group_by(Var1, Var2, Var3, Var4, Var5, Var6) %>%
summarise(n_occur = n())
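A possible variation, sketched here on made-up data with only two variables: count() is a shorthand for the group_by() plus summarise() pair, and cur_group_id() can label each row with its group number. This assumes dplyr 1.0.0 or later; the names combo_counts, df_grouped, and group are hypothetical.

```r
library(dplyr)

# Toy data (values are made up); the real data has Var1 to Var6.
df <- data.frame(
  id   = 1:4,
  Var1 = c("yes", "no", "no", "no"),
  Var2 = c("yes", "no", "no", "yes"),
  stringsAsFactors = FALSE
)

# Shorthand for group_by() followed by summarise(n_occur = n()).
combo_counts <- df %>%
  count(Var1, Var2, name = "n_occur")

# Keep every row and add the number of the group it belongs to.
df_grouped <- df %>%
  group_by(Var1, Var2) %>%
  mutate(group = cur_group_id()) %>%
  ungroup()
```

The first result has one row per observed combination, while the second keeps one row per id, which may be the more useful shape for the grouping analysis described in the question.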