Encode multiple choice response in R
I have a CSV dataset that looks like this:
Age;Functions;...
12;1,2,5;...
45;1,4,5,8;...
23;3;...
The first column is the age of the participant, and the second column is a list of multiple answers to question 1. In this example, the first participant checks checkboxes 1, 2, and 5, and only the 3rd participant checks the 3rd checkbox.
Now I want to evaluate the answers to question 1. The first step is to plot the number of responses for each possible answer. I've tried the following:
dataset$Functions <- strsplit(as.character(dataset$Functions), ",", fixed = TRUE)
dataset$Functions <- lapply(dataset$Functions, factor, levels = 0:8, labels = c(
  "no answer",
  "checkbox 1",
  "checkbox 2",
  "checkbox 3",
  "checkbox 4",
  "checkbox 5",
  "checkbox 6",
  "checkbox 7",
  "checkbox 8"
))
Additionally, I tried mChoice:
library("Hmisc")
dataset$Functions <- lapply(dataset$Functions, mChoice, label="Functions")
But now I don't know how to handle the list in the dataframe. Do you have an idea?
Personally, I prefer to first transform the multiple-choice variable into a series of dichotomous variables, one for each possible choice. For example, if you have the following data frame:
d <- data.frame(age=c(25,35,45,55,65),var=c("1,2,3","1,2","3","2","1"))
  age   var
1  25 1,2,3
2  35   1,2
3  45     3
4  55     2
5  65     1
You can use the following code:
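# Collect the distinct choices that occur in the data
lev <- levels(factor(d$var))
lev <- unique(unlist(strsplit(lev, ",")))
# One column name per choice: var.1, var.2, ...
mnames <- gsub(" ", "_", paste("var", lev, sep = "."))
# Start with all "0", then set "1" where the choice appears in the response
result <- matrix(data = "0", nrow = length(d$var), ncol = length(lev))
char.var <- as.character(d$var)
for (i in seq_along(lev)) {
  # Note: this is a substring match, so it assumes no choice label contains another (e.g. "1" and "10")
  result[grep(lev[i], char.var, fixed = TRUE), i] <- "1"
}
result <- data.frame(result, stringsAsFactors = TRUE)
colnames(result) <- mnames
d <- cbind(d, result)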
This gives you three new variables:
  age   var var.1 var.2 var.3
1  25 1,2,3     1     1     1
2  35   1,2     1     1     0
3  45     3     0     0     1
4  55     2     0     1     0
5  65     1     1     0     0
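If your choice codes can have more than one digit, matching on substrings would count "1" inside "10". A minimal sketch of an exact-match variant, assuming the responses are comma-separated with no surrounding spaces (the data here, including the choice "10", is made up for illustration):

```r
# Hypothetical data with a multi-digit choice ("10")
d <- data.frame(age = c(25, 35, 45),
                var = c("1,2", "10", "2,10"),
                stringsAsFactors = FALSE)

# Split each response into its individual choices
choices <- strsplit(d$var, ",", fixed = TRUE)
lev <- sort(unique(unlist(choices)))

# For each choice, test whether it is among the tokens of each response
result <- sapply(lev, function(l) {
  as.integer(vapply(choices, function(x) l %in% x, logical(1)))
})
colnames(result) <- paste("var", lev, sep = ".")
d <- cbind(d, result)
```

Here the second participant gets a 1 only in var.10, not in var.1. The new columns are plain integers rather than factors, which is a design choice of this sketch, not of the code above.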
You can then use each of these new variables for statistics or cross-tabulation. If you want a global frequency table of the different options, you can do this:
vars <- c("var.1","var.2","var.3")
as.table(sapply(d[,vars], function(v) {
sel <- as.numeric(v==1)
sum(sel)
}))
This will give you:
var.1 var.2 var.3
    3     3     2
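If all you need is the plot of responses per checkbox, you may not need the dummy variables at all. A rough sketch, assuming the column layout from the question (the three rows are copied from its example, and the "no answer" level is left out here):

```r
# Hypothetical data mirroring the question's layout (Age;Functions)
dataset <- data.frame(Age = c(12, 45, 23),
                      Functions = c("1,2,5", "1,4,5,8", "3"),
                      stringsAsFactors = FALSE)

# Flatten all ticked checkboxes into one vector
ticked <- unlist(strsplit(dataset$Functions, ",", fixed = TRUE))

# Fixing the levels to 1..8 keeps unticked checkboxes in the table with a count of 0
ticked <- factor(ticked, levels = 1:8, labels = paste("checkbox", 1:8))
counts <- table(ticked)

# Bar chart of the number of responses per checkbox
barplot(counts, las = 2, ylab = "Number of responses")
```

This only counts ticks. It loses the link to each participant, so for cross-tabulation with age you would still want the per-choice columns.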