Encode multiple choice response in R

I have a CSV dataset that looks like this:

Age;Functions;...
12;1,2,5;...
45;1,4,5,8;...
23;3;...

      

The first column is the age of the participant, and the second column is a list of multiple answers to question 1. In this example, the first participant checks checkboxes 1, 2, and 5, and only the 3rd participant checks the 3rd checkbox.

Now I want to evaluate the answers to question 1. The first step is to plot the number of responses for each possible answer. I've tried the following:

dataset$Functions <- strsplit(as.character(dataset$Functions), ",", fixed = TRUE)
dataset$Functions <- lapply(dataset$Functions, factor, levels = 0:8, labels = c(
"no answer",
"checkbox 1",
"checkbox 2",
"checkbox 3",
"checkbox 4",
"checkbox 5",
"checkbox 6",
"checkbox 7",
"checkbox 8"
))

      

Additionally, I tried mChoice:

library("Hmisc")
dataset$Functions <- lapply(dataset$Functions, mChoice, label="Functions")

      

But now I don't know how to handle the list in the dataframe. Do you have an idea?



1 answer


Personally, I prefer to first transform the multiple-choice variable into a series of dichotomous (0/1) variables, one for each possible choice. For example, if you have the following data frame:

d <- data.frame(age=c(25,35,45,55,65),var=c("1,2,3","1,2","3","2","1"))

  age   var
1  25 1,2,3
2  35   1,2
3  45     3
4  55     2
5  65     1

      

You can use the following code:

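# Distinct choice codes found in the column
lev <- levels(factor(d$var))
lev <- unique(unlist(strsplit(lev, ",")))
# Names for the new indicator columns: var.1, var.2, ...
mnames <- gsub(" ", "_", paste("var", lev, sep = "."))
# Start with all "0", then mark "1" where a row contains the code
result <- matrix(data = "0", nrow = length(d$var), ncol = length(lev))
char.var <- as.character(d$var)
for (i in seq_along(lev)) {
  # Note: fixed = TRUE matches substrings, so "1" would also match "10"
  result[grep(lev[i], char.var, fixed = TRUE), i] <- "1"
}
result <- data.frame(result, stringsAsFactors = TRUE)
colnames(result) <- mnames
d <- cbind(d, result)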

      

Which will give you three new variables:



  age   var var.1 var.2 var.3
1  25 1,2,3     1     1     1
2  35   1,2     1     1     0
3  45     3     0     0     1
4  55     2     0     1     0
5  65     1     1     0     0
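
One caveat with the loop above: `grep(..., fixed = TRUE)` looks for substrings, so with ten or more options the code "1" would also match "10" or "12". Here is a minimal sketch that compares whole codes instead. It assumes the same comma-separated column. The names `choices`, `codes`, `ind` and `d2` are only illustrative, and the last row is changed to a hypothetical code "10" to show the difference.

```r
# Same toy data as above, except the last row uses a hypothetical two-digit code
d <- data.frame(age = c(25, 35, 45, 55, 65),
                var = c("1,2,3", "1,2", "3", "2", "10"),
                stringsAsFactors = FALSE)

# Split each response into its individual codes (a list, one element per row)
choices <- strsplit(as.character(d$var), ",", fixed = TRUE)

# All codes that occur, sorted numerically
codes <- sort(unique(as.integer(unlist(choices))))

# One 0/1 column per code; %in% compares whole codes, not substrings
ind <- sapply(codes, function(k) {
  vapply(choices, function(x) as.integer(k %in% as.integer(x)), integer(1))
})
ind <- matrix(ind, nrow = nrow(d),
              dimnames = list(NULL, paste0("var.", codes)))

# Attach the indicator columns to the original data
d2 <- cbind(d, ind)
```

With this variant the row containing only "10" gets a 1 in `var.10` and a 0 in `var.1`, whereas the substring match would also flag `var.1`.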

      

You can now use each of these new variables for statistics or cross-tabulation. To build an overall frequency table of the options, you can do this:

vars <- c("var.1","var.2","var.3")
as.table(sapply(d[,vars], function(v) {
  sel <- as.numeric(v==1)
  sum(sel)
}))

      

Which will give you:

var.1 var.2 var.3 
    3     3     2 
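
Since the question asks for a plot of the number of responses per answer, the counts can also be taken straight from the split strings without building indicator columns first. This is a rough sketch, assuming the column is called `Functions` as in the question and that there are eight checkboxes. The toy data frame only repeats the three sample rows shown there, and `picked` and `counts` are illustrative names.

```r
# Toy data in the shape of the question's CSV (its three sample rows)
dataset <- data.frame(Age = c(12, 45, 23),
                      Functions = c("1,2,5", "1,4,5,8", "3"),
                      stringsAsFactors = FALSE)

# Split each response into its codes and flatten everything into one vector
picked <- unlist(strsplit(dataset$Functions, ",", fixed = TRUE))

# Fixing the levels keeps checkboxes nobody ticked in the table with count 0
counts <- table(factor(picked, levels = 1:8,
                       labels = paste("checkbox", 1:8)))

# Bar chart of the number of responses per checkbox
barplot(counts, las = 2, ylab = "Number of responses")
```

Setting the factor levels explicitly is a design choice: without it, `table()` would silently drop options that no participant selected.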

      
