How can I convert a two-column "count" matrix to a binary vector in R?

How can I convert a data frame with a two-column count matrix into a data frame with a single binary column in R? For example, I have a data frame like the one below, where id identifies the subject, s and f are the numbers of "successes" and "failures" for that subject, and x is a third variable describing some trait of the subject:

id s f x
1  0 3 A
2  2 1 A
3  1 2 B

I want this data frame to be converted to:

id n x
1  f A
1  f A
1  f A
2  s A
2  s A
2  f A
3  s B
3  f B
3  f B

where column n indicates whether each trial was a success (s) or a failure (f).

I'm sure I could write a function for this, but I'm wondering whether there is a ready-made solution.



3 answers


dd <- read.table(text="id s f x
  1  0 3 A
  2  2 1 A
  3  1 2 B",
  header=TRUE)

with(dd, data.frame(
  id = rep(id, s+f),                                    # one row per trial
  n  = rep(rep(c("s","f"), nrow(dd)), c(rbind(s, f))),  # interleaved s/f labels
  x  = rep(x, s+f)))                                    # covariate repeated like id
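The least obvious part of this answer is the `n=` line, which interleaves the two count columns so that each subject's successes and failures line up with an alternating vector of labels. A small sketch of the intermediate values, using the counts from the example data:

```r
# Counts from the example data: successes s and failures f per subject
s <- c(0, 2, 1)
f <- c(3, 1, 2)

# rbind() stacks s over f in a 2 x 3 matrix; c() flattens it column by
# column, so the counts come out interleaved: s1, f1, s2, f2, s3, f3
counts <- c(rbind(s, f))
print(counts)  # 0 3 2 1 1 2

# Alternating labels, one "s"/"f" pair per subject
labels <- rep(c("s", "f"), length(s))

# Repeat each label by its count; a zero count simply drops that label
n <- rep(labels, counts)
print(n)  # "f" "f" "f" "s" "s" "f" "s" "f" "f"
```

Because `rep(id, s+f)` repeats each subject's id the same total number of times, the three columns end up aligned row by row.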


Here is one way using the tidyr and splitstackshape packages. First reshape your data to long format with gather(). Then use expandRows() from splitstackshape, which repeats each row the number of times given in the value column. For display purposes I used arrange() from dplyr, but that part is optional.



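library(tidyr)
library(splitstackshape)
library(dplyr)

# the data from the question
mydf <- read.table(text = "id s f x
1  0 3 A
2  2 1 A
3  1 2 B", header = TRUE)

# reshape s/f into variable/value pairs, then repeat each row 'value' times
gather(mydf, variable, value, -id, -x) %>%
  expandRows("value") %>%
  arrange(id, x)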


#  id x variable
#1  1 A        f
#2  1 A        f
#3  1 A        f
#4  2 A        s
#5  2 A        s
#6  2 A        f
#7  3 B        s
#8  3 B        f
#9  3 B        f
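The same two steps (reshape to long format, then repeat each row by its count) can also be sketched in base R without any packages. This is only a base-R analogue of the pipeline above, not what tidyr or splitstackshape do internally; `mydf` is assumed to hold the question's data.

```r
# Example data from the question (assumed layout: id, s, f, x)
mydf <- data.frame(id = 1:3, s = c(0, 2, 1), f = c(3, 1, 2),
                   x = c("A", "A", "B"), stringsAsFactors = FALSE)

# Step 1: reshape to long format, one row per (subject, outcome) pair
long <- data.frame(
  id       = rep(mydf$id, 2),
  x        = rep(mydf$x, 2),
  variable = rep(c("s", "f"), each = nrow(mydf)),
  value    = c(mydf$s, mydf$f),
  stringsAsFactors = FALSE
)

# Step 2: repeat each row 'value' times (rows with value 0 disappear)
expanded <- long[rep(seq_len(nrow(long)), long$value), c("id", "x", "variable")]

# Optional: sort by subject; order() leaves ties in their original order,
# so within a subject the "s" rows (listed first in 'long') stay ahead
expanded <- expanded[order(expanded$id, expanded$x), ]
rownames(expanded) <- NULL
expanded
```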


Building on Ben Bolker's excellent answer above, I wrote a short function that does this for any data frame with one column of success counts, one column of failure counts, and any number of additional columns describing each row (subject). See the example below.

#####################################################################
### cnt2bin (count to binary) takes a data frame with a 2-column ####
### "count" response (successes and failures) and converts it   ####
### to long format, with one column "bin" holding 1 for each    ####
### success and 0 for each failure.                             ####
### data:      data frame with the 2-column response variable   ####
### suc, fail: character strings naming the columns that hold   ####
###            the counts of successes and failures             ####
#####################################################################

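cnt2bin <- function(data, suc, fail) {
  # all columns other than the two count columns
  xvars <- setdiff(names(data), c(suc, fail))
  # total number of trials per row
  n <- data[[suc]] + data[[fail]]
  # repeat every explanatory column n times per row
  df <- as.data.frame(lapply(data[xvars], rep, times = n))
  # interleave the counts (s1, f1, s2, f2, ...) and expand 1/0 accordingly
  bin <- rep(rep(c(1, 0), nrow(data)), c(rbind(data[[suc]], data[[fail]])))
  data.frame(bin = bin, df)
}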

An example, where id is the subject identifier, s and f count the successes and failures for each subject, and x and y are variables describing attributes of each subject that are expanded and carried into the final data frame:

dd <- read.table(text="id s f x y
                       1  0 3 A A
                       2  2 1 A B
                       3  1 2 B B",
                  header=TRUE)

cnt2bin(dd, "s", "f")
