R: count the occurrences of each value in every row of a data frame

I have a data frame like the one below, where each row represents a pattern and the same value can appear several times within a row:

> df
  V1 V2 V3 V4 V5
1  a  a  d  d  b
2  c  a  b  d  a
3  d  b  a  a  b
4  d  d  a  b  c
5  c  a  d  c  c
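
For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the example data frame from the printout above. Storing the columns as character rather than factor (`stringsAsFactors = FALSE`) is an assumption, since the question does not show the column types.

```r
# Rebuild the example; each vector below is one column of the printout
df <- data.frame(
  V1 = c("a", "c", "d", "d", "c"),
  V2 = c("a", "a", "b", "d", "a"),
  V3 = c("d", "b", "a", "a", "d"),
  V4 = c("d", "d", "a", "b", "c"),
  V5 = c("b", "a", "b", "c", "c"),
  stringsAsFactors = FALSE  # assumption: character columns, not factors
)
df
```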

I want to create a new data frame whose column headers are the distinct values from the original data frame (a, b, c, d) and whose cells hold the number of times each value occurs in the corresponding row of the original. For the example above, it would look like this:

> df2
   a  b  c  d
1  2  1  0  2
2  2  1  1  1
3  2  2  0  1
4  1  1  1  2
5  1  0  3  1

My actual dataset has hundreds of distinct values and thousands of samples (rows), so ideally the column names would be pulled automatically from the original data frame and sorted alphabetically.

2 answers


You may try

library(qdapTools)
# transpose so each original row becomes a column, then count values per column
mtabulate(as.data.frame(t(df)))

or

# split the cell values into one vector per row, then count values per vector
mtabulate(split(as.matrix(df), row(df)))
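
In case it helps to see what the `split()` call hands to `mtabulate`, here is a base-R sketch (no qdapTools needed) that builds the intermediate list. It relies on `as.matrix(df)` and `row(df)` having the same dimensions, so both are read in the same column-major order and every value is paired with its row number. The `df` construction is just the example data re-typed, with character columns assumed.

```r
df <- data.frame(V1 = c("a", "c", "d", "d", "c"), V2 = c("a", "a", "b", "d", "a"),
                 V3 = c("d", "b", "a", "a", "d"), V4 = c("d", "d", "a", "b", "c"),
                 V5 = c("b", "a", "b", "c", "c"), stringsAsFactors = FALSE)

# row(df) is a 5 x 5 integer matrix holding each cell's row number
idx <- row(df)
# split() groups the cell values by that row number, giving one
# character vector per original row, named "1" to "5"
rows <- split(as.matrix(df), idx)
str(rows)
```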

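Or, using base R:

# as.matrix() turns factor columns into character, so the sort is alphabetical
Un1 <- sort(unique(as.vector(as.matrix(df))))
# count each row against the full set of values so absent values get a 0
t(apply(df, 1, function(x) table(factor(x, levels = Un1))))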
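
A further base-R variant, offered as a sketch rather than a benchmarked recommendation: instead of calling `table()` once per row through `apply()`, cross-tabulate every cell's row number against its value in a single `table()` call. Because `table()` converts the values to a factor with sorted levels, the columns come out alphabetically without building `Un1` first. The `df` below is the example data re-typed, with character columns assumed.

```r
df <- data.frame(V1 = c("a", "c", "d", "d", "c"), V2 = c("a", "a", "b", "d", "a"),
                 V3 = c("d", "b", "a", "a", "d"), V4 = c("d", "d", "a", "b", "c"),
                 V5 = c("b", "a", "b", "c", "c"), stringsAsFactors = FALSE)

# Flatten both matrices in the same column-major order: one entry per cell
row_id <- as.vector(row(df))        # which row each cell came from
value  <- as.vector(as.matrix(df))  # the cell's value
# One table() call counts every (row, value) pair; levels are sorted
tab <- table(row_id, value)

# Convert the table into an ordinary wide data frame with columns a, b, c, d
df2 <- as.data.frame.matrix(tab)
df2
```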

You can stack the columns and then use table:

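# One row per cell: id = original row number, values = the cell's value.
# stack() works column by column, so the id vector 1..nrow(df) recycles once per column.
table(cbind(id = seq_len(nrow(df)),
            stack(lapply(df, as.character)))[c("id", "values")])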
#    values
# id  a b c d
#   1 2 1 0 2
#   2 2 1 1 1
#   3 2 2 0 1
#   4 1 1 1 2
#   5 1 0 3 1
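
The result above is a `table` object rather than a data frame. If a data frame is needed, note that `as.data.frame()` on a table returns long format (one row per id/value pair, with a `Freq` column), while `as.data.frame.matrix()` keeps the wide layout from the question. A short sketch using the same stack-and-table call, with the example data re-typed as `df` (character columns assumed):

```r
df <- data.frame(V1 = c("a", "c", "d", "d", "c"), V2 = c("a", "a", "b", "d", "a"),
                 V3 = c("d", "b", "a", "a", "d"), V4 = c("d", "d", "a", "b", "c"),
                 V5 = c("b", "a", "b", "c", "c"), stringsAsFactors = FALSE)

tab <- table(cbind(id = seq_len(nrow(df)),
                   stack(lapply(df, as.character)))[c("id", "values")])

# Long format: columns id, values, Freq (5 ids x 4 values = 20 rows)
long <- as.data.frame(tab)
# Wide format: one row per id, one column per value, as in the question
wide <- as.data.frame.matrix(tab)
wide
```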