R: count occurrences of each value in every row of a data frame

Question

I have a data frame that looks something like this, where each row represents a pattern and the same value can repeat within a row:

> df
  V1 V2 V3 V4 V5
1  a  a  d  d  b
2  c  a  b  d  a
3  d  b  a  a  b
4  d  d  a  b  c
5  c  a  d  c  c

I want to create a new data frame whose headers are the distinct string values found in the original data frame (a, b, c, d), and where each row holds the number of times each value occurs in the corresponding row of the original. Using the example above, it would look like this:

> df2
   a  b  c  d
1  2  1  0  2
2  2  1  1  1
3  2  2  0  1
4  1  1  1  2
5  1  0  3  1

There are hundreds of variables and thousands of samples in my actual dataset, so it would be ideal if I could automatically pull names from the original dataframe and alphabetically order them into headers for the new dataframe.

string r dataframe

ricks.k May 02 '15 at 20:45

2 answers

akrun · Answer 1 · 2015-05-02T20:46:57+0000

You may try

library(qdapTools)
mtabulate(as.data.frame(t(df)))

or

mtabulate(split(as.matrix(df), row(df)))

Or using base R

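# Distinct values across the whole data frame, sorted alphabetically
Un1 <- sort(unique(unlist(df)))
# Tabulate each row against the full set of levels so zero counts are kept
t(apply(df, 1, function(x) table(factor(x, levels = Un1))))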
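
For anyone who wants to run the base R approach end to end, here is a minimal sketch. It rebuilds the example data from the question and converts the resulting matrix to a data frame. The names lev, counts and df2 are picked here for illustration and do not come from the answer.

```r
# Rebuild the example data frame from the question (values copied from the post).
df <- data.frame(
  V1 = c("a", "c", "d", "d", "c"),
  V2 = c("a", "a", "b", "d", "a"),
  V3 = c("d", "b", "a", "a", "d"),
  V4 = c("d", "d", "a", "b", "c"),
  V5 = c("b", "a", "b", "c", "c"),
  stringsAsFactors = FALSE
)

# All distinct values across the whole data frame, sorted alphabetically.
lev <- sort(unique(unlist(df)))

# Count each level per row; fixing the levels keeps zero counts in place
# and guarantees every row yields a vector of the same length.
counts <- t(apply(df, 1, function(x) table(factor(x, levels = lev))))

# The question asks for a data frame, so convert the count matrix.
df2 <- as.data.frame(counts)
df2
```

Passing the same levels to factor() for every row is what keeps the columns aligned. Without it, a row that lacks some value would return a shorter table and apply() could no longer build a rectangular result.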

A5C1D2H2I1M1N2O1R2T1 · Answer 2 · 2015-05-03T04:03:50+0000

You can stack the columns and then use table (mydf here is the question's df):

table(cbind(id = 1:nrow(mydf), 
            stack(lapply(mydf, as.character)))[c("id", "values")])
#    values
# id  a b c d
#   1 2 1 0 2
#   2 2 1 1 1
#   3 2 2 0 1
#   4 1 1 1 2
#   5 1 0 3 1
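
The result above is a table object, while the question asks for a data frame. One possible follow-up, sketched here under the assumption that a plain data frame is wanted, is to pass the two-way table to as.data.frame.matrix(). The read.table() call only rebuilds the question's example, and the names long, tab and df2 are illustrative.

```r
# Rebuild the example from the question; mydf mirrors the answer's naming.
mydf <- read.table(text = "V1 V2 V3 V4 V5
a a d d b
c a b d a
d b a a b
d d a b c
c a d c c", header = TRUE, stringsAsFactors = FALSE)

# Long format: one line per cell, tagged with the row it came from.
# stack() lists the columns one after another, so recycling id lines up
# with the original row numbers.
long <- cbind(id = seq_len(nrow(mydf)), stack(lapply(mydf, as.character)))

# Two-way contingency table: row id by value.
tab <- table(long[c("id", "values")])

# as.data.frame.matrix() keeps the wide layout; plain as.data.frame()
# on a table would return the long id/values/Freq form instead.
df2 <- as.data.frame.matrix(tab)
df2
```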
