Comparing each column with every other column of the same data frame

I have a data.frame that looks like this:

> DF1      

 A    B    C    D    E     
 a    x    c    h    p 
 c    d    q    t    w
 s    e    r    p    a
 w    l    t    s    i
 p    i    y    a    f
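The post does not show how DF1 was created. A minimal sketch that rebuilds it from the printout, assuming every column is a plain character vector (the stringsAsFactors = FALSE setting is an assumption, not something stated in the post):

```r
# Rebuild the example frame; the values are copied from the printout above.
DF1 <- data.frame(
  A = c("a", "c", "s", "w", "p"),
  B = c("x", "d", "e", "l", "i"),
  C = c("c", "q", "r", "t", "y"),
  D = c("h", "t", "p", "s", "a"),
  E = c("p", "w", "a", "i", "f"),
  stringsAsFactors = FALSE  # assumption: keep columns as character, not factor
)
DF1
```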

      

I would like to compare each column of my data.frame with the rest of the columns in order to count the number of common items. For example, I would like to compare column A with all other columns (B, C, D, E) and count the common objects like this:

A versus the rest:

  • A vs B: 0 (because they have 0 elements in common)
  • A vs C: 1 (c in common)
  • A vs D: 3 (a, p, and s in common)
  • A vs E: 3 (a, p, and w in common)

Then the same: B versus columns C, D, E, etc.

Can anyone help me? I don't know how to implement this.

2 answers


We can loop over the column names and compare each column with all the others by taking the intersect of each pair and getting its length with lengths:

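sapply(names(DF1), function(x) {
    # length of the intersection of column x with every other column
    x1 <- lengths(Map(intersect, DF1[setdiff(names(DF1), x)], DF1[x]))
    # add a 0 for x itself and restore the original column order
    c(x1, setNames(0, setdiff(names(DF1), names(x1))))[names(DF1)]})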
#  A B C D E
#A 0 0 1 3 3
#B 0 0 0 0 1
#C 1 0 0 1 0
#D 3 0 1 0 2 
#E 3 1 0 2 0
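The same idea (intersect each pair of columns and take the length) can also be sketched with two nested sapply calls, which some readers may find easier to follow. This is an illustrative variant, not the answer's own code; the helper name count_common is hypothetical, and DF1 is rebuilt here on the assumption that its columns are character vectors.

```r
# Example data, assumed to be character columns.
DF1 <- data.frame(
  A = c("a", "c", "s", "w", "p"),
  B = c("x", "d", "e", "l", "i"),
  C = c("c", "q", "r", "t", "y"),
  D = c("h", "t", "p", "s", "a"),
  E = c("p", "w", "a", "i", "f"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: number of distinct values two vectors share.
count_common <- function(x, y) length(intersect(x, y))

# Outer sapply walks the columns, inner sapply compares one column with all of them.
res <- sapply(DF1, function(a) sapply(DF1, function(b) count_common(a, b)))

# A column always shares all its values with itself, so blank out the diagonal.
diag(res) <- 0
res
```

This performs each comparison twice (A vs B and B vs A), which is harmless for a handful of columns but wasteful for wide frames.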

      




Or it can be done more compactly: reshape the data to long format with melt, tabulate column against value with table, and take the cross product of that frequency table with tcrossprod

library(reshape2)
# melt to long format, tabulate column by value, then zero the diagonal (self-matches)
tcrossprod(table(melt(as.matrix(DF1))[-1])) * !diag(ncol(DF1))
#    Var2
#Var2 A B C D E
#   A 0 0 1 3 3
#   B 0 0 0 0 1
#   C 1 0 0 1 0
#   D 3 0 1 0 2
#   E 3 1 0 2 0
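To see why the cross product gives the counts, it may help to build the column-by-value table by hand. The sketch below uses only base R in place of melt; this is a substitution for illustration, not the answer's code. The names inc and shared are made up here, and DF1 is assumed to hold character columns.

```r
# Example data, assumed to be character columns.
DF1 <- data.frame(
  A = c("a", "c", "s", "w", "p"),
  B = c("x", "d", "e", "l", "i"),
  C = c("c", "q", "r", "t", "y"),
  D = c("h", "t", "p", "s", "a"),
  E = c("p", "w", "a", "i", "f"),
  stringsAsFactors = FALSE
)

# One row per column of DF1, one column per distinct value.
# unlist() walks DF1 column by column, so the labels line up with rep(..., each = ).
inc <- table(column = rep(names(DF1), each = nrow(DF1)),
             value  = unlist(DF1, use.names = FALSE))

# Turn counts into 0/1 so a value repeated inside one column is not counted twice,
# then multiply: entry [i, j] is the number of values present in both i and j.
shared <- tcrossprod(1 * (inc > 0))

# Remove the self-comparisons on the diagonal.
diag(shared) <- 0
shared
```

Without the 0/1 step, duplicates within a column would inflate the counts, so the result would no longer match length(intersect(...)).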

      

NOTE: the crossprod step is also implemented with RcppEigen, which would make it faster.



An alternative is to use combn twice: once to get the pairs of column names, and once to find the length of the intersection of each pair. cbind.data.frame returns a data.frame, and setNames is used to add the column names.



setNames(cbind.data.frame(t(combn(names(DF1), 2)),
                 combn(names(DF1), 2, function(x) length(intersect(DF1[, x[1]], DF1[, x[2]])))),
         c("col1", "col2", "count"))
   col1 col2 count
1     A    B     0
2     A    C     1
3     A    D     3
4     A    E     3
5     B    C     0
6     B    D     0
7     B    E     1
8     C    D     1
9     C    E     0
10    D    E     2
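If the square layout is preferred, the pairwise counts from combn can be spread into a symmetric matrix. A minimal sketch, assuming DF1 has character columns; the names pairs, counts and m are chosen here for illustration.

```r
# Example data, assumed to be character columns.
DF1 <- data.frame(
  A = c("a", "c", "s", "w", "p"),
  B = c("x", "d", "e", "l", "i"),
  C = c("c", "q", "r", "t", "y"),
  D = c("h", "t", "p", "s", "a"),
  E = c("p", "w", "a", "i", "f"),
  stringsAsFactors = FALSE
)

# Each row of pairs is one unordered pair of column names.
pairs  <- t(combn(names(DF1), 2))
# One count per pair, in the same order as the rows of pairs.
counts <- combn(names(DF1), 2,
                function(p) length(intersect(DF1[[p[1]]], DF1[[p[2]]])))

# Empty square matrix labelled by column name.
m <- matrix(0, ncol(DF1), ncol(DF1),
            dimnames = list(names(DF1), names(DF1)))

# A two-column character matrix indexes by (row name, column name).
m[pairs] <- counts            # upper triangle
m[pairs[, 2:1]] <- counts     # mirror into the lower triangle
m
```

Because combn visits each pair only once, this does half the comparisons of a full column-by-column loop.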

      
