R data.table and iterating over two columns

I am new to R and am trying to solve the following problem:

There is a table with two columns, books and readers, which hold the book IDs and the reader IDs, respectively:

> books = c (1,2,3,1,1,2)
> readers = c(30, 10, 20, 20, 10, 30)
> bt = data.table(books, readers)
> bt
   books readers
1:     1      30
2:     2      10
3:     3      20
4:     1      20
5:     1      10
6:     2      30

      

For each book pair, I need to count the number of readers who read both of these books, with this algorithm:

for each book
  for each reader of the book
    for each other_book in books of the reader
      increment common_reader_count ((book, other_book), cnt)

      

To implement the above algorithm, I need to group this data into two lists: 1) a book list containing the readers of each book, and 2) a reader list containing the books each reader has read, such as:

> bookList = list( 
+ list(1, list(30, 20, 10)),
+         list(2, list(10, 30)),
+         list(3, list(20))
+       )
> 
> readerList = list (
+ list(30, list(1,2)),
+ list(20, list(3,1)),
+ list(10, list(2,1))
+ )
>  

      

Questions:

1) What functions should I use to create these lists from the table of books?

2) Given bookList and readerList, how do I pair up books and count the readers who have read both books of each pair? For the table bt above, the result should be:

((1, 2), 2)
((1,3), 1)
((2,3), 0)  

      

The order of the books in a pair does not matter, so, for example, (1,2) and (2,1) should be reduced to one pair.

Please advise which functions and data structures I should use to solve this. Thanks!

Update:

Ideally, the result should be a matrix with the book IDs as both rows and columns. Each cell holds the number of readers who read both books of the pair. So, for the example above, the matrix should be:

books | 1 | 2 | 3 |
   1  | 1 | 2 | 1 |
   2  | 2 | 1 | 0 |
   3  | 1 | 0 | 1 |

   Which means:

   book 1 and 2 are read together by 2 readers 
   book 1 and 3 are read together by 1 reader
   book 2 and 3 are read together by 0 readers

      

How to build such a matrix?



3 answers


Here's another option:

combs <- combn(unique(bt$books), 2)  # Generate all pairs (combos) of books
setkey(bt, books)
both.read <- bt[                # Cartesian join all combos to our data
  data.table(books=c(combs), combo.id=c(col(combs))), allow.cartesian=TRUE
][,
  .(                            # For each combo, count readers that show up twice, meaning they've read both books
    read.both=sum(duplicated(readers)),
    book1=min(books), book2=max(books)
  ),
  by=combo.id
]
dcast.data.table(               # dcast to desired format
  both.read, book1 ~ book2, value.var="read.both", fun.aggregate=sum
)

      

Outputs:



   book1 2 3
1:     1 2 1
2:     2 0 0

      

Note that by design, this only does non-equivalent combinations (i.e. we do not show books 1-2 and 2-1, only 1-2, since they are the same).
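As an alternative sketch (not part of the answer above), the nested loops from the question can be expressed as a self-join of the table on readers. The names pairs, common, book1, book2 and common_readers are made up for illustration, and the code assumes each (book, reader) row appears only once:

```r
library(data.table)

bt <- data.table(books   = c(1, 2, 3, 1, 1, 2),
                 readers = c(30, 10, 20, 20, 10, 30))

# Pair every book a reader has read with every book of that same reader.
# merge() suffixes the clashing column name, giving books.x and books.y.
pairs <- merge(bt, bt, by = "readers", allow.cartesian = TRUE)

# Keep each unordered pair once (book1 < book2) and count the shared readers.
common <- pairs[books.x < books.y,
                .(common_readers = .N),
                by = .(book1 = books.x, book2 = books.y)]
common
```

Pairs with no common reader, such as (2, 3) here, simply do not appear in the result, so a zero count would have to be filled in separately if it is needed.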



Try this:

## gives you a separate list of readers for each book
list_books <- split(bt$readers, bt$books)

## gives you a separate list of books for each reader
list_readers <- split(bt$books, bt$readers)

      

Another form of output, as a data.table: the number of books read by each reader, and the number of readers of each book:

bt[ , .("N Books" = length(unique(books))), by = readers]
bt[ , .("N Readers" = length(unique(readers))), by = books]

      



For the second part of your question, I would use the following:

bt2 <- bt[ , .N, by = .(readers, books)]
library(tidyr)
spread(bt2, key = books, value = "N", fill = 0)

      

The output is a table that gives 1 if the books are read by reader X and 0 otherwise:

   readers 1 2 3
1:      10 1 1 0
2:      20 1 0 1
3:      30 1 1 0
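A 0/1 reader-by-book table like the one above can also be turned into the book-by-book matrix requested in the update by taking its cross product. This is a minimal sketch rather than part of the answer: the names ub, inc and co_read are made up here, and unique() is an assumed guard against repeated (book, reader) rows in real data:

```r
library(data.table)

bt <- data.table(books   = c(1, 2, 3, 1, 1, 2),
                 readers = c(30, 10, 20, 20, 10, 30))

# Drop repeated (book, reader) rows so every cell below is 0 or 1.
ub <- unique(bt)

# Reader-by-book incidence table: 1 if the reader has read the book.
inc <- table(reader = ub$readers, book = ub$books)

# t(inc) %*% inc: cell (i, j) counts readers who read both book i and book j.
co_read <- crossprod(inc)
co_read
```

The off-diagonal cells match the matrix in the question (2, 1 and 0). The diagonal differs: it holds the number of readers of each book (3, 2 and 1) rather than 1.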

      



Here's a base R solution for checking whether each pair of books has been read by the same reader. Someone can add a data.table version if you absolutely need one:

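books <- c(1, 2, 3, 1, 1, 2)
readers <- c(30, 10, 20, 20, 10, 30)
bks <- data.frame(books, readers)

cmb <- t(combn(unique(books), 2))   # one row per pair of books
combos <- as.data.frame(cmb)
bktbl <- t(table(bks))              # readers in rows, books in columns

# For each pair, count the readers who have read both books.
# Columns are looked up by name, so book IDs need not be 1, 2, 3, ...
n_both <- sapply(seq_len(nrow(cmb)), function(i) {
  read_first  <- bktbl[, as.character(cmb[i, 1])] > 0
  read_second <- bktbl[, as.character(cmb[i, 2])] > 0
  sum(read_first & read_second)
})
combos$PairRead <- ifelse(n_both > 0, "yes", "no")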
combos
  V1 V2 PairRead
1  1  2      yes
2  1  3      yes
3  2  3       no

      
