Create a random contingency table to check items

I would like to create a random contingency table with equal margins.

The simplest example is to have a table:

3   3   3   | 9
3   3   3   | 9
3   3   3   | 9
_   _   _ 
9   9   9   

      

So sum(r_i) = sum(c_j) = 9 for every row i and column j. I would like to find all contingency tables that match these criteria, and then analyze some functions over that set of tables.

Is there a "simple" way to generate these tables in R?



2 answers


Your question is not entirely precise. Generating random contingency tables is easy. Finding all contingency tables that meet these criteria can be more difficult, because the probabilities of the tables are very heterogeneous and it would take a very large sample to retrieve them all. (Someone started a deterministic enumeration solution based on the partitions package, but seems to have deleted their answer ...) r2dtable in the stats package (a base package) generates random tables:

Generating just 1 sample (results are returned as a list):

 set.seed(101)
 r2dtable(n=1,r=c(9,9,9),c=c(9,9,9))[[1]]
 ##      [,1] [,2] [,3]
 ## [1,]    4    3    2
 ## [2,]    2    4    3
 ## [3,]    3    2    4
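As a quick sanity check, here is a minimal sketch that confirms every table returned by r2dtable reproduces the requested margins. The names tabs and margins_ok are illustrative and not part of the original answer.

```r
set.seed(101)
# Draw a handful of tables with all row and column totals equal to 9
tabs <- r2dtable(n = 5, r = c(9, 9, 9), c = c(9, 9, 9))

# TRUE for a table whose row sums and column sums are all 9
margins_ok <- sapply(tabs, function(x) {
  all(rowSums(x) == 9) && all(colSums(x) == 9)
})
all(margins_ok)
```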

      

How likely is your example?

 set.seed(102)
 tList <- r2dtable(n=50000,r=c(9,9,9),c=c(9,9,9))

      

Convert results to strings for easier comparisons:

 vals <- sapply(tList,function(x) paste(c(x),collapse=""))

      

How many are there?

 length(unique(vals))  ## 1018

      

Update: a larger sample (n = 500000) gave 1276 unique tables. This seems more plausible on symmetry grounds, but it may still be incomplete: judging by the log-frequency distribution, I probably haven't caught the long tail yet.

In fact it is incomplete: this web page makes it possible to calculate the number of tables, and there are 1540 when all margins are equal to 9.
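The figure of 1540 can be cross-checked without sampling. The sketch below assumes only that a 3 x 3 table with fixed margins is determined by its top-left 2 x 2 block, so it is enough to count the blocks for which the five remaining cells stay non-negative. The names x11, x12, x21, x22 and n_tables are illustrative.

```r
# All candidate values for the top-left 2 x 2 block
g <- expand.grid(x11 = 0:9, x12 = 0:9, x21 = 0:9, x22 = 0:9)

# The other five cells follow from the margins; each must be >= 0
ok <- with(g,
  x11 + x12 <= 9 & x21 + x22 <= 9 &   # last cell of rows 1 and 2
  x11 + x21 <= 9 & x12 + x22 <= 9 &   # last cell of columns 1 and 2
  x11 + x12 + x21 + x22 >= 9)         # bottom-right cell
n_tables <- sum(ok)
n_tables
```

This counts tables only; it does not say how likely each one is under r2dtable.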

Log frequency distribution:



 ## drop the table class so that the x-axis is simply the rank of each table
 plot(log10(as.vector(sort(table(vals),decreasing=TRUE))),type="l")

      


The most common tables are:

 head(rev(sort(table(vals))))
 ## vals
 ## 333333333 342324333 333324342 333342324 423333243 234333432 
 ##       996       626       626       605       596       592

      

(For extra credit, I should try to collapse symmetrical cases.)
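One possible way to collapse symmetrical cases, as a rough sketch: treat two tables as equivalent when one can be turned into the other by permuting rows, permuting columns, or transposing, and label each table with the smallest string among its 72 variants. The helper canon and the names perms and cvals are hypothetical; a smaller sample is used here to keep the run short.

```r
# All 6 orderings of 1:3, one per row
perms <- rbind(c(1, 2, 3), c(1, 3, 2), c(2, 1, 3),
               c(2, 3, 1), c(3, 1, 2), c(3, 2, 1))

# Canonical key: smallest string over row permutations,
# column permutations and transposition
canon <- function(x) {
  keys <- character(0)
  for (m in list(x, t(x))) {
    for (i in 1:6) for (j in 1:6) {
      keys <- c(keys, paste(c(m[perms[i, ], perms[j, ]]), collapse = ""))
    }
  }
  min(keys)
}

set.seed(102)
tList <- r2dtable(n = 2000, r = c(9, 9, 9), c = c(9, 9, 9))
vals  <- sapply(tList, function(x) paste(c(x), collapse = ""))
cvals <- sapply(tList, canon)
c(raw = length(unique(vals)), collapsed = length(unique(cvals)))
```

With this key, tables such as 342324333 and 333324342 from the list above fall into the same class, since they differ only by a column swap.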

Probability of the all-equal table:

 mean(vals=="333333333") ## 0.01992
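The simulated frequency can be compared with an exact value. This sketch assumes that r2dtable samples from the usual fixed-margins distribution under independence, where a table x has probability (prod r_i! * prod c_j!) / (n! * prod x_ij!). The function name table_prob is illustrative.

```r
# Exact probability of a table given its own margins, computed on the
# log scale: sum(log r_i!) + sum(log c_j!) - log n! - sum(log x_ij!)
table_prob <- function(x) {
  exp(sum(lfactorial(rowSums(x))) + sum(lfactorial(colSums(x))) -
      lfactorial(sum(x)) - sum(lfactorial(x)))
}

p_equal <- table_prob(matrix(3, nrow = 3, ncol = 3))
p_equal
```

Under that assumption the value comes out at about 0.021, in the same range as the frequency seen in the sample.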

      

The deterministic approach (which I hope its owner will restore) starts with the compositions() function from the partitions package, which lists all the ways to split an integer into a given number of components: compositions(9,3) gives all ordered triples of non-negative integers that add up to 9, which represent all possible rows / columns of your contingency matrix.
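For readers without the partitions package, the same triples can be listed in base R. This is a stand-in sketch, not the package's implementation; the name comps is illustrative, and the columns may come out in a different order than compositions(9,3) would give.

```r
# Choose the first two parts freely, then the third part is forced
g <- expand.grid(x1 = 0:9, x2 = 0:9)
g <- g[g$x1 + g$x2 <= 9, ]

# One composition per column, like the matrix returned by compositions()
comps <- rbind(g$x1, g$x2, 9 - g$x1 - g$x2)
dim(comps)
all(colSums(comps) == 9)
```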

I'm still thinking about how to take these raw materials and combine them to enumerate the tables: there must be at least 1276 of them, so it's not just all permutations of individual compositions (which would only give 3! * 55 = 330).

This is a start, but doesn't really work:

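library("partitions")
cc <- compositions(9,3)
## all ways to pick 3 distinct compositions as the columns of a table
too.many <- combn(split(cc,col(cc)),3,
                  FUN=function(x) do.call(cbind,x),
                  simplify=FALSE)  ## 26235
## keep only the candidates whose rows also sum to 9
ok <- sapply(too.many,function(x) all(rowSums(x)==9))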

      

Only 252 are OK? Perhaps we need to allow all permutations of these results (which would give 252 * 6 = 1512, a plausible result ...)?
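One way to finish the enumeration, sketched in base R rather than with the partitions package: pick any two compositions as the first two columns and let the row sums force the third. Since combn() only ever draws three different columns, this sketch also counts how many tables have three distinct columns, which is what 252 * 6 can reach. The names comps, tables and distinct_cols are illustrative.

```r
# Base-R stand-in for compositions(9, 3): one composition per column
g <- expand.grid(x1 = 0:9, x2 = 0:9)
g <- g[g$x1 + g$x2 <= 9, ]
comps <- rbind(g$x1, g$x2, 9 - g$x1 - g$x2)

tables <- list()
for (i in seq_len(ncol(comps))) {
  for (j in seq_len(ncol(comps))) {
    # Column 3 is forced by the row sums; keep the table if it is valid
    third <- 9 - comps[, i] - comps[, j]
    if (all(third >= 0)) {
      tables[[length(tables) + 1]] <-
        cbind(comps[, i], comps[, j], third, deparse.level = 0)
    }
  }
}
length(tables)

# Tables whose three columns are all different
distinct_cols <- sapply(tables, function(x) ncol(unique(x, MARGIN = 2)) == 3)
sum(distinct_cols)
```

If this runs as intended, the total matches the 1540 quoted above, and the tables with a repeated column make up the difference from 1512.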



This may come across as a crazy answer to this question, and it is a crazy (as well as incomplete) answer. But it relates both to your intended outcome and to Sudoku, so it's fun.

library("sudokuAlt")
g <- matrix(as.numeric(makeGame(3,0)), nrow = 9)
colSums(g)
# [1] 45 45 45 45 45 45 45 45 45
rowSums(g)
# [1] 45 45 45 45 45 45 45 45 45

      



Basically, a completed Sudoku grid is a special case of what you are looking for. It also adds the constraint that each row and each column has not only the same margins, but also the same set of elements. This package only implements 9x9, 16x16, or 25x25 puzzles, but you can look at the source code to see how it generates the puzzles and maybe build a more general solution from there.
