Why do two chisq.test outputs differ in R?
Why do the outputs of these two chisq.test calls differ from each other when the data are the same?
> df1
  count position
1     1       11
2     6       12
3    12       13
4    23       14
5    27       15
> df2
  count position
1     1       11
2     4       12
3     9       13
4    24       14
5    24       15
> mm = merge(df1, df2, by='position')
> mm
  position count.x count.y
1       11       1       1
2       12       6       4
3       13      12       9
4       14      23      24
5       15      27      24
First method:
> chisq.test(mm[2:3])
Pearson's Chi-squared test
data: mm[2:3]
X-squared = 0.6541, df = 4, p-value = 0.9569
Warning message:
In chisq.test(mm[2:3]) : Chi-squared approximation may be incorrect
Second method:
> chisq.test(df1$count, df2$count)
Pearson's Chi-squared test
data: df1$count and df2$count
X-squared = 15, df = 12, p-value = 0.2414
Warning message:
In chisq.test(df1$count, df2$count) :
Chi-squared approximation may be incorrect
Edit, in response to a comment: the following two selections look the same:
> mm[2:3]
  count.x count.y
1       1       1
2       6       4
3      12       9
4      23      24
5      27      24
> mm[,2:3]
  count.x count.y
1       1       1
2       6       4
3      12       9
4      23      24
5      27      24
Data:
> dput(df1)
structure(list(count = c(1L, 6L, 12L, 23L, 27L), position = 11:15), .Names = c("count",
"position"), class = "data.frame", row.names = c(NA, -5L))
> dput(df2)
structure(list(count = c(1L, 4L, 9L, 24L, 24L), position = 11:15), .Names = c("count",
"position"), class = "data.frame", row.names = c(NA, -5L))
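A minimal reproduction sketch, rebuilding df1 and df2 from the dput() output above, that checks whether mm[2:3] and mm[, 2:3] hold the same values and give the same test. The names a, b, ta and tb are illustrative, and suppressWarnings() is used only to silence the small-sample warning.

```r
# Rebuild the example data from the dput() output above
df1 <- data.frame(count = c(1L, 6L, 12L, 23L, 27L), position = 11:15)
df2 <- data.frame(count = c(1L, 4L, 9L, 24L, 24L), position = 11:15)
mm <- merge(df1, df2, by = "position")

# List-style and matrix-style column selection: both return a 5 x 2 data frame
a <- mm[2:3]
b <- mm[, 2:3]
print(all.equal(as.matrix(a), as.matrix(b)))

# chisq.test() converts a data frame with as.matrix(), so both calls run the
# same contingency-table test (the small-expected-count warning is silenced)
ta <- suppressWarnings(chisq.test(a))
tb <- suppressWarnings(chisq.test(b))
print(c(ta$statistic, tb$statistic))
print(c(ta$parameter, tb$parameter))
```

So the comma makes no difference here; the discrepancy comes from passing a table versus passing two vectors.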
See ?chisq.test: in the first case, mm[2:3] is taken as the contingency table; in the second case, the contingency table is computed from the two vectors:
> chisq.test(table(df1$count, df2$count))
Pearson's Chi-squared test
data: table(df1$count, df2$count)
X-squared = 15, df = 12, p-value = 0.2414
Warning message:
In chisq.test(table(df1$count, df2$count)) :
Chi-squared approximation may be incorrect
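A small sketch showing that the two-vector call is shorthand for tabulating first. The vectors x and y are hypothetical stand-ins holding the same values as df1$count and df2$count.

```r
# Hypothetical stand-ins with the same values as df1$count and df2$count
x <- c(1L, 6L, 12L, 23L, 27L)
y <- c(1L, 4L, 9L, 24L, 24L)

# With two vectors, chisq.test() coerces them to factors and cross-tabulates them,
# so each distinct count becomes a category label rather than a frequency
tab <- table(x, y)
print(dim(tab))  # 5 4: five distinct values in x, four in y (24 appears twice)

t_vec <- suppressWarnings(chisq.test(x, y))
t_tab <- suppressWarnings(chisq.test(tab))
print(c(t_vec$statistic, t_tab$statistic))  # both 15
print(c(t_vec$parameter, t_tab$parameter))  # both 12 = (5 - 1) * (4 - 1)
```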
So what is actually tested is this 5 x 4 table, with the values of df1$count as rows and those of df2$count as columns:
     1 4 9 24
  1  1 0 0  0
  6  0 1 0  0
  12 0 0 1  0
  23 0 0 0  1
  27 0 0 0  1
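To see where X-squared = 15 and df = 12 come from, the Pearson statistic can be computed by hand from this table. This is a sketch of the textbook formula sum((O - E)^2 / E), with E = row total * column total / n; it is not a copy of chisq.test's internals, and O, E and X2 are illustrative names.

```r
# Cross-tabulation of df1$count (rows) against df2$count (columns)
O <- table(c(1, 6, 12, 23, 27), c(1, 4, 9, 24, 24))

n  <- sum(O)                                # 5 observations in total
E  <- outer(rowSums(O), colSums(O)) / n     # expected counts under independence
X2 <- sum((O - E)^2 / E)                    # Pearson's statistic
df <- (nrow(O) - 1) * (ncol(O) - 1)         # (5 - 1) * (4 - 1)
p  <- pchisq(X2, df, lower.tail = FALSE)    # upper-tail p-value

print(c(X2 = X2, df = df, p = p))
print(range(E))  # expected counts are only 0.2 or 0.4
```

With five observations spread over 20 cells, every expected count is 0.2 or 0.4, far below 5, which is why R warns that the approximation may be incorrect.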
The R documentation for chisq.test states that
If x is a matrix with at least two rows and columns, it is taken as a two-dimensional contingency table
So when you type chisq.test(mm[2:3]), your data (a data frame, which chisq.test first converts with as.matrix) is used directly as the contingency table.
In the second case, when you type chisq.test(df1$count, df2$count), the contingency table is computed (with the function table) from the vectors df1$count and df2$count. The vectors are coerced to factors first, so each count value is treated as a category label rather than as a frequency.
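If the intent (an assumption; the question does not say) is to test whether the two sets of counts are distributed differently across positions, one option is to build the two-column matrix explicitly and pass it as x. This sketch gives the same statistic as chisq.test(mm[2:3]). The simulated p-value is one optional way around the small-expected-count warning; B = 10000 and the seed are arbitrary choices.

```r
df1 <- data.frame(count = c(1L, 6L, 12L, 23L, 27L), position = 11:15)
df2 <- data.frame(count = c(1L, 4L, 9L, 24L, 24L), position = 11:15)

# Rows = positions, columns = the two samples; a matrix with at least two rows
# and columns is used as the contingency table.
# Assumes both data frames list the positions in the same order.
m <- cbind(df1 = df1$count, df2 = df2$count)
rownames(m) <- df1$position

res <- suppressWarnings(chisq.test(m))
print(res$statistic)   # same statistic as chisq.test(mm[2:3])
print(res$parameter)   # df = (5 - 1) * (2 - 1) = 4

# Monte Carlo p-value instead of the chi-squared approximation
set.seed(1)
res_sim <- chisq.test(m, simulate.p.value = TRUE, B = 10000)
print(res_sim$p.value)
```

When the p-value is simulated, chisq.test reports no degrees of freedom and does not issue the approximation warning.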