R: running fisher.test in a for loop gives an error message
My dataframe looks like this:
595.00000 18696 984.00200 32185 Group1
935.00000 18356 1589.00000 31580 Group2
40.00010 19251 73.00000 33096 Group3
1058.00000 18233 1930.00000 31239 Group4
19.00000 19272 27.00000 33142 Group5
1225.00000 18066 2149.00000 31020 Group6
....
For each group, I want to run Fisher's exact test:
table <- matrix(c(595.00000, 984.00200, 18696, 32185), ncol=2, byrow=TRUE)
Group1 <- fisher.test(table, alternative="greater")
I tried looping over the data frame with:
for (i in 1:nrow(data.frame))
{
table= matrix(c(data.frame$V1, data.frame$V2, data.frame$V3, data.frame$V4), ncol=2, byrow=T)
fisher.test(table, alternative="greater")
}
But I got this error message:
Error in fisher.test(table, alternative = "greater") :
FEXACT error 40.
Out of workspace.
In addition: Warning message:
In fisher.test(table, alternative = "greater") :
'x' has been rounded to integer: Mean relative difference: 2.123828e-06
How can I fix this problem, or is there another way to iterate over the data?
Your first error: Out of workspace
?fisher.test
fisher.test(x, y = NULL, workspace = 200000, hybrid = FALSE,
control = list(), or = 1, alternative = "two.sided",
conf.int = TRUE, conf.level = 0.95,
simulate.p.value = FALSE, B = 2000)
You should try increasing the workspace argument (default = 2e5).
However, this happens in your case because you have really huge values. Typically, if all the elements in your matrix are > 5 (or in your case 10, since df = 1), you can safely approximate the result with the chi-square test of independence, using chisq.test. For your case, I think you should use chisq.test.
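As a rough illustration of that rule of thumb, the sketch below checks the expected cell counts for the first row of the question's data. The rule is often stated in terms of expected counts rather than observed ones; the threshold of 10 follows the text above, and the variable names m and expected are chosen here for illustration only.

```r
# First row of the question's data, rounded to whole counts.
# Filled column-wise: rows are (V1, V3) and (V2, V4).
m <- matrix(round(c(595.00000, 18696, 984.00200, 32185)), ncol = 2)

# Expected cell counts under independence:
# (row total * column total) / grand total
expected <- outer(rowSums(m), colSums(m)) / sum(m)

# chisq.test() computes the same expected counts internally
stopifnot(isTRUE(all.equal(unname(chisq.test(m)$expected), unname(expected))))

# TRUE when every expected count clears the threshold
all(expected > 10)
```

If this check returns TRUE for every row, the chi-square approximation is usually considered reasonable for a two-sided comparison.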
The warning message appears because some of your values are not integers (984.00200, etc.). So, if you really want to apply fisher.test to each row, do this (assuming your data is in df and is a data.frame):
# fisher.test with bigger workspace
apply(as.matrix(df[,1:4]), 1, function(x)
fisher.test(matrix(round(x), ncol=2), workspace=1e9)$p.value)
Or, if you prefer to substitute chisq.test (which I think you should for these huge values, to improve performance without any significant difference in p-values):
apply(as.matrix(df[,1:4]), 1, function(x)
chisq.test(matrix(round(x), ncol=2))$p.value)
This will extract the p-values.
Edit 1: I just noticed that you are using a one-sided Fisher's exact test. You should probably keep using fisher.test, as I'm not sure how to get a one-tailed chi-square test of independence: its p-value is already computed from the right tail of the distribution (and you cannot simply divide the p-value by 2, since the distribution is asymmetric).
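A minimal base R sketch of the one-sided version, using the first three rows from the question, might look like the following. The column names V1 to V4 and grp, and the helper one_sided_p, are assumptions made for illustration. Note that the loop in the question passes whole columns (data.frame$V1 and so on) on every iteration, so it builds one large table instead of a 2 by 2 table per row, which is the likely source of the FEXACT error. As far as I know, fisher.test does not use the FEXACT workspace at all for 2 by 2 tables, so workspace is left at its default here.

```r
# The first three rows from the question; column and group names are assumed
df <- data.frame(
  V1  = c(595.00000, 935.00000, 40.00010),
  V2  = c(18696, 18356, 19251),
  V3  = c(984.00200, 1589.00000, 73.00000),
  V4  = c(32185, 31580, 33096),
  grp = c("Group1", "Group2", "Group3")
)

# One-sided exact test for a single row of four counts
one_sided_p <- function(x) {
  # Round to whole counts; column-wise fill gives rows (V1, V3) and (V2, V4),
  # the same layout as the hand-built table in the question
  m <- matrix(round(x), ncol = 2)
  fisher.test(m, alternative = "greater")$p.value
}

# Apply to each row and label the results with the group names
p_greater <- setNames(apply(as.matrix(df[, 1:4]), 1, one_sided_p), df$grp)
p_greater
```

The result is a named numeric vector with one p-value per group, which can be added back to the data frame with df$p.val <- p_greater if needed.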
Edit 2: Since you need the group name alongside each p-value and you already have a data.frame, I suggest using the data.table package like this:
# example data
set.seed(45)
df <- as.data.frame(matrix(sample(10:200, 20), ncol=4))
df$grp <- paste0("group", 1:nrow(df))
# load package
require(data.table)
dt <- data.table(df, key="grp")
dt[, p.val := fisher.test(matrix(c(V1, V2, V3, V4), ncol=2),
workspace=1e9)$p.value, by=grp]
> dt
# V1 V2 V3 V4 grp p.val
# 1: 130 65 76 82 group1 5.086256e-04
# 2: 70 52 168 178 group2 1.139934e-01
# 3: 55 112 195 34 group3 7.161604e-27
# 4: 81 43 91 80 group4 4.229546e-02
# 5: 75 10 86 50 group5 4.212769e-05