How to use `cor.test` to correlate specific columns?
I have the following sample data:
A<-rnorm(100)
B<-rnorm(100)
C<-rnorm(100)
v1<-as.numeric(c(1:100))
v2<-as.numeric(c(2:101))
v3<-as.numeric(c(3:102))
v2[50]<-NA
v3[60]<-NA
v3[61]<-NA
df<-data.frame(A,B,C,v1,v2,v3)
As you can see, `df` has one NA in column 5 and two NAs in column 6. Now I would like to build a correlation matrix of columns 1 and 3 on the one hand against columns 2, 4, 5 and 6 on the other. Using the `cor` function in R:
cor(df[ , c(1,3)], df[ , c(2,4,5,6)], use="complete.obs")
# B v1 v2 v3
# A -0.007565203 -0.2985090 -0.2985090 -0.2985090
# C 0.032485874 0.1043763 0.1043763 0.1043763
It works. I would like to have both the estimate and the p-value, so I switch to `cor.test`:
cor.test(df[ ,c(1,3)], df[ , c(2,4,5,6)], na.action = "na.exclude")$estimate
This does not work because "x" and "y" must have the same length. The error occurs with or without NAs in the data. It looks like `cor.test`, unlike `cor`, does not accept a request to correlate sets of columns. Is there a solution to this problem?
You can use `outer` to run the test on every pair of columns. Inside the function, `X` and `Y` are expanded from the two subsets of `df` and each hold 8 columns, one per pair.
outer(df[, c(1,3)], df[, c(2,4,5,6)], function(X, Y){
    mapply(function(...) cor.test(..., na.action = "na.exclude")$estimate,
           X, Y)
})
You even get output in the same form as `cor`:
# B v1 v2 v3
# A 0.07844426 0.01829566 0.01931412 0.01528329
# C 0.11487140 -0.14827859 -0.14900301 -0.15534569
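Since the question also asks for p-values, the same idea can be wrapped in a small helper that extracts any component of the `cor.test` result. This is only a sketch: `cor_test_matrix` and its `what` argument are names made up here, and `set.seed` is added purely so the example is reproducible. `na.action` is left out because, as far as I can tell, it belongs to the formula interface of `cor.test`; the default method drops incomplete pairs on its own.

```r
# Sample data as in the question (the seed is an addition for reproducibility)
set.seed(1)
df <- data.frame(A = rnorm(100), B = rnorm(100), C = rnorm(100),
                 v1 = as.numeric(1:100), v2 = as.numeric(2:101),
                 v3 = as.numeric(3:102))
df$v2[50] <- NA
df$v3[c(60, 61)] <- NA

# Hypothetical helper: run cor.test on every (x column, y column) pair and
# pull one component out of each result, e.g. "estimate" or "p.value".
cor_test_matrix <- function(x, y, what = "estimate") {
  outer(x, y, function(X, Y) {
    # X and Y arrive already expanded to one entry per pair of columns
    mapply(function(a, b) unname(cor.test(a, b)[[what]]), X, Y)
  })
}

est  <- cor_test_matrix(df[, c(1, 3)], df[, c(2, 4, 5, 6)], "estimate")
pval <- cor_test_matrix(df[, c(1, 3)], df[, c(2, 4, 5, 6)], "p.value")
```

Both `est` and `pval` are 2 x 4 matrices with rows `A`, `C` and columns `B`, `v1`, `v2`, `v3`. Note that `cor.test` removes missing values pair by pair, whereas `cor(..., use = "complete.obs")` drops every row that has an NA in any of the selected columns, so the two sets of estimates can differ slightly for the same data.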