R: How to avoid 2 'for' loops in R in this function
I know there are many topics on how to avoid loops in R, but I couldn't figure out how to vectorize my iterations. I have a dataset that I call m, and I want this function to build a new matrix of the p-values of the correlation tests between each pair of columns of m.
m<-matrix(rnorm(100),nrow=10,ncol=10)
sig.p <- function(x){
  n <- ncol(x)
  p.values <- matrix(nrow = n, ncol = n)
  for (i in 1:n){
    for (t in 1:n){
      p.values[t, i] <- cor.test(x[, i], x[, t])$p.value
    }
  }
  p.values
}
sig.p(m)
I couldn't figure out whether mapply can be used here. Can anyone suggest how I could vectorize these iterations (with mapply or otherwise)? Thanks in advance!
Cesar
You can use rcorr from the Hmisc package:
library(Hmisc)
rcorr(m)$P
Or use
library(psych)
corr.test(as.data.frame(m))$p
Or use outer from base R:
outer(1:ncol(m),1:ncol(m), FUN= Vectorize(function(x,y)
cor.test(m[,x], m[,y])$p.value))
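As a quick sanity check (a minimal sketch in base R, using a small random matrix mm of my own), the outer approach returns a symmetric matrix whose entries match individual cor.test calls:

```r
# Minimal check of the outer()/Vectorize() approach on a small matrix.
set.seed(1)
mm <- matrix(rnorm(50), nrow = 10, ncol = 5)

p <- outer(1:ncol(mm), 1:ncol(mm),
           FUN = Vectorize(function(x, y) cor.test(mm[, x], mm[, y])$p.value))

all.equal(p, t(p))                                    # TRUE: symmetric
all.equal(p[2, 3], cor.test(mm[, 2], mm[, 3])$p.value) # TRUE: matches the pairwise test
```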
Benchmarks
I tried a smaller dataset (100*100) and a slightly larger one (1e3*1e3). Here are the functions:
akrun <- function() {outer(1:ncol(m1),1:ncol(m1),
FUN= Vectorize(function(x,y) cor.test(m1[,x],
m1[,y])$p.value))}
akrun2 <- function(){rcorr(m1)$P}
agstudy <- function() {M <- expand.grid(seq_len(ncol(m1)),
seq_len(ncol(m1)))
mapply(function(x,y)cor.test(m1[,x], m1[,y])$p.value,M$Var1,M$Var2)}
vpipk <-function(){
n <- ncol(m1)
p.values<-matrix(nrow=n,ncol=n)
for (i in 1:(n-1)){
for (t in (i+1):n){
p.values[t,i]<-cor.test(m1[,i],m1[,t])$p.value
}
}
p.values
}
nrussell <- function(){
sapply(1:ncol(m1), function(z){
sapply(1:ncol(m1), function(x,Y=z){
cor.test(m1[,Y],m1[,x])$p.value
})
})
}
On the 100*100 dataset:
library(microbenchmark)
set.seed(25)
m1 <- matrix(rnorm(1e2*1e2),nrow=1e2,ncol=1e2)
microbenchmark(akrun(), akrun2(), agstudy(), vpipk(),
nrussell(), unit='relative', times=10L)
#Unit: relative
# expr min lq mean median uq max neval cld
# akrun() 257.2310 255.9766 252.2163 254.4946 248.9807 246.5429 10 c
# akrun2() 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 10 a
# agstudy() 255.5920 258.0813 253.5411 256.0581 250.4833 249.0503 10 c
# vpipk() 125.8218 126.3337 125.4592 126.8479 124.9835 124.1383 10 b
#nrussell() 257.9283 256.8480 252.5297 256.0160 250.8853 242.0896 10 c
If I change 1e2 to 1e3 (I did not have time to run microbenchmark, but here is system.time):
system.time(akrun())
# user system elapsed
#403.563 0.751 404.198
system.time(akrun2())
# user system elapsed
# 3.110 0.008 3.117
system.time(agstudy())
# user system elapsed
#445.108 0.877 445.947
system.time(vpipk())
# user system elapsed
#155.597 0.224 155.760
system.time(nrussell())
# user system elapsed
#452.524 1.220 453.713
Not as succinct as @akrun's answer, but here is a base R solution:
sig.p <- function(M){
sapply(1:ncol(M), function(z){
sapply(1:ncol(M), function(x,Y=z){
cor.test(M[,Y],M[,x])$p.value
})
})
}
##
R> sig.p(m)
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 0.00000000 0.08034470 0.244411381 0.03293644 0.3234899 0.80352003 0.5326317 0.03896285 0.702987267 0.57721440
[2,] 0.08034470 0.00000000 0.087168145 0.44828479 0.4824117 0.76469973 0.8222813 0.17662866 0.607145382 0.41460977
[3,] 0.24441138 0.08716815 0.000000000 0.20634394 0.9504582 0.11864029 0.2148186 0.28450468 0.009396629 0.51450066
[4,] 0.03293644 0.44828479 0.206343943 0.00000000 0.8378530 0.78122849 0.0544312 0.22943728 0.524608029 0.66329385
[5,] 0.32348990 0.48241166 0.950458153 0.83785303 0.0000000 0.66105999 0.3157296 0.35715193 0.927945195 0.63163949
[6,] 0.80352003 0.76469973 0.118640294 0.78122849 0.6610600 0.00000000 0.7181462 0.67602651 0.749641726 0.03218081
[7,] 0.53263166 0.82228134 0.214818607 0.05443120 0.3157296 0.71814620 0.0000000 0.39393423 0.266039043 0.38619000
[8,] 0.03896285 0.17662866 0.284504679 0.22943728 0.3571519 0.67602651 0.3939342 0.00000000 0.512083873 0.30980598
[9,] 0.70298727 0.60714538 0.009396629 0.52460803 0.9279452 0.74964173 0.2660390 0.51208387 0.000000000 0.92533524
[10,] 0.57721440 0.41460977 0.514500658 0.66329385 0.6316395 0.03218081 0.3861900 0.30980598 0.925335242 0.00000000
Vectorization isn't always all it's cracked up to be. I'm not sure how big your actual matrix is, but at this size, or even 100 x 100, it's a fairly small one-time cost.
You can more than double your performance by changing your loop structure as follows:
sig.p<-function(x){
n <- ncol(x)
p.values<-matrix(nrow=n,ncol=n)
for (i in 1:(n-1)){
for (t in (i+1):n){
p.values[t,i]<-cor.test(x[,i],x[,t])$p.value
}
}
p.values
}
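If the full symmetric matrix is needed afterwards, the missing upper triangle can be copied over from the lower one; a minimal base R sketch (the fill.sym helper name is my own):

```r
# Hypothetical helper (name is my own): complete the symmetric matrix
# from the lower triangle that the loop above fills in.
fill.sym <- function(p.values) {
  p.values[upper.tri(p.values)] <- t(p.values)[upper.tri(p.values)]
  diag(p.values) <- 0  # a column is perfectly correlated with itself
  p.values
}

# Reproduce the lower-triangle-only loop on a small matrix, then complete it.
set.seed(25)
mm <- matrix(rnorm(40), nrow = 10, ncol = 4)
p <- matrix(nrow = 4, ncol = 4)
for (i in 1:3) {
  for (t in (i + 1):4) {
    p[t, i] <- cor.test(mm[, i], mm[, t])$p.value
  }
}
p.full <- fill.sym(p)
```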
Basically, only calculate the lower triangle, since you know the diagonal will be zero and the matrix is symmetric. mapply or sapply applied across the entire matrix may not perform any better than this.
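Another way to exploit the same symmetry without explicit loops is to run cor.test once per unordered pair of columns via combn, then mirror the results (a sketch of my own, not from the answers above; it assumes a zero diagonal is wanted):

```r
set.seed(25)
m2 <- matrix(rnorm(100), nrow = 10, ncol = 10)
n <- ncol(m2)

# One cor.test per unordered pair of columns.
pairs <- combn(n, 2)
pv <- apply(pairs, 2, function(ij) cor.test(m2[, ij[1]], m2[, ij[2]])$p.value)

# Scatter the p-values into the upper triangle, then mirror them down.
p.values <- matrix(0, nrow = n, ncol = n)
p.values[t(pairs)] <- pv            # rows of t(pairs) are (i, j) with i < j
p.values <- p.values + t(p.values)  # symmetric; diagonal stays 0
```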