Slow dot product in R
I am trying to take the dot product of a 331x23152 matrix and a 23152x23152 matrix.
In Python and Octave this is a trivial operation, but in R it seems incredibly slow.
N <- 331
M <- 23152
mat_1 = matrix( rnorm(N*M,mean=0,sd=1), N, M)
mat_2 = matrix( rnorm(M*M,mean=0,sd=1), M, M)
tm3 <- system.time({
mat_3 = mat_1%*%mat_2
})
print(tm3)
Output:
user system elapsed
101.95 0.04 101.99
In other words, this dot product takes over 100 seconds.
I am running R-3.4.0 64-bit, with RStudio v1.0.143 on an i7-4790 with 16GB of RAM. Thus, I did not expect this operation to take that long.
Am I missing something? I started looking into the bigmemory and bigalgebra packages, but I can't help but think that there is a solution without resorting to packages.
EDIT
To give you an idea of the time difference, here's the script in Octave:
n = 331;
m = 23152;
mat_1 = rand(n,m);
mat_2 = rand(m,m);
tic
mat_3 = mat_1*mat_2;
toc
Output:
Elapsed time is 3.81038 seconds.
And in Python:
import numpy as np
import time

n = 331
m = 23152
mat_1 = np.random.random((n,m))
mat_2 = np.random.random((m,m))
tm_1 = time.time()
mat_3 = np.dot(mat_1,mat_2)
tm_2 = time.time()
tm_3 = tm_2 - tm_1
print(tm_3)
Output:
2.781277894973755
As you can see, these numbers are not in the same ballpark.
EDIT 2
At Zheyuan Li's request, here are toy examples of the dot product.
In R:
mat_1 = matrix(c(1,2,1,2,1,2), nrow = 2, ncol = 3)
mat_2 = matrix(c(1,1,1,2,2,2,3,3,3), nrow = 3, ncol = 3)
mat_3 = mat_1 %*% mat_2
print(mat_3)
Output:
[,1] [,2] [,3]
[1,] 3 6 9
[2,] 6 12 18
In Octave:
mat_1 = [1,1,1;2,2,2];
mat_2 = [1,2,3;1,2,3;1,2,3];
mat_3 = mat_1*mat_2
Output:
mat_3 =
3 6 9
6 12 18
In Python:
import numpy as np

mat_1 = np.array([[1,1,1],[2,2,2]])
mat_2 = np.array([[1,2,3],[1,2,3],[1,2,3]])
mat_3 = np.dot(mat_1, mat_2)
print(mat_3)
Output:
[[ 3 6 9]
[ 6 12 18]]
For more information on matrix dot products: https://en.wikipedia.org/wiki/Matrix_multiplication
EDIT 3
Output of sessionInfo():
> sessionInfo()
R version 3.4.0 (2017-04-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1
Matrix products: default
locale:
[1] LC_COLLATE=Dutch_Netherlands.1252 LC_CTYPE=Dutch_Netherlands.1252 LC_MONETARY=Dutch_Netherlands.1252
[4] LC_NUMERIC=C LC_TIME=Dutch_Netherlands.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] compiler_3.4.0 tools_3.4.0
EDIT 4
I tried the bigalgebra package, but that did not seem to speed things up:
library('bigalgebra')
N <- 331
M <- 23152
mat_1 = matrix( rnorm(N*M,mean=0,sd=1), N, M)
mat_1 <- as.big.matrix(mat_1)
mat_2 = matrix( rnorm(M*M,mean=0,sd=1), M, M)
tm3 <- system.time({
mat_3 = mat_1%*%mat_2
})
print(tm3)
Output:
user system elapsed
101.79 0.00 101.81
EDIT 5
James suggested changing my randomly generated matrices:
N <- 331
M <- 23152
mat_1 = matrix( runif(N*M), N, M)
mat_2 = matrix( runif(M*M), M, M)
tm3 <- system.time({
mat_3 = mat_1%*%mat_2
})
print(tm3)
Output:
user system elapsed
102.46 0.05 103.00
Based on the answers from knb and Zheyuan Li, I started looking into optimized BLAS libraries, and came across GotoBLAS, OpenBLAS and MKL, among others.
My conclusion is that MKL is far superior to the default BLAS.
It seems that R needs to be built from source in order to link against MKL. Instead, I found R Open, which has MKL built in (optionally), so installation is a breeze.
With the following code:
N <- 331
M <- 23152
mat_1 = matrix( rnorm(N*M,mean=0,sd=1), N, M)
mat_2 = matrix( rnorm(M*M,mean=0,sd=1), M, M)
tm3 <- system.time({
mat_3 = mat_1%*%mat_2
})
print(tm3)
Output:
user system elapsed
10.61 0.10 3.12
Thus, one solution to this problem is to use MKL instead of the default BLAS.
However, on closer inspection my real-life matrices turn out to be very sparse. I was able to take advantage of this fact using the Matrix package: in practice I used something like Matrix(x = mat_1, sparse = TRUE), where mat_1 is a highly sparse matrix. This reduced the execution time to about 3 seconds.
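A minimal sketch of that approach, scaled down so it runs quickly (the matrix size and sparsity level here are illustrative, not the real 23152x23152 case; the actual speedup depends entirely on how sparse the real matrix is):

```r
library(Matrix)

N <- 331
M <- 2000                               # smaller than 23152, for a quick demo
mat_1 <- matrix(rnorm(N * M), N, M)

# Build a highly sparse M x M matrix: ~0.1% nonzero entries
mat_2 <- matrix(0, M, M)
idx <- sample(M * M, round(0.001 * M * M))
mat_2[idx] <- rnorm(length(idx))

sp_2 <- Matrix(mat_2, sparse = TRUE)    # compressed sparse column storage

t_dense  <- system.time(res_dense  <- mat_1 %*% mat_2)
t_sparse <- system.time(res_sparse <- mat_1 %*% sp_2)
print(rbind(dense = t_dense, sparse = t_sparse))
```

The sparse multiply only touches the stored nonzeros, which is where the time savings come from.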
Is this a trivial operation? Matrix multiplication is always an expensive operation in linear algebra calculations.
Actually, I think it is pretty fast. Matrix multiplication at this size involves
2 * 23152 * 23152 * 331 = 354.8 GFLOP
so at 100 seconds your performance is about 3.5 GFLOPS. Note that on most machines, performance is at best 0.8 - 2 GFLOPS unless you have an optimized BLAS library.
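The arithmetic behind these figures, as a quick check (each output entry of the product needs one multiply and one add per inner-dimension element):

```r
# A (331 x 23152) %*% (23152 x 23152) product computes 331 * 23152 output
# entries, each requiring 23152 multiplications and 23152 additions:
flop <- 2 * 331 * 23152 * 23152
flop / 1e9          # total work: ~354.8 GFLOP
(flop / 1e9) / 100  # at 100 s elapsed: ~3.5 GFLOPS sustained
```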
If you believe the implementations elsewhere are faster, check whether they are using an optimized BLAS or parallel computing. R, out of the box, does this with the reference BLAS and no parallelism.
Attention!
More BLAS facilities are available as of R 3.4.0.
First of all, sessionInfo() now returns the full path of the linked BLAS library. Note that this does not point to a symbolic link, but to the final shared object! The other answer here demonstrates exactly this: it shows OpenBLAS.
The timing result in that other answer implies that parallel computing (via multithreading in OpenBLAS) is in place. It is hard for me to tell the number of threads used, but it looks like hyperthreading is on, since the "system" time is quite large!
Second, options() can now select the matrix multiplication method via matprod. Although this was introduced to control the handling of NA/NaN, it is also handy for performance testing:
- "internal" is an unoptimized triple-loop implementation. It is written in C and has performance comparable to the standard (reference) BLAS written in F77;
- "default", "blas" and "default.simd" all mean using the linked BLAS for the computation, but differ in the way NA and NaN are checked. If R is linked with the standard BLAS, this has, as said, the same performance as "internal"; otherwise we see a significant speed-up. Also note that the R documentation says "default.simd" may be removed in the future.
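A small sketch comparing two of the methods (sizes shrunk so the unoptimized "internal" loop finishes quickly; the timings you see will depend on your machine and linked BLAS):

```r
set.seed(1)
a <- matrix(rnorm(200 * 500), 200, 500)
b <- matrix(rnorm(500 * 500), 500, 500)

old <- options(matprod = "internal")  # unoptimized C triple loop
t_internal <- system.time(c1 <- a %*% b)

options(matprod = "default")          # dispatch to the linked BLAS
t_default <- system.time(c2 <- a %*% b)
options(old)                          # restore the previous setting

print(rbind(internal = t_internal, default = t_default))
```

With an optimized BLAS linked, the gap between the two rows can be large; with the reference BLAS it should be small.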
I have a similar machine: Linux PC, 16GB RAM, Intel 4770K.
Relevant output from sessionInfo():
R version 3.4.0 (2017-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.2 LTS
Matrix products: default
BLAS: /usr/lib/openblas-base/libblas.so.3
LAPACK: /usr/lib/libopenblasp-r0.2.18.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=de_DE.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=de_DE.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=de_DE.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] knitr_1.15.1 clipr_0.3.2 tibble_1.3.0 colorout_1.1-2
loaded via a namespace (and not attached):
[1] compiler_3.4.0 tools_3.4.0 Rcpp_0.12.10
On my machine, the code snippet takes ~5 seconds (RStudio started, an empty .R file created, the snippet run; output below):
user system elapsed
27.608 5.524 4.920
Snippet:
N <- 331
M <- 23152
mat_1 = matrix( rnorm(N*M,mean=0,sd=1), N, M)
mat_2 = matrix( rnorm(M*M,mean=0,sd=1), M, M)
tm3 <- system.time({
mat_3 = mat_1 %*% mat_2
})
print(tm3)