Slow dot product in R
I am trying to take the dot product of a 331x23152 matrix and a 23152x23152 matrix.
In Python and Octave this is a trivial operation, but in R it seems incredibly slow.
N <- 331
M <- 23152
mat_1 = matrix( rnorm(N*M,mean=0,sd=1), N, M)
mat_2 = matrix( rnorm(M*M,mean=0,sd=1), M, M)
tm3 <- system.time({
mat_3 = mat_1%*%mat_2
})
print(tm3)
Output:
user system elapsed
101.95 0.04 101.99
In other words, this dot product takes over 100 seconds.
I am running R-3.4.0 64-bit, with RStudio v1.0.143 on an i7-4790 with 16GB of RAM. Thus, I did not expect this operation to take that long.
Am I missing something? I started looking into the bigmemory and bigalgebra packages, but I can't help but think that there is a solution without resorting to packages.
EDIT
To give you an idea of the time difference, here's the script in Octave:
n = 331;
m = 23152;
mat_1 = rand(n,m);
mat_2 = rand(m,m);
tic
mat_3 = mat_1*mat_2;
toc
Output:
Elapsed time is 3.81038 seconds.
And in Python:
import numpy as np
import time

n = 331
m = 23152
mat_1 = np.random.random((n,m))
mat_2 = np.random.random((m,m))
tm_1 = time.time()
mat_3 = np.dot(mat_1,mat_2)
tm_2 = time.time()
tm_3 = tm_2 - tm_1
print(tm_3)
Output:
2.781277894973755
As you can see, these numbers are not in the same ballpark.
EDIT 2
At Zheyuan Li's request, here are toy examples of the dot product.
In R:
mat_1 = matrix(c(1,2,1,2,1,2), nrow = 2, ncol = 3)
mat_2 = matrix(c(1,1,1,2,2,2,3,3,3), nrow = 3, ncol = 3)
mat_3 = mat_1 %*% mat_2
print(mat_3)
Output:
[,1] [,2] [,3]
[1,] 3 6 9
[2,] 6 12 18
In Octave:
mat_1 = [1,1,1;2,2,2];
mat_2 = [1,2,3;1,2,3;1,2,3];
mat_3 = mat_1*mat_2
Output:
mat_3 =
3 6 9
6 12 18
In Python:
import numpy as np

mat_1 = np.array([[1,1,1],[2,2,2]])
mat_2 = np.array([[1,2,3],[1,2,3],[1,2,3]])
mat_3 = np.dot(mat_1, mat_2)
print(mat_3)
Output:
[[ 3 6 9]
[ 6 12 18]]
For more information on matrix dot products: https://en.wikipedia.org/wiki/Matrix_multiplication
EDIT 3
Output of sessionInfo():
> sessionInfo()
R version 3.4.0 (2017-04-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1
Matrix products: default
locale:
[1] LC_COLLATE=Dutch_Netherlands.1252 LC_CTYPE=Dutch_Netherlands.1252 LC_MONETARY=Dutch_Netherlands.1252
[4] LC_NUMERIC=C LC_TIME=Dutch_Netherlands.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] compiler_3.4.0 tools_3.4.0
EDIT 4
I tried the bigalgebra package, but that did not seem to speed things up:
library('bigalgebra')
N <- 331
M <- 23152
mat_1 = matrix( rnorm(N*M,mean=0,sd=1), N, M)
mat_1 <- as.big.matrix(mat_1)
mat_2 = matrix( rnorm(M*M,mean=0,sd=1), M, M)
tm3 <- system.time({
mat_3 = mat_1%*%mat_2
})
print(tm3)
Output:
user system elapsed
101.79 0.00 101.81
EDIT 5
James suggested changing my randomly generated matrices:
N <- 331
M <- 23152
mat_1 = matrix( runif(N*M), N, M)
mat_2 = matrix( runif(M*M), M, M)
tm3 <- system.time({
mat_3 = mat_1%*%mat_2
})
print(tm3)
Output:
user system elapsed
102.46 0.05 103.00
Based on the answers from knb and Zheyuan Li, I started looking into optimized BLAS libraries, and came across GotoBLAS, OpenBLAS and MKL, among others.
My conclusion is that MKL is far superior to the default BLAS.
It seems that R needs to be built from source in order to link against MKL. Instead, I found R Open, which has MKL built in (optionally), so installation is a breeze.
With the following code:
N <- 331
M <- 23152
mat_1 = matrix( rnorm(N*M,mean=0,sd=1), N, M)
mat_2 = matrix( rnorm(M*M,mean=0,sd=1), M, M)
tm3 <- system.time({
mat_3 = mat_1%*%mat_2
})
print(tm3)
Output:
user system elapsed
10.61 0.10 3.12
Thus, one solution to this problem is to use MKL instead of the default BLAS.
However, on closer inspection my real-life matrices turn out to be very sparse. I was able to take advantage of this fact using the Matrix package: in practice I used something like Matrix(x = mat_1, sparse = TRUE), where mat_1 is a highly sparse matrix. This reduced the execution time to about 3 seconds.
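A minimal sketch of that approach, scaled down so it runs quickly (the matrix size and sparsity level here are illustrative, not the real 23152x23152 case; the actual speedup depends entirely on how sparse the real matrix is):

```r
library(Matrix)

N <- 331
M <- 2000                               # smaller than 23152, for a quick demo
mat_1 <- matrix(rnorm(N * M), N, M)

# Build a highly sparse M x M matrix: ~0.1% nonzero entries
mat_2 <- matrix(0, M, M)
idx <- sample(M * M, round(0.001 * M * M))
mat_2[idx] <- rnorm(length(idx))

sp_2 <- Matrix(mat_2, sparse = TRUE)    # compressed sparse column storage

t_dense  <- system.time(res_dense  <- mat_1 %*% mat_2)
t_sparse <- system.time(res_sparse <- mat_1 %*% sp_2)
print(rbind(dense = t_dense, sparse = t_sparse))
```

The sparse multiply only touches the stored nonzeros, which is where the time savings come from.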
Is this a trivial operation? Matrix multiplication is always an expensive operation in linear algebra calculations.
Actually, I think it is pretty fast. Matrix multiplication at this size involves
2 * 23152 * 23152 * 331 = 354.8 GFLOP
so at 100 seconds your performance is about 3.5 GFLOPS. Note that on most machines, performance is at best 0.8 - 2 GFLOPS unless you have an optimized BLAS library.
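The arithmetic behind these figures, as a quick check (each output entry of the product needs one multiply and one add per inner-dimension element):

```r
# A (331 x 23152) %*% (23152 x 23152) product computes 331 * 23152 output
# entries, each requiring 23152 multiplications and 23152 additions:
flop <- 2 * 331 * 23152 * 23152
flop / 1e9          # total work: ~354.8 GFLOP
(flop / 1e9) / 100  # at 100 s elapsed: ~3.5 GFLOPS sustained
```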
If you believe the implementations elsewhere are faster, check whether they are using an optimized BLAS or parallel computing. R, out of the box, does this with the reference BLAS and no parallelism.
Attention!
More BLAS facilities are available as of R 3.4.0.
First of all, sessionInfo() now returns the full path of the linked BLAS library. Note that this does not point to a symbolic link, but to the final shared object! The other answer here demonstrates exactly this: it shows OpenBLAS.
The timing result in that other answer implies that parallel computing (via multithreading in OpenBLAS) is in place. It is hard for me to tell the number of threads used, but it looks like hyperthreading is on, since the "system" time is quite large!
Second, options() can now select the matrix multiplication method via matprod. Although this was introduced to control the handling of NA/NaN, it is also handy for performance testing:
- "internal" is an unoptimized triple-loop implementation. It is written in C and has performance comparable to the standard (reference) BLAS written in F77;
- "default", "blas" and "default.simd" all mean using the linked BLAS for the computation, but differ in the way NA and NaN are checked. If R is linked with the standard BLAS, this has, as said, the same performance as "internal"; otherwise we see a significant speed-up. Also note that the R documentation says "default.simd" may be removed in the future.
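A small sketch comparing two of the methods (sizes shrunk so the unoptimized "internal" loop finishes quickly; the timings you see will depend on your machine and linked BLAS):

```r
set.seed(1)
a <- matrix(rnorm(200 * 500), 200, 500)
b <- matrix(rnorm(500 * 500), 500, 500)

old <- options(matprod = "internal")  # unoptimized C triple loop
t_internal <- system.time(c1 <- a %*% b)

options(matprod = "default")          # dispatch to the linked BLAS
t_default <- system.time(c2 <- a %*% b)
options(old)                          # restore the previous setting

print(rbind(internal = t_internal, default = t_default))
```

With an optimized BLAS linked, the gap between the two rows can be large; with the reference BLAS it should be small.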
I have a similar machine: Linux PC, 16GB RAM, Intel 4770K.
Relevant output from sessionInfo():
R version 3.4.0 (2017-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.2 LTS
Matrix products: default
BLAS: /usr/lib/openblas-base/libblas.so.3
LAPACK: /usr/lib/libopenblasp-r0.2.18.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=de_DE.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=de_DE.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=de_DE.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] knitr_1.15.1 clipr_0.3.2 tibble_1.3.0 colorout_1.1-2
loaded via a namespace (and not attached):
[1] compiler_3.4.0 tools_3.4.0 Rcpp_0.12.10
On my machine, the code snippet takes ~5 seconds (RStudio started, an empty .R file created, the snippet run; output below):
user system elapsed
27.608 5.524 4.920
Snippet:
N <- 331
M <- 23152
mat_1 = matrix( rnorm(N*M,mean=0,sd=1), N, M)
mat_2 = matrix( rnorm(M*M,mean=0,sd=1), M, M)
tm3 <- system.time({
mat_3 = mat_1 %*% mat_2
})
print(tm3)