Using Scipy interface cython_blas from Cython not working on Mx1 1xN vectors
There is a similar problem to deal with here: Calling BLAS / LAPACK directly using SciPy and Cython interface , but different in that I am using actual code in SciPy example here _test_dgemm
: https://github.com/scipy/scipy/blob/master/ scipy / linalg / cython_blas.pyx , which is very fast ( 5x faster than numpy.dot
inputting the output matrix, or 20x faster if not). It has no effect if Mx1 1xN vectors are transmitted. It gives the same values as numpy.dot
with the passed matrices. I have minimized the code as no answers have been posted for clarity. Here dgemm.pyx.
:
import numpy as np
cimport numpy as np
from scipy.linalg.cython_blas cimport dgemm
from cython cimport boundscheck
@boundscheck(False)
cpdef int fast_dgemm(double[:,::1] a, double[:,::1] b, double[:,::1] c, double alpha=1.0, double beta=0.0) nogil except -1:
cdef:
char *transa = 'n'
char *transb = 'n'
int m, n, k, lda, ldb, ldc
double *a0=&a[0,0]
double *b0=&b[0,0]
double *c0=&c[0,0]
ldb = (&a[1,0]) - a0 if a.shape[0] > 1 else 1
lda = (&b[1,0]) - b0 if b.shape[0] > 1 else 1
k = b.shape[0]
if k != a.shape[1]:
with gil:
raise ValueError("Shape mismatch in input arrays.")
m = b.shape[1]
n = a.shape[0]
if n != c.shape[0] or m != c.shape[1]:
with gil:
raise ValueError("Output array does not have the correct shape.")
ldc = (&c[1,0]) - c0 if c.shape[0] > 1 else 1
dgemm(transa, transb, &m, &n, &k, &alpha, b0, &lda, a0,
&ldb, &beta, c0, &ldc)
return 0
Here's a sample test script:
import numpy as np; a=np.random.randn(1000); b=np.random.randn(1000); a.resize(len(a),1); a=np.array(a, order='c'); b.resize(1,len(b)); b=np.array(b, order='c'); c = np.empty((a.shape[0],b.shape[1]), float, order='c'); from dgemm import _test_dgemm; _test_dgemm(a,b,c);
And if you want to play with it on Windows with Python 3.5 x64 here setup.py
to create it using the command line by typingpython setup.py build_ext --inplace --compiler=msvc
from Cython.Distutils import build_ext
import numpy as np
import os
try:
from setuptools import setup
from setuptools import Extension
except ImportError:
from distutils.core import setup
from distutils.extension import Extension
module = 'dgemm'
ext_modules = [Extension(module, sources=[module + '.pyx'],
include_dirs=['C://Program Files (x86)//Windows Kits//10//Include//10.0.10240.0//ucrt','C://Program Files (x86)//Microsoft Visual Studio 14.0//VC//include','C://Program Files (x86)//Windows Kits//8.1//Include//shared'],
library_dirs=['C://Program Files (x86)//Windows Kits//8.1//bin//x64', 'C://Windows//System32', 'C://Program Files (x86)//Microsoft Visual Studio 14.0//VC//lib//amd64', 'C://Program Files (x86)//Windows Kits//8.1//Lib//winv6.3//um//x64', 'C://Program Files (x86)//Windows Kits//10//Lib//10.0.10240.0//ucrt//x64'],
extra_compile_args=['/Ot', '/favor:INTEL64', '/EHsc', '/GA'],
language='c++')]
setup(
name = module,
ext_modules = ext_modules,
cmdclass = {'build_ext': build_ext},
include_dirs = [np.get_include(), os.path.join(np.get_include(), 'numpy')]
)
Any help is greatly appreciated!
source to share
If I see it correctly, you are trying to use fortran routines for arrays with c-memory-layout.
Even if you are obviously aware of this, I would like to first talk about line order (c-memory-layout) and top column order (fortran-memory-layout) so that you can print my answer.
So, if we have a matrix 2x3
(i.e., 2 rows and 3 columns) A
and store it in some contiguous memory, we get:
row-major-order(A) = A11, A12, A13, A21, A22, A23 col-major-order(A) = A11, A21, A12, A22, A13, A33
This means that if we get a contiguous memory representing a matrix in row order and interpret it as a matrix in top column order, we end up with a completely different matrix!
However, we'll look at the transposed matrix A^t
, which is easy to see:
row-major-order(A) = col-major-order(A^t) col-major-order(A) = row-major-order(A^t)
This means that if we want to get the matrix C
in the major-order row as a result, then the blas procedure must write the transposed matrix C
in the major-order column (after that we cannot change) into this very memory. However, C^t=(AB)^t=B^t*A^t
and B^t
an A^t
are the original matrices interpreted in the main column.
Now let A
be a n x k
-matrix and B
a k x m
-matrix, the call to dgemm should be as follows:
dgemm(transa, transb, &m, &n, &k, &alpha, b0, &m, a0, &k, &beta, c0, &m)
As you can see, you switched some n
and m
in your code.
source to share