Using Scipy interface cython_blas from Cython not working on Mx1 1xN vectors

Question

Using Scipy interface cython_blas from Cython not working on Mx1 1xN vectors

There is a similar problem to deal with here: Calling BLAS / LAPACK directly using SciPy and Cython interface , but different in that I am using actual code in SciPy example here _test_dgemm

: https://github.com/scipy/scipy/blob/master/ scipy / linalg / cython_blas.pyx , which is very fast ( 5x faster than numpy.dot

inputting the output matrix, or 20x faster if not). It has no effect if Mx1 1xN vectors are transmitted. It gives the same values as numpy.dot

with the passed matrices. I have minimized the code as no answers have been posted for clarity. Here dgemm.pyx.

:

import numpy as np
cimport numpy as np
from scipy.linalg.cython_blas cimport dgemm
from cython cimport boundscheck

@boundscheck(False)
cpdef int fast_dgemm(double[:,::1] a, double[:,::1] b, double[:,::1] c, double alpha=1.0, double beta=0.0) nogil except -1:

    cdef:
        char *transa = 'n'
        char *transb = 'n'
        int m, n, k, lda, ldb, ldc
        double *a0=&a[0,0]
        double *b0=&b[0,0]
        double *c0=&c[0,0]

    ldb = (&a[1,0]) - a0 if a.shape[0] > 1 else 1
    lda = (&b[1,0]) - b0 if b.shape[0] > 1 else 1

    k = b.shape[0]
    if k != a.shape[1]:
        with gil:
            raise ValueError("Shape mismatch in input arrays.")
    m = b.shape[1]
    n = a.shape[0]
    if n != c.shape[0] or m != c.shape[1]:
        with gil:
            raise ValueError("Output array does not have the correct shape.")
    ldc = (&c[1,0]) - c0 if c.shape[0] > 1 else 1
    dgemm(transa, transb, &m, &n, &k, &alpha, b0, &lda, a0,
               &ldb, &beta, c0, &ldc)
    return 0

Here's a sample test script:

import numpy as np;

a=np.random.randn(1000);
b=np.random.randn(1000);
a.resize(len(a),1);
a=np.array(a, order='c');
b.resize(1,len(b)); 
b=np.array(b, order='c');
c = np.empty((a.shape[0],b.shape[1]), float, order='c'); 

from dgemm import _test_dgemm; 
_test_dgemm(a,b,c);

And if you want to play with it on Windows with Python 3.5 x64 here setup.py

to create it using the command line by typingpython setup.py build_ext --inplace --compiler=msvc

from Cython.Distutils import build_ext
import numpy as np
import os

try:
    from setuptools import setup
    from setuptools import Extension
except ImportError:
    from distutils.core import setup
    from distutils.extension import Extension

module = 'dgemm'

ext_modules = [Extension(module, sources=[module + '.pyx'],
              include_dirs=['C://Program Files (x86)//Windows Kits//10//Include//10.0.10240.0//ucrt','C://Program Files (x86)//Microsoft Visual Studio 14.0//VC//include','C://Program Files (x86)//Windows Kits//8.1//Include//shared'],
              library_dirs=['C://Program Files (x86)//Windows Kits//8.1//bin//x64', 'C://Windows//System32', 'C://Program Files (x86)//Microsoft Visual Studio 14.0//VC//lib//amd64', 'C://Program Files (x86)//Windows Kits//8.1//Lib//winv6.3//um//x64', 'C://Program Files (x86)//Windows Kits//10//Lib//10.0.10240.0//ucrt//x64'],
              extra_compile_args=['/Ot', '/favor:INTEL64', '/EHsc', '/GA'],
              language='c++')]

setup(
    name = module,
    ext_modules = ext_modules,
    cmdclass = {'build_ext': build_ext},
    include_dirs = [np.get_include(), os.path.join(np.get_include(), 'numpy')]
    )

Any help is greatly appreciated!

+1

scipy blas cython

Matt 07 jul. 17 at 22:43

source to share

1 answer

ead · Accepted Answer · 2017-07-20T21:14:20+0000

If I see it correctly, you are trying to use fortran routines for arrays with c-memory-layout.

Even if you are obviously aware of this, I would like to first talk about line order (c-memory-layout) and top column order (fortran-memory-layout) so that you can print my answer.

So, if we have a matrix 2x3

(i.e., 2 rows and 3 columns) A

and store it in some contiguous memory, we get:

row-major-order(A) = A11, A12, A13, A21, A22, A23
col-major-order(A) = A11, A21, A12, A22, A13, A33

This means that if we get a contiguous memory representing a matrix in row order and interpret it as a matrix in top column order, we end up with a completely different matrix!

However, we'll look at the transposed matrix A^t

, which is easy to see:

row-major-order(A) = col-major-order(A^t)
col-major-order(A) = row-major-order(A^t)

This means that if we want to get the matrix C

in the major-order row as a result, then the blas procedure must write the transposed matrix C

in the major-order column (after that we cannot change) into this very memory. However, C^t=(AB)^t=B^t*A^t

and B^t

an A^t

are the original matrices interpreted in the main column.

Now let A

be a n x k

-matrix and B

a k x m

-matrix, the call to dgemm should be as follows:

dgemm(transa, transb, &m, &n, &k, &alpha, b0, &m, a0, &k, &beta, c0, &m)

As you can see, you switched some n

and m

in your code.

Using Scipy interface cython_blas from Cython not working on Mx1 1xN vectors

More articles: