Using Scipy interface cython_blas from Cython not working on Mx1 1xN vectors

There is a similar problem to deal with here: Calling BLAS / LAPACK directly using SciPy and Cython interface , but different in that I am using actual code in SciPy example here _test_dgemm

: https://github.com/scipy/scipy/blob/master/ scipy / linalg / cython_blas.pyx , which is very fast ( 5x faster than numpy.dot

inputting the output matrix, or 20x faster if not). It has no effect if Mx1 1xN vectors are transmitted. It gives the same values ​​as numpy.dot

with the passed matrices. I have minimized the code as no answers have been posted for clarity. Here dgemm.pyx.

:

import numpy as np
cimport numpy as np
from scipy.linalg.cython_blas cimport dgemm
from cython cimport boundscheck

@boundscheck(False)
cpdef int fast_dgemm(double[:,::1] a, double[:,::1] b, double[:,::1] c, double alpha=1.0, double beta=0.0) nogil except -1:

    cdef:
        char *transa = 'n'
        char *transb = 'n'
        int m, n, k, lda, ldb, ldc
        double *a0=&a[0,0]
        double *b0=&b[0,0]
        double *c0=&c[0,0]

    ldb = (&a[1,0]) - a0 if a.shape[0] > 1 else 1
    lda = (&b[1,0]) - b0 if b.shape[0] > 1 else 1

    k = b.shape[0]
    if k != a.shape[1]:
        with gil:
            raise ValueError("Shape mismatch in input arrays.")
    m = b.shape[1]
    n = a.shape[0]
    if n != c.shape[0] or m != c.shape[1]:
        with gil:
            raise ValueError("Output array does not have the correct shape.")
    ldc = (&c[1,0]) - c0 if c.shape[0] > 1 else 1
    dgemm(transa, transb, &m, &n, &k, &alpha, b0, &lda, a0,
               &ldb, &beta, c0, &ldc)
    return 0

      

Here's a sample test script:

import numpy as np;

a=np.random.randn(1000);
b=np.random.randn(1000);
a.resize(len(a),1);
a=np.array(a, order='c');
b.resize(1,len(b)); 
b=np.array(b, order='c');
c = np.empty((a.shape[0],b.shape[1]), float, order='c'); 

from dgemm import _test_dgemm; 
_test_dgemm(a,b,c);

      

And if you want to play with it on Windows with Python 3.5 x64 here setup.py

to create it using the command line by typingpython setup.py build_ext --inplace --compiler=msvc

from Cython.Distutils import build_ext
import numpy as np
import os

try:
    from setuptools import setup
    from setuptools import Extension
except ImportError:
    from distutils.core import setup
    from distutils.extension import Extension

module = 'dgemm'

ext_modules = [Extension(module, sources=[module + '.pyx'],
              include_dirs=['C://Program Files (x86)//Windows Kits//10//Include//10.0.10240.0//ucrt','C://Program Files (x86)//Microsoft Visual Studio 14.0//VC//include','C://Program Files (x86)//Windows Kits//8.1//Include//shared'],
              library_dirs=['C://Program Files (x86)//Windows Kits//8.1//bin//x64', 'C://Windows//System32', 'C://Program Files (x86)//Microsoft Visual Studio 14.0//VC//lib//amd64', 'C://Program Files (x86)//Windows Kits//8.1//Lib//winv6.3//um//x64', 'C://Program Files (x86)//Windows Kits//10//Lib//10.0.10240.0//ucrt//x64'],
              extra_compile_args=['/Ot', '/favor:INTEL64', '/EHsc', '/GA'],
              language='c++')]

setup(
    name = module,
    ext_modules = ext_modules,
    cmdclass = {'build_ext': build_ext},
    include_dirs = [np.get_include(), os.path.join(np.get_include(), 'numpy')]
    )

      

Any help is greatly appreciated!

+1


source to share


1 answer


If I see it correctly, you are trying to use fortran routines for arrays with c-memory-layout.

Even if you are obviously aware of this, I would like to first talk about line order (c-memory-layout) and top column order (fortran-memory-layout) so that you can print my answer.

So, if we have a matrix 2x3

(i.e., 2 rows and 3 columns) A

and store it in some contiguous memory, we get:

row-major-order(A) = A11, A12, A13, A21, A22, A23
col-major-order(A) = A11, A21, A12, A22, A13, A33

      

This means that if we get a contiguous memory representing a matrix in row order and interpret it as a matrix in top column order, we end up with a completely different matrix!

However, we'll look at the transposed matrix A^t

, which is easy to see:



row-major-order(A) = col-major-order(A^t)
col-major-order(A) = row-major-order(A^t)

      

This means that if we want to get the matrix C

in the major-order row as a result, then the blas procedure must write the transposed matrix C

in the major-order column (after that we cannot change) into this very memory. However, C^t=(AB)^t=B^t*A^t

and B^t

an A^t

are the original matrices interpreted in the main column.

Now let A

be a n x k

-matrix and B

a k x m

-matrix, the call to dgemm should be as follows:

dgemm(transa, transb, &m, &n, &k, &alpha, b0, &m, a0, &k, &beta, c0, &m)

      

As you can see, you switched some n

and m

in your code.

+1


source







All Articles