Convert Matrix Multiplication to cuBLAS
The problem is simple: I have two matrices, A and B, that are both M by N, where M >> N. I want to take the transpose of A and multiply it by B (A^T * B), putting the result in C, which is N by N. I have everything set up for A and B, but how do I call cublasSgemm correctly so that it does not return the wrong answer?
I understand that cuBLAS has a cublasOperation_t enum for transposing operands, but somehow I'm not using it quite correctly. My matrices A and B are in row-major order, i.e. [row1] [row2] [row3] ..... in device memory. This means that for A to be interpreted as A-transposed, BLAS needs to know that my A is in column-major order. My current code looks like this:
float *A, *B, *C;
// initialize A, B, C as device arrays, fill them with values
// initialize m = num_row_A, n = num_row_B, and k = num_col_A;
// set lda = m, ldb = k, ldc = m;
// alpha = 1, beta = 0;
// set up cuBlas handle ...
cublasSgemm(handle, CUBLAS_OP_T, CUBLAS_OP_N, m, n, k, &alpha, A, lda, B, ldb, &beta, C, ldc);
My questions:
Have I configured m, k, n correctly?
How about lda, ldb, ldc?
Thanks!
cuBLAS always assumes that matrices are stored in column-major order. You can either transpose your matrices into column-major order first with cublas<t>geam() (cublasSgeam() for float), or
you can treat your matrix A, stored in row-major order, as a new matrix AT stored in column-major order. The matrix AT is actually the transpose of A. Do the same for B. Then you can compute the matrix C, stored in column-major order, as C = AT * BT^T
float* AT = A;
float* BT = B;
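The reinterpretation above involves no data movement, which a small CPU-side sketch can make concrete. The helper names (row_major_at, col_major_at, reinterpretation_is_transpose) are hypothetical and not part of cuBLAS; the sketch only shows that the same buffer, indexed as column-major with leading dimension N, yields the transpose.

```c
#include <assert.h>
#include <stddef.h>

/* Element (i, j) of a row-major matrix that has `cols` columns. */
static float row_major_at(const float *a, size_t cols, size_t i, size_t j) {
    return a[i * cols + j];
}

/* Element (i, j) of a column-major matrix with leading dimension `ld`. */
static float col_major_at(const float *a, size_t ld, size_t i, size_t j) {
    return a[j * ld + i];
}

/* Returns 1 if the row-major M x N buffer, read as a column-major
   N x M matrix with ld = N, is the transpose of the original. */
static int reinterpretation_is_transpose(const float *a, size_t M, size_t N) {
    for (size_t i = 0; i < M; ++i)
        for (size_t j = 0; j < N; ++j)
            if (row_major_at(a, N, i, j) != col_major_at(a, N, j, i))
                return 0;
    return 1;
}
```

Both accessors resolve to the same offset i * N + j, which is why AT and BT can simply alias A and B.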
The leading dimension is a storage-related parameter; it does not change whether or not you use the transpose flag CUBLAS_OP_T.
lda = num_col_A = num_row_AT = N;
ldb = num_col_B = num_row_BT = N;
ldc = num_row_C = N;
m and n in the cuBLAS GEMM routine are the number of rows and columns of the result matrix C,
m = num_row_C = num_row_AT = num_col_A = N;
n = num_col_C = num_row_BT = num_col_B = N;
k is the common (inner) dimension of A^T and B,
k = num_col_AT = num_row_B = M;
Then you can call the GEMM routine with
cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_T, m, n, k, &alpha, AT, lda, BT, ldb, &beta, C, ldc);
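One way to sanity-check this argument choice without a GPU is a CPU stand-in that follows the same column-major convention. This is only a sketch: ref_sgemm_nt and check_at_times_b are hypothetical names, not cuBLAS functions, the stand-in covers only the (CUBLAS_OP_N, CUBLAS_OP_T) case, and the sizes M = 4, N = 3 are arbitrary.

```c
#include <assert.h>

/* Hypothetical CPU stand-in for the (OP_N, OP_T) case of a column-major GEMM:
   C = alpha * A * B^T + beta * C.
   A is m x k (lda >= m), B is n x k (ldb >= n), C is m x n (ldc >= m). */
static void ref_sgemm_nt(int m, int n, int k, float alpha,
                         const float *A, int lda,
                         const float *B, int ldb,
                         float beta, float *C, int ldc) {
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < m; ++i) {
            float acc = 0.0f;
            for (int p = 0; p < k; ++p)
                acc += A[p * lda + i] * B[p * ldb + j];
            /* With beta == 0 the old contents of C are not read. */
            C[j * ldc + i] = (beta == 0.0f) ? alpha * acc
                                            : alpha * acc + beta * C[j * ldc + i];
        }
}

/* Returns 1 if the argument choice from the answer reproduces A^T * B,
   with the result stored in column-major order. */
static int check_at_times_b(void) {
    enum { M = 4, N = 3 };                 /* A and B are M x N, row-major */
    float A[M * N], B[M * N], C[N * N];
    for (int i = 0; i < M * N; ++i) {
        A[i] = (float)(i + 1);
        B[i] = (float)(2 * i - 3);
    }

    /* m = n = N, k = M, and every leading dimension is N. */
    ref_sgemm_nt(N, N, M, 1.0f, A, N, B, N, 0.0f, C, N);

    for (int r = 0; r < N; ++r)
        for (int c = 0; c < N; ++c) {
            float expect = 0.0f;           /* (A^T B)(r, c) = sum_i A(i, r) * B(i, c) */
            for (int i = 0; i < M; ++i)
                expect += A[i * N + r] * B[i * N + c];
            float d = C[c * N + r] - expect;   /* C is column-major */
            if (d > 1e-4f || d < -1e-4f) return 0;
        }
    return 1;
}
```

Calling check_at_times_b() returns 1 when the column-major result matches a naive A^T * B computed directly from the row-major inputs.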
If you want the matrix C to be stored in row-major order, you can compute CT, stored in column-major order, using the formula CT = BT * AT^T:
cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_T, n, m, k, &alpha, BT, ldb, AT, lda, &beta, CT, ldc);
Note that you do not need to swap m and n, since C is a square matrix in this case.
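When C is not square, the sizes and leading dimensions do have to follow the swapped operands. The sketch below assumes a generalization that is not in the question: A is M x P and B is M x Q (both row-major), so C = A^T * B is P x Q. The names gemm_nt_colmajor and check_row_major_result are hypothetical, and the helper is a CPU stand-in for the (CUBLAS_OP_N, CUBLAS_OP_T) case with alpha = 1 and beta = 0, not a cuBLAS call.

```c
#include <assert.h>

/* Hypothetical CPU stand-in for a column-major GEMM with (OP_N, OP_T),
   alpha = 1, beta = 0: C (m x n) = X (m x k) * Y^T, where Y is n x k. */
static void gemm_nt_colmajor(int m, int n, int k,
                             const float *X, int ldx,
                             const float *Y, int ldy,
                             float *C, int ldc) {
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < m; ++i) {
            float acc = 0.0f;
            for (int p = 0; p < k; ++p)
                acc += X[p * ldx + i] * Y[p * ldy + j];
            C[j * ldc + i] = acc;
        }
}

/* Returns 1 if the swapped call yields row-major C = A^T * B for non-square C. */
static int check_row_major_result(void) {
    enum { M = 5, P = 2, Q = 3 };          /* A is M x P, B is M x Q, C is P x Q */
    float A[M * P], B[M * Q], C[P * Q];
    for (int i = 0; i < M * P; ++i) A[i] = (float)(i + 1);
    for (int i = 0; i < M * Q; ++i) B[i] = (float)(i % 4 - 1);

    /* CT = BT * AT^T: m = Q, n = P, k = M; leading dimensions Q, P, Q. */
    gemm_nt_colmajor(Q, P, M, B, Q, A, P, C, Q);

    for (int r = 0; r < P; ++r)
        for (int c = 0; c < Q; ++c) {
            float expect = 0.0f;           /* C(r, c) = sum_i A(i, r) * B(i, c) */
            for (int i = 0; i < M; ++i)
                expect += A[i * P + r] * B[i * Q + c];
            if (C[r * Q + c] != expect) return 0;  /* exact: small integer values */
        }
    return 1;
}
```

In this sketch the first operand is BT with leading dimension Q and the second is AT with leading dimension P, so m = Q and n = P; in the square case from the question all of these collapse to N.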